Hi,

On 11/18/2010 08:31 AM, Peter Czanik wrote:
> Hello,
>
> As part of the patterndb project, we plan to start a log sample collecting project. At http://czanik.blogs.balabit.com/2010/11/log-sample-collecting-project/ you can read a document, which describes it. It has three main parts:
>
> 1. background / what is it good for
> 2. methods
> 3. technical requirements
>
> It still has some "FIXME" parts in it, but already enough to get started. Please let us know what you think about it, if you have any questions, miss any information, etc.!

First of all, it's a great initiative -- this is something a lot of people could profit from. Here are my remarks:

1) If I get it right, this is just an RFC for the initiative. Once the project starts, we'd definitely need an easy-to-use interface for browsing and/or submitting log samples. Something like what http://www.pcapr.net does for network captures, though without the ads and the annoying mandatory registration stuff. We can get started by using a git repo for the samples just like for patterndb, but in the long run, a nice'n'shiny website would put the barrier much lower and thus result in more log submissions. In either case, we need very clear and short instructions on how to submit logs, because this blog post is a bit too long to read just for that.

2) I'm not entirely sure that it's a good idea to add explanatory comments to the logs in such an "in-band" way -- they're way too easy to mistake for real log messages. I think sample log files with single events, along with a .nfo file with the necessary meta information, would be much more usable. Yes, as you've written, it would make it a bit more problematic to handle them, but it'd be worth the trouble IMHO.

3) The sections "All logs", "Application settings" and "Host names" got me confused. These instructions can be useful, but they only apply to the scenario where the submitter creates logs specifically for the project. The final documentation should say so, for example under a heading like "Tips for generating high-quality log samples."

4) You've left out one way of generating logs, which can also be important but, I admit, is a lot different from the two collecting modes you mention: investigating the source code of applications. This can reveal possible log messages that are almost impossible to record in real-life scenarios or to trigger in a laboratory environment, but that can notify about very important events.
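
To make the source-code idea a bit more concrete, here is a rough sketch of how it could start. It is only an illustration, not a proposal for the actual tooling: it assumes the application is written in C and logs through plain syslog() calls with a literal format string, and the names SYSLOG_CALL and extract_log_messages are made up for this example. Anything that wraps syslog() in its own logging function, or builds the message at runtime, would need more than a regular expression.

```python
import re

# Matches calls like: syslog(LOG_ERR, "disk %s is full", name);
# Assumption: the format string is a single string literal inside the call.
SYSLOG_CALL = re.compile(
    r'\bsyslog\s*\(\s*(LOG_[A-Z]+)\s*,\s*"((?:[^"\\]|\\.)*)"'
)


def extract_log_messages(source):
    """Return (priority, format_string) pairs found in C source text."""
    return [(m.group(1), m.group(2)) for m in SYSLOG_CALL.finditer(source)]


if __name__ == "__main__":
    # Hypothetical C fragment, used only to exercise the function.
    sample = '''
    if (fd < 0)
        syslog(LOG_ERR, "cannot open %s: %m", path);
    syslog(LOG_INFO, "started, pid %d", getpid());
    '''
    for priority, fmt in extract_log_messages(sample):
        print(priority, fmt)
```

The extracted format strings are not log samples themselves, but they show which messages exist at all, including the ones nobody has managed to trigger yet.
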
We should think about this way of getting log messages, too.

greets,
Peter