Hi,

most of the pattern database-related projects I've seen here started off from the log side of the whole issue: investigating the messages an application produces and trying to create patterns based on them (manually or via patternize). Of course, this makes perfect sense if you're an administrator or you're trying to analyze logs from already-written applications, but what if you're the one who writes these applications and you'd like to make them easier to analyze later on? What if you're the extremely nice guy who wants to ship an up-to-date pattern database for syslog-ng along with your application? :) Or, to be more realistic, at least a list of the most important log messages your program can emit that the user can search to find out what an actual log message means.

The idea I've been toying with is defining some kind of a standardized commenting scheme for logging-related calls in source code. Using this, the code could be parsed automatically and a list of log messages could easily be generated that could be used later for documentation or even generating a pattern database.

I've done multiple rounds of investigation but I only found custom, application-specific solutions for this. So on one hand, I'm asking you if you've seen any widespread commenting schemes for log messages that can fulfil this task? Or even a custom, application-specific solution that is so incredibly cool that we should all adopt it instead of reinventing the wheel? :)

Until then, I'm proposing a commenting scheme I think would be usable and I'm asking for your feedback. It is based on Javadoc, which is widespread enough to be familiar to the majority of developers and has variants available for virtually all programming languages. I'm going to show the concept through the now-standard example of the logging of the login in sshd:

/**
 * @log
 *
 * Signals that the authentication has succeeded or failed for a user.
 *
 * @class system
 * @tags useracct, secevt
 * @url http://www.openssh.com
 *
 * @example Accepted password for bazsi from 127.0.0.1 port 48650 ssh2
 * @example Failed password for bazsi from 127.0.1.1 port 44637 ssh2
 *
 * @pattern Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@
 * @pattern Failed @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@
 *
 * @regexp (Accepted|Failed) (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam|hostbased) for [^[:space:]]+ from [^[:space:]]+ port [[:digit:]]+( (ssh|ssh2))?$
 */
authlog("%s %s for %s%.100s from %.200s port %d%s",
        authmsg,
        method,
        authctxt->valid ? "" : "invalid user ",
        authctxt->user,
        get_remote_ipaddr(),
        get_remote_port(),
        info);

To formalize it a bit:

- comments for the log messages should follow the standard Javadoc/Doxygen formatting, with the main identifier token being "@log" (or \log if you prefer, all further parameters should be usable in the backslash form as well)
- the allowed parameters are @class, @url, @example, @regexp, @pattern and @tags
- multiple @example, @pattern and @regexp fields are allowed
- it is mandatory to specify either @example, @pattern or @regexp to give the application some kind of an identifier, all other params can be omitted

I have my own concerns, though:

- Is it even the right thing to do to spam source code like this? Is it the proper place for the semantic description of log messages? Sure, it's the easiest place to add such things while the app is being developed and if the comment's right there in the code it's more likely that any changes will be followed up during an update of the code, but still, it seems a bit of an overkill even for me now that I look at it... But maybe it'd be worth it for the most important messages.
- Maybe an @id would be useful to allow for manually specifying an identifier for the log messages (we all love Oracle's error codes, don't we? :))
- What should be done if a single log() call can emit multiple different messages with different meanings that should be getting their own separate entry? The example above is the perfect candidate for this: the "login accepted" and "login failed" events are definitely different. Maybe two separate @log comments could be added in this case.
- How does it relate to more advanced db-parser features like correlation, or pdbtool's ability to self-test the name-value pairs in the <examples> section?
- (Isn't it a bit too push-y to call it @pattern instead of @syslog-ng-pattern? :))

However, if such a scheme could be adopted, with some little hacking several cool things could be done (which I'm willing to do):

- create "pdbtool sourceextract" or something like that that can generate ready-to-use pattern databases from the source code
- this can be added to the build processes to update them along with manpages and similar stuff
- Doxygen & co. could be patched to parse and display them
- vim, Eclipse and Your Favourite Editor could get a one-click way to add such comment blocks, just as it is possible to add header comments easily

What do you think?

greets,
Peter
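To make the extraction step of the proposal more concrete, here is a minimal sketch in Python of how a tool might pull @log blocks out of a source file. It is not pdbtool code and no such subcommand is known to exist; the function names, the dictionary layout and the choice to raise an error on a block without @example, @pattern or @regexp are all assumptions made for illustration. Only the tag names come from the message above.

```python
import re

# Tag names come from the proposal above. Everything else in this sketch
# (function names, dict layout, error handling) is a hypothetical choice.
SINGLE_TAGS = {"class", "url"}
REPEATABLE_TAGS = {"example", "pattern", "regexp"}

COMMENT_RE = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)
TAG_RE = re.compile(r"[@\\](\w+)\s*(.*)")


def strip_decoration(raw_line):
    """Remove the leading ' * ' that Javadoc-style comment lines carry."""
    return re.sub(r"^\s*\*?\s*", "", raw_line).rstrip()


def extract_log_blocks(source):
    """Return one dict per /** ... */ comment that contains an @log marker."""
    blocks = []
    for comment in COMMENT_RE.finditer(source):
        lines = [strip_decoration(raw) for raw in comment.group(1).splitlines()]
        if not any(line in ("@log", "\\log") for line in lines):
            continue  # an ordinary doc comment, not a log description

        entry = {"description": [], "tags": []}
        for name in REPEATABLE_TAGS:
            entry[name] = []

        for line in lines:
            tag = TAG_RE.match(line)
            if tag is None:
                if line:
                    entry["description"].append(line)
                continue
            name, value = tag.group(1), tag.group(2).strip()
            if name in REPEATABLE_TAGS:
                entry[name].append(value)
            elif name in SINGLE_TAGS:
                entry[name] = value
            elif name == "tags":
                entry["tags"] = [t.strip() for t in value.split(",") if t.strip()]
            # "@log" itself and unknown tags carry no value to record here.

        # The proposal makes at least one identifying field mandatory.
        if not any(entry[name] for name in REPEATABLE_TAGS):
            raise ValueError("@log block needs @example, @pattern or @regexp")
        entry["description"] = " ".join(entry["description"])
        blocks.append(entry)
    return blocks
```

Each returned dictionary could then be rendered as documentation or handed to a pattern database generator. Splitting one comment into separate "accepted" and "failed" entries, as raised in the concerns above, is deliberately left out of this sketch.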
Of course, this makes perfect sense if you're an administrator or you're trying to analyze logs from already-written applications, but what if you're the one who writes these applications and you'd like to make them easier to analyze later on?
You, sir, are on the right track. When I do web application assessments, the biggest thing I recommend after getting the bugs fixed is to make sure that the business logic can be properly audited and alerted on if necessary.
The idea I've been toying with is defining some kind of a standardized commenting scheme for logging-related calls in source code. Using this, the code could be parsed automatically and a list of log messages could easily be generated that could be used later for documentation or even generating a pattern database.
This is a worthy idea! I think the big problem is that you are asking normal developers to understand how their logs are going to be parsed, so every developer would have to learn the pattern-db format to produce these comments. While laudable, I do not think this is realistic.

It would be much more successful if the developers did not have to care what the format would be and only had to specify exactly what fields would be sent. This isn't possible, though, because at this stage, we want the messages to be both human readable and machine parsable at the same time. That requires the developers who are used to writing human-readable messages to know how to parse them, or it would require them sending only machine-parsable messages.

To me, spitting out JSON messages accomplishes both, e.g.:

{
  "class": "secevt",
  "tags": [ "ssh", "system" ],
  "fields": { "verdict": "accepted" }
}

I think that is both human readable and is obviously machine parsable. So, I think the real answer is to convince developers to begin logging in a format that can be parsed. Even WELF or CSV would be a major step forward. Otherwise, we're left with what we have now, which is to try to use patternize to divine what the developers meant, because precious few developers would both bother to learn the pattern format and get it right.
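As a rough illustration of the JSON suggestion, the following Python sketch serializes an event with the same three keys used in the example above and parses it back on the receiving side. The helper names format_event and parse_event are hypothetical, and the sketch makes no claim about how syslog-ng or any particular application would transport or consume such lines.

```python
import json


def format_event(event_class, tags, fields):
    """Serialize one log event as a single-line JSON object."""
    record = {
        "class": event_class,
        "tags": list(tags),
        "fields": dict(fields),
    }
    # sort_keys keeps the key order stable from one message to the next.
    return json.dumps(record, sort_keys=True)


def parse_event(line):
    """Recover the structured record from a line written by format_event."""
    return json.loads(line)
```

The point of the design is that the developer only names the fields being sent. No pattern has to be written or maintained, because the receiver gets the structure back from the message itself.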
On Sun, 2011-03-06 at 11:01 -0600, Martin Holste wrote:
This is a worthy idea! I think the big problem is that you are asking normal developers to understand how their logs are going to be parsed, so every developer would have to learn the pattern-db format to produce these comments. While laudable, I do not think this is realistic.

It would be much more successful if the developers did not have to care what the format would be and only had to specify exactly what fields would be sent. This isn't possible, though, because at this stage, we want the messages to be both human readable and machine parsable at the same time. That requires the developers who are used to writing human-readable messages to know how to parse them, or it would require them sending only machine-parsable messages.

To me, spitting out JSON messages accomplishes both, e.g.:

{
  "class": "secevt",
  "tags": [ "ssh", "system" ],
  "fields": { "verdict": "accepted" }
}

I think that is both human readable and is obviously machine parsable. So, I think the real answer is to convince developers to begin logging in a format that can be parsed. Even WELF or CSV would be a major step forward. Otherwise, we're left with what we have now, which is to try to use patternize to divine what the developers meant, because precious few developers would both bother to learn the pattern format and get it right.
Also, in order to be feasible, the taxonomy (e.g. the naming of fields and tags) should be agreed upon with all the developers out there. That's going to be tough.

--
Bazsi
participants (3)
- Balazs Scheidler
- Martin Holste
- Peter Gyongyosi