code-commenting log messages

6 Mar 2011

Hi,

most of the pattern database-related projects I've seen here started off
from the log side of the whole issue: investigating the messages an
application produces and trying to create patterns based on them
(manually or via patternize). Of course, this makes perfect sense if
you're an administrator or you're trying to analyze logs from
already-written applications, but what if you're the one who writes
these applications and you'd like to make them easier to analyze later
on? What if you're the extremely nice guy who wants to ship an
up-to-date pattern database for syslog-ng along with your application?
:) Or, to be more realistic, at least a list of the most important log
messages your program can emit that the user can search to find out what
an actual log message means.

The idea I've been toying with is defining some kind of a standardized
commenting scheme for logging-related calls in source code. Using this,
the code could be parsed automatically and a list of log messages could
easily be generated that could be used later for documentation or even
generating a pattern database.

I've done multiple rounds of investigation but I only found custom,
application-specific solutions for this. So on one hand, I'm asking you
if you've seen any widespread commenting schemes for log messages that
can fulfil this task? Or even a custom, application-specific solution
that is so incredibly cool that we should all adopt it instead of
reinventing the wheel? :)

Until then, I'm proposing a commenting scheme I think would be usable
and I'm asking for your feedback. It is based on Javadoc, which is
widespread enough to be familiar to the majority of developers and has
variants available for virtually all programming languages. I'm going
to show the concept through the now-standard example of sshd logging a
login attempt:

        /**
         * @log
         *
         * Signals that the authentication has succeeded or failed for a user.
         *
         * @class system
         * @tags useracct, secevt
         * @url http://www.openssh.com
         *
         * @example Accepted password for bazsi from 127.0.0.1 port 48650 ssh2
         * @example Failed password for bazsi from 127.0.1.1 port 44637 ssh2
         *
         * @pattern Accepted @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@
         * @pattern Failed @ESTRING:usracct.authmethod: @for @ESTRING:usracct.username: @from @ESTRING:usracct.device: @port @ESTRING:: @@ANYSTRING:usracct.service@
         *
         * @regexp (Accepted|Failed) (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam|hostbased) for [^[:space:]]+ from [^[:space:]]+ port [[:digit:]]+( (ssh|ssh2))?$
         */

        authlog("%s %s for %s%.100s from %.200s port %d%s",
            authmsg,
            method,
            authctxt->valid ? "" : "invalid user ",
            authctxt->user,
            get_remote_ipaddr(),
            get_remote_port(),
            info);

To formalize it a bit:

- comments for the log messages should follow the standard
Javadoc/Doxygen formatting, with the main identifier token being "@log"
(or \log if you prefer; all further parameters should be usable in the
backslash form as well)
- the allowed parameters are @class, @url, @example, @regexp, @pattern
and @tags
- multiple @example, @pattern and @regexp fields are allowed
- it is mandatory to specify at least one of @example, @pattern or
@regexp to give the application some kind of an identifier; all other
params can be omitted
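
A minimal sketch of how the rules above could be checked mechanically. It is written in Python purely for illustration; the function names (parse_log_blocks, validate) are hypothetical, and rejecting a repeated @class, @url or @tags is only one reading of the "multiple ... fields are allowed" rule, not something the proposal states outright.

```python
import re

# Parameters named in the proposal; everything else is reported as unknown.
ALLOWED = {"class", "url", "example", "regexp", "pattern", "tags"}
# Parameters that may appear more than once.
REPEATABLE = {"example", "pattern", "regexp"}
# At least one of these must be present (here the same set as REPEATABLE).
IDENTIFYING = {"example", "pattern", "regexp"}

# A Javadoc-style block: /** ... */
BLOCK_RE = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)
# A parameter at the start of a line, in either @tag or \tag form.
TAG_RE = re.compile(r"^[@\\]([A-Za-z][\w-]*)\s*(.*)$")


def parse_log_blocks(source):
    """Return one dict per @log comment found in `source`."""
    entries = []
    for match in BLOCK_RE.finditer(source):
        # Strip the leading " * " decoration and drop empty lines.
        lines = [re.sub(r"^\s*\*?\s?", "", l).rstrip()
                 for l in match.group(1).splitlines()]
        content = [l for l in lines if l]
        if not content or content[0] not in ("@log", r"\log"):
            continue  # an ordinary doc comment, not a log description
        entry = {"description": "", "fields": {}}
        current = None  # parameter that receives continuation lines
        for line in content[1:]:
            m = TAG_RE.match(line)
            if m:
                current = m.group(1)
                entry["fields"].setdefault(current, []).append(m.group(2))
            elif current is None:
                entry["description"] = (entry["description"] + " " + line).strip()
            else:
                values = entry["fields"][current]
                values[-1] = (values[-1] + " " + line).strip()
        entries.append(entry)
    return entries


def validate(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    fields = entry["fields"]
    for tag, values in fields.items():
        if tag not in ALLOWED:
            problems.append(f"unknown parameter @{tag}")
        elif tag not in REPEATABLE and len(values) > 1:
            problems.append(f"@{tag} given more than once")
    if not any(tag in fields for tag in IDENTIFYING):
        problems.append("need at least one of @example, @pattern or @regexp")
    return problems


SAMPLE = r'''
/**
 * @log
 *
 * Signals that the authentication has succeeded or failed for a user.
 *
 * @class system
 * @tags useracct, secevt
 * @example Accepted password for bazsi from 127.0.0.1 port 48650 ssh2
 */
authlog("%s %s for %s%.100s from %.200s port %d%s", authmsg, method);

/**
 * @log
 * @class system
 * @severity high
 */
log_something();
'''

entries = parse_log_blocks(SAMPLE)
for entry in entries:
    print(validate(entry))
```

The second block in SAMPLE is deliberately broken (@severity is made up and no identifying parameter is given) to show both checks firing. Lines that do not start with a parameter are folded into the preceding field, so a wrapped @example still yields one value; a wrapped @pattern whose continuation happens to start with "@ESTRING" would be misread as a new parameter, which is a limitation of this sketch.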

I have my own concerns, though:

- Is it even the right thing to do to spam source code like this? Is it
the proper place for the semantic description of log messages? Sure,
it's the easiest place to add such things while the app is being
developed and if the comment's right there in the code it's more likely
that any changes will be followed up during an update of the code, but
still, it seems a bit of an overkill even for me now that I look at
it... But maybe it'd be worth it for the most important messages.
- Maybe an @id would be useful to allow for manually specifying an
identifier for the log messages (we all love Oracle's error codes, don't
we? :))
- What should be done if a single log() call can emit multiple different
messages with different meanings that should each get their own
separate entry? The example above is the perfect candidate for this: the
"login accepted" and "login failed" events are definitely different.
Maybe two separate @log comments could be added in this case.
- How does it relate to more advanced db-parser features like
correlation, or pdbtool's ability to self-test the name-value pairs in
the <examples> section?
- (Isn't it a bit too pushy to call it @pattern instead of
@syslog-ng-pattern? :))
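
To make two of these concerns concrete, here is a rough sketch of what "two separate @log comments" in front of one call could look like to an extractor, combined with a simple self-test that checks every @example against the @regexp lines of the same entry. Everything in it is an assumption made for illustration: the @id values, the function names, the simplified per-event regexps (split out of the combined one shown earlier), and the translation of POSIX bracket classes, which is needed only because Python's re module does not support [[:class:]].

```python
import re

# Python's `re` has no [[:class:]] support, so translate the classes used here.
POSIX_CLASSES = {
    "[^[:space:]]": r"\S",
    "[[:space:]]": r"\s",
    "[[:digit:]]": r"\d",
}

BLOCK_RE = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)
FIELD_RE = re.compile(
    r"^[ \t]*\*[ \t]*@(id|example|regexp)[ \t]+(.*\S)[ \t]*$", re.MULTILINE)


def to_python_regex(posix_regex):
    """Rewrite the POSIX bracket classes above into Python equivalents."""
    for posix, py in POSIX_CLASSES.items():
        posix_regex = posix_regex.replace(posix, py)
    return posix_regex


def extract(source):
    """Collect @id, @example and @regexp from every @log comment block."""
    entries = []
    for block in BLOCK_RE.findall(source):
        if "@log" not in block:
            continue
        entry = {"id": None, "example": [], "regexp": []}
        for tag, value in FIELD_RE.findall(block):
            if tag == "id":
                entry["id"] = value
            else:
                entry[tag].append(value)
        entries.append(entry)
    return entries


def failing_examples(entry):
    """Return the @example lines that no @regexp of the same entry matches."""
    compiled = [re.compile(to_python_regex(r)) for r in entry["regexp"]]
    return [ex for ex in entry["example"]
            if not any(c.search(ex) for c in compiled)]


# Two @log blocks documenting one call. The second block carries a
# deliberately misplaced "Accepted" example so the check has something to find.
SAMPLE = r'''
/**
 * @log
 * @id sshd.auth.accepted
 * @example Accepted password for bazsi from 127.0.0.1 port 48650 ssh2
 * @regexp Accepted [^[:space:]]+ for [^[:space:]]+ from [^[:space:]]+ port [[:digit:]]+( (ssh|ssh2))?$
 */
/**
 * @log
 * @id sshd.auth.failed
 * @example Failed password for bazsi from 127.0.1.1 port 44637 ssh2
 * @example Accepted password for bazsi from 127.0.0.1 port 48650 ssh2
 * @regexp Failed [^[:space:]]+ for [^[:space:]]+ from [^[:space:]]+ port [[:digit:]]+( (ssh|ssh2))?$
 */
authlog("%s %s for %s%.100s from %.200s port %d%s", authmsg, method,
    authctxt->valid ? "" : "invalid user ", authctxt->user,
    get_remote_ipaddr(), get_remote_port(), info);
'''

entries = extract(SAMPLE)
for entry in entries:
    print(entry["id"], failing_examples(entry))
```

This only checks that an example matches some regexp of its own entry; it says nothing about the name-value pairs a @pattern would extract, so it is far weaker than the self-test the <examples> section allows.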

However, if such a scheme could be adopted, several cool things could
be done with a little hacking (which I'm willing to do):

- create "pdbtool sourceextract" or something like that that can
generate ready-to-use pattern databases from the source code
- this can be added to the build processes to update them along with
manpages and similar stuff
- Doxygen & co. could be patched to parse and display them
- vim, Eclipse and Your Favourite Editor could get a one-click way to
add such comment blocks, just as it is possible to add header comments
easily
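
As a rough idea of the output step of such a source-extraction tool, the sketch below turns already-parsed @log entries into a pattern database document. It is not an existing pdbtool feature. The element names follow the patterndb layout as far as I know it (ruleset, rules, rule, patterns, examples/test_message, tags) and should be checked against the schema shipped with your syslog-ng version. The version default, the provider name, the program name "sshd" and the use of name-based UUIDs for ids are all assumptions.

```python
import uuid
import xml.etree.ElementTree as ET


def stable_id(*parts):
    """Derive a repeatable id so that rebuilding does not change rule ids."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "/".join(parts)))


def build_patterndb(program, entries, pub_date, version="4"):
    """Build a patterndb element tree from parsed @log entries.

    Each entry is a dict with the keys "class", "tags", "patterns" and
    "examples", mirroring the @class, @tags, @pattern and @example fields.
    """
    root = ET.Element("patterndb", version=version, pub_date=pub_date)
    ruleset = ET.SubElement(root, "ruleset", name=program, id=stable_id(program))
    # The ruleset-level pattern selects messages by program name.
    ET.SubElement(ruleset, "pattern").text = program
    rules = ET.SubElement(ruleset, "rules")
    for entry in entries:
        rule = ET.SubElement(rules, "rule", {
            "provider": "sourceextract",  # hypothetical provider name
            "id": stable_id(program, entry["patterns"][0]),
            "class": entry["class"],
        })
        patterns = ET.SubElement(rule, "patterns")
        for pattern in entry["patterns"]:
            ET.SubElement(patterns, "pattern").text = pattern
        examples = ET.SubElement(rule, "examples")
        for message in entry["examples"]:
            example = ET.SubElement(examples, "example")
            ET.SubElement(example, "test_message", program=program).text = message
        tags = ET.SubElement(rule, "tags")
        for tag in entry["tags"]:
            ET.SubElement(tags, "tag").text = tag
    return root


# One entry, filled in from the "Accepted" half of the sshd example.
ENTRIES = [{
    "class": "system",
    "tags": [t.strip() for t in "useracct, secevt".split(",")],
    "patterns": [
        "Accepted @ESTRING:usracct.authmethod: @for "
        "@ESTRING:usracct.username: @from @ESTRING:usracct.device: @port "
        "@ESTRING:: @@ANYSTRING:usracct.service@",
    ],
    "examples": ["Accepted password for bazsi from 127.0.0.1 port 48650 ssh2"],
}]

root = build_patterndb("sshd", ENTRIES, pub_date="2011-03-06")
print(ET.tostring(root, encoding="unicode"))
```

Deriving the ids from the program name and the first pattern keeps them stable across rebuilds, which matters if the database is regenerated on every build; a manually specified identifier would be a more robust source for them.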

What do you think?

greets,
Peter

Peter Gyongyosi
