[syslog-ng] Pattern Database first snapshot available

Fri Dec 18 18:13:12 CET 2009

> The <values></values> element can be used to specify additional values
> that you want to set but that do not appear in the message itself. For
> example, suppose you want to classify login messages, and for a certain
> message the username does not appear, but you know that this message
> reports a specific username. In this case you can use <values> to assign
> the .dict.username variable (for example) to that specific user, and
> later you can be sure that it exists.
>
> I am still not sure if I completely understand your suggestion...
>

Oh right, I completely forgot that you added the values system after
the tag system and that the empty <values/> tags were to indicate that
no values were being added.

> The reason for using UUIDs was to be able to provide globally
> unique IDs; simple integers would be hard to maintain. I was also
> thinking of using OIDs for IDs, but UUIDs were an easier choice.
> Technically you can use simple integers or any other string, as
> syslog-ng currently does not check it. I will think about it... :)
>

Yes, it is true that UUIDs would be easier for global community
purposes; it's just an awfully large value to store as
per-message overhead.
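
To put rough numbers on that overhead, here is a minimal sketch comparing
the size of a UUID with a plain 32-bit integer. The UUID value is made up
for illustration, and the sizes are for the raw values only; what a given
database actually stores per row will differ.

```python
import struct
import uuid

# Hypothetical pattern ID; any UUID gives the same sizes.
pattern_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

text_size = len(str(pattern_id))     # canonical hyphenated text form
binary_size = len(pattern_id.bytes)  # raw 128-bit value
int_size = struct.calcsize("<I")     # unsigned 32-bit integer

print(text_size, binary_size, int_size)  # prints: 36 16 4
```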

> Using integers would also be better for DB indexing purposes. If
> you want to use integers, you can then assign a <value
> name="my_id">42</value> as a work-around to each pattern and later use
> "my_id" in your templates.

That's a good idea and would probably fit my needs just fine.

> I prefer using more meaningful names, as this way you can normalize
> your logs, so that it won't matter if it is a PIX, iptables etc. log
> message: you can always refer to the source/destination address by
> its name. It requires storing different types of logs in different
> tables, but you can freely change your application without changing your
> log processing scripts.
>

If you are doing multiple tables then it is most certainly better to
normalize the names as you've done.  My app is large enough that I was
concerned with open file limits in the database with too many
different tables.  Specifically, if you are logging 1000 possible
classes, each with their own output variables, then you would need
1000 tables x table rotation (if any).  On MySQL, this means three open
files per table, for at least 3000 files open.  If the DB
and the OS can handle the number of files open, you still incur a fair
amount of overhead when a query accesses a table not in the open table
cache.  Additionally, it might make the client code much more
difficult to write because you have variable column names.  I suppose
it wouldn't be too bad to have a directory lookup for what the column
names are to dynamically build your SQL, but I was trying to simplify
the database as much as possible at the expense of making the patterns
a bit more complex.
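
For what it's worth, the directory lookup I have in mind would look
roughly like the sketch below. The class names, column names, and the
build_insert helper are all hypothetical, and the "?" placeholder style
is an assumption (some MySQL drivers use %s instead).

```python
# Hypothetical directory: class id -> ordered column names of that class's table.
COLUMN_DIRECTORY = {
    "firewall_deny": ["source_ip", "dest_ip", "dest_port"],
    "ssh_login": ["username", "source_ip"],
}


def build_insert(class_id, values):
    """Build a parameterized INSERT for one parsed message.

    Table and column names come only from the trusted directory,
    never from the message itself; message data goes into params.
    """
    columns = COLUMN_DIRECTORY[class_id]
    missing = [c for c in columns if c not in values]
    if missing:
        raise KeyError(f"missing values for {class_id}: {missing}")
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {class_id} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [values[c] for c in columns]
    return sql, params


sql, params = build_insert(
    "ssh_login", {"username": "alice", "source_ip": "10.0.0.1"}
)
print(sql)
print(params)
```

The client still has to do one lookup per class before every query, which
is the extra complexity I was trying to avoid.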

As a side note, your method would be much more appropriate for
inserting into emerging hash-style databases like TokuDB, Hypertable,
TokyoCabinet, and MongoDB, or even document-based databases like
CouchDB.  The problem with such databases currently is that the
insertion rate is fairly slow for a busy syslog server (anything over
10k messages/sec).

> You can also combine these two methods: use meaningful names in
> patterns, and use <values> to assign them to numbered values, like
> this:
>
> <value name="s1">${.dict.source_ip}</value>
>

This is an excellent idea and I will probably move to something like it.
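
A minimal sketch of how I might generate those numbered aliases from a
list of meaningful names. The numbered_values helper, the "s" prefix, and
the example names are my own assumptions; only the <value name="s1">
form comes from your example.

```python
import xml.etree.ElementTree as ET


def numbered_values(names, prefix="s"):
    """Return a <values> element aliasing each name to a numbered slot."""
    values = ET.Element("values")
    for index, name in enumerate(names, start=1):
        value = ET.SubElement(values, "value", {"name": f"{prefix}{index}"})
        # Reference the meaningful name through the .dict namespace.
        value.text = "${.dict.%s}" % name
    return values


element = numbered_values(["source_ip", "dest_ip"])
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The slot order follows the order of the list, so the list itself would
have to be kept stable for the numbered columns to keep their meaning.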

> Of course it would require a bit more memory and CPU cycles, and
> you are free to name your values as you want. I think it is really a
> question of the patterns we try to build and distribute. Maybe I can add
> a rewrite mechanism to pdbtool which would rename the value names to
> numbered value names. This way we can publish patterns with
> meaningful names and anyone can later rename them to numbered
> names. Would it fit your needs?
>

I think the most valuable designation to put in the published pattern
is a string or int XML attribute or element.  Then users can decide
how they want to handle them and optimize their storage schema
accordingly.

> I also have a plan to store parsed values as different types of
> data and not always as strings. IP addresses and numbers are very good
> candidates for this. I put it on my todo list. :)
>

Awesome!

Thanks for all of your hard work on this.