[syslog-ng] [announce] patterndb project

Mon Jul 5 19:05:19 CEST 2010

> A naive schema based SQL destination would simply create as many tables
> as there are schemas. A better optimized one would use the NV -> field
> mapping that you propose, and a NoSQL implementation would just scale to
> any number of NV pairs without having to rename the fields.
>
> This mapping support would also be useful if we want to generate CEF/CEE
> formatted events.
>

Hm, so maybe we need to decouple the actual DB stuff from the XML
schema and declare it out of scope, since it's really up to the
implementer to figure that out, and the specific implementation will
likely change for each setup.  I think what's essential is providing
the list of name-value pairs and whether each one is an integer or a
string.  Maybe there could be a "contrib" section on your site with
contributed scripts for stamping out the various configurations (e.g.
multi-table SQL, NoSQL, etc.).
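
To make that concrete, here is a rough sketch of what such a contrib
script could look like.  Everything in it is made up for illustration:
the SCHEMAS layout, the field names and the msg_id column are my
assumptions, not anything patterndb defines today.  The point is only
that a typed list of name-value pairs is enough to derive either a
per-schema SQL table or a flat document for a schemaless store.

```python
# Hypothetical schema description: each schema is a list of
# (name, type) pairs, where type is "integer" or "string".
# Schema and field names are illustrative only.
SCHEMAS = {
    "secevt": [("username", "string"), ("src_port", "integer")],
    "details": [("note", "string")],
}

# Mapping from the two declared types to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "string": "TEXT"}


def create_table_sql(schema, fields):
    """Build a CREATE TABLE statement for one schema (multi-table layout)."""
    # msg_id is an assumed join column pointing back at the raw message.
    cols = ["msg_id INTEGER NOT NULL"]
    for name, kind in fields:
        cols.append(f"{name} {SQL_TYPES[kind]}")
    return f"CREATE TABLE {schema} ({', '.join(cols)})"


def to_document(schema, values):
    """Flatten the same pairs into one dict, as a schemaless store might take it."""
    return {f"{schema}.{name}": value for name, value in values.items()}


if __name__ == "__main__":
    for schema, fields in SCHEMAS.items():
        print(create_table_sql(schema, fields))
    print(to_document("secevt", {"username": "bob", "src_port": 22}))
```

Each site could swap in its own generator without the XML schema
having to know anything about the storage layer.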

> The problem is that I'd like to support the multiple tables idea as
> well, e.g. store each schema in a separate table. In this case you need
> a unique id in order to join the tables. Also, if this would be combined
> with the MSGID field of RFC5424, this could be used to fetch the
> original raw message easily.
>

It looks to me like MSGID is better suited for a tag than being part
of the ID itself.  From the RFC: "It is intended for filtering
messages on a relay or collector."  A unique ID across multiple tables
is not a problem as long as there is one master table where you would
put the syslog header fields with an auto-increment column to generate
the ID.  If you absolutely wanted Syslog-NG to generate the ID, I
suppose you could append a CRC of the $MSG to the epoch timestamp,
though that isn't foolproof.
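
A minimal sketch of both options, using Python and an in-memory SQLite
database purely for illustration.  The table and column names (master,
secevt, msg_id) and the fallback_id helper are hypothetical, and the
ID format (epoch, dash, CRC32 in hex) is just one way to do the
"append a CRC" idea.

```python
import sqlite3
import zlib


def fallback_id(epoch, msg):
    """Epoch timestamp followed by the CRC32 of the message text, in hex.

    Two identical messages within the same second produce the same ID,
    which is why this is not foolproof.
    """
    crc = zlib.crc32(msg.encode("utf-8")) & 0xFFFFFFFF
    return f"{epoch}-{crc:08x}"


conn = sqlite3.connect(":memory:")

# Master table: syslog header fields plus the auto-increment ID.
conn.execute(
    "CREATE TABLE master ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " host TEXT, program TEXT, msg TEXT)"
)
# One per-schema table that refers back to the master row.
conn.execute(
    "CREATE TABLE secevt (msg_id INTEGER REFERENCES master(id), username TEXT)"
)

cur = conn.execute(
    "INSERT INTO master (host, program, msg) VALUES (?, ?, ?)",
    ("host1", "sshd", "Accepted password for bob"),
)
msg_id = cur.lastrowid  # ID generated by the database
conn.execute(
    "INSERT INTO secevt (msg_id, username) VALUES (?, ?)", (msg_id, "bob")
)

# Joining on the ID gets the original raw message back.
row = conn.execute(
    "SELECT m.msg, s.username FROM master m"
    " JOIN secevt s ON s.msg_id = m.id WHERE m.id = ?",
    (msg_id,),
).fetchone()
```

With the master table, the database hands out the ID and the join back
to the raw message is trivial; fallback_id only matters if the ID has
to exist before the row reaches the database.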

> hmm... hmm, maybe "details" should be above all schemas, e.g. instead of
> calling it "secevt.details", it should be called "details", it is a
> single pattern that extracts all the fields after all, so the pattern
> author can decide which information wouldn't fit into any of the schemas
> and put that in details.
>

Yep, I think details would be a good spot for all miscellany, as well
as other meta-data that is inherent to a specific log class that
doesn't fit in a predefined field.

> Well, I believe that in SQL, the best we could probably come up with is
> a "list of tags field" and use free-text indexing.

Yes, for instance, the Sphinx full-text search engine has a
multi-valued attribute (MVA) type which is specifically designed for
efficiently storing a variable-length list of tag IDs for a given
record.
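
To show the shape of the data I have in mind, here is a small plain
Python sketch.  It does not use Sphinx or its API at all; the function
and variable names are invented.  It only illustrates the idea of
mapping tag names to integer IDs, keeping a list of those IDs per
record, and looking records up by tag.

```python
from collections import defaultdict

tag_ids = {}                       # tag name -> integer ID
record_tags = {}                   # record ID -> sorted list of tag IDs
records_by_tag = defaultdict(set)  # tag ID -> set of record IDs


def tag_id(name):
    """Return the ID for a tag name, assigning the next free one if new."""
    return tag_ids.setdefault(name, len(tag_ids) + 1)


def add_record(record_id, tags):
    """Store the record's tags as a list of integer IDs and index them."""
    ids = sorted({tag_id(t) for t in tags})
    record_tags[record_id] = ids
    for i in ids:
        records_by_tag[i].add(record_id)


def find(tag):
    """Return the sorted record IDs carrying the given tag name."""
    return sorted(records_by_tag.get(tag_ids.get(tag), set()))


if __name__ == "__main__":
    add_record(1, ["ssh", "login"])
    add_record(2, ["ssh"])
    print(find("ssh"))
```

A search engine with a native multi-value field does the same job far
more efficiently, but the per-record list of tag IDs is the part that
a pattern author would need to supply.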