Re: [syslog-ng] [announce] patterndb project

9 Jul 2010

On Wed, 2010-07-07 at 13:37 +0200, Balazs Scheidler wrote:
...
On Mon, 2010-07-05 at 12:05 -0500, Martin Holste wrote:
...
...
A naive schema-based SQL destination would simply create as many tables
as there are schemas. A better optimized one would use the NV -> field
mapping that you propose, and a NoSQL implementation would just scale to
any number of NV pairs without having to rename the fields.
This mapping support would also be useful if we want to generate CEF/CEE
formatted events.
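
To make the NV -> field mapping idea concrete, here is a minimal sketch in Python. It is only an illustration, not part of syslog-ng or patterndb: the schema contents, the field names (secevt.verdict, usracct.username, etc.), the generic column names (i0, s0, ...) and both helper functions are hypothetical. It assumes all we know about each name-value pair is whether it is an integer or a string.

```python
from typing import Dict, Union

Value = Union[int, str]

# Hypothetical schema descriptions: field name -> "int" or "string".
SCHEMAS: Dict[str, Dict[str, str]] = {
    "secevt": {"secevt.verdict": "string", "secevt.src_port": "int"},
    "usracct": {"usracct.username": "string", "usracct.sessionid": "int"},
}


def build_mapping(fields: Dict[str, str]) -> Dict[str, str]:
    """Assign each named field a generic column: i0, i1, ... or s0, s1, ..."""
    prefix = {"int": "i", "string": "s"}
    counters = {"int": 0, "string": 0}
    mapping: Dict[str, str] = {}
    # Sort by name so the mapping is stable across runs.
    for name in sorted(fields):
        ftype = fields[name]
        mapping[name] = f"{prefix[ftype]}{counters[ftype]}"
        counters[ftype] += 1
    return mapping


def to_row(schema: str, nv: Dict[str, Value]) -> Dict[str, Value]:
    """Turn the name-value pairs of one message into a row of the condensed table."""
    mapping = build_mapping(SCHEMAS[schema])
    row: Dict[str, Value] = {"schema": schema}
    for name, value in nv.items():
        row[mapping[name]] = value
    return row
```

With this, to_row("secevt", {...}) yields a dict keyed by the generic column names plus a "schema" column, so one condensed table can hold every schema. A multi-table or NoSQL destination would skip build_mapping and keep the original names.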
Hm, so maybe we need to decouple the actual DB stuff from the XML
schema and declare it out-of-scope, since it's really up to the
implementer to figure that out, and the specific implementation will
likely change for each setup.  I think what's essential is providing
the list of name-value pairs and whether they are integer or string.
Maybe there could be a "contrib" section on your site with contributed
scripts for stamping out the various configurations (e.g. multi-table
SQL, no-SQL, etc.).
I'd like to create a generic SQL destination, which would magically work
without having to explicitly configure the table schema (e.g. no need to
generate the configuration).
If type information is present then the field names for your condensed
table could be generated on the fly. I think I'd leave this question
open for a while, until we get that generic SQL destination.
...
...
The problem is that I'd like to support the multiple tables idea as
well, e.g. store each schema in a separate table. In this case you need
a unique id in order to join the tables. Also, if this would be combined
with the MSGID field of RFC5424, this could be used to fetch the
original raw message easily.
It looks to me like MSGID is better suited for a tag than being part
of the ID itself.  From the RFC: "It is intended for filtering
messages on a relay or collector."  A unique ID across multiple tables
is not a problem as long as there is one master table where you would
put the syslog header fields with an auto-increment column to generate
the ID.  If you absolutely wanted Syslog-NG to generate the ID, I
suppose you could append a CRC of the $MSG to the epoch timestamp,
though that isn't foolproof.
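
Both ID options can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how syslog-ng does or would do it: the table and column names (messages, secevt, msg_id, verdict), the sample values and the function names are all made up, and SQLite stands in for whatever database is actually used.

```python
import sqlite3
import zlib


def crc_id(epoch: int, msg: str) -> str:
    """Generated ID: epoch timestamp with the CRC32 of the message appended."""
    crc = zlib.crc32(msg.encode("utf-8")) & 0xFFFFFFFF
    return f"{epoch}-{crc:08x}"


def master_table_demo() -> tuple:
    """Database-generated ID: a master table with an auto-increment column."""
    db = sqlite3.connect(":memory:")
    # Master table holding the syslog header fields (hypothetical layout).
    db.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, ts INTEGER, host TEXT, program TEXT)"
    )
    # One per-schema table that refers back to the master table.
    db.execute(
        "CREATE TABLE secevt (msg_id INTEGER REFERENCES messages(id), verdict TEXT)"
    )
    cur = db.execute(
        "INSERT INTO messages (ts, host, program) VALUES (?, ?, ?)",
        (1278676800, "gw1", "kernel"),
    )
    msg_id = cur.lastrowid  # the ID handed out by the database
    db.execute("INSERT INTO secevt (msg_id, verdict) VALUES (?, ?)", (msg_id, "ACCEPT"))
    row = db.execute(
        "SELECT m.id, m.host, s.verdict "
        "FROM messages m JOIN secevt s ON s.msg_id = m.id"
    ).fetchone()
    db.close()
    return row
```

The "not foolproof" part is easy to see: crc_id() returns the same value for two identical messages that arrive within the same second, whereas the auto-increment column always hands out distinct IDs, at the cost of one insert into the master table per message.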
Right, I was under the wrong impression of what MSGID is. Not that I
understand or agree with the way it was defined though.
Anyway, I wouldn't want to store the syslog message in the database only
to get an ID, and the use of this ID would be optional.
...
...
Hmm... maybe "details" should be above all schemas, e.g. instead of
calling it "secevt.details", it should be called "details". It is a
single pattern that extracts all the fields after all, so the pattern
author can decide which information wouldn't fit into any of the schemas
and put that in details.
Yep, I think details would be a good spot for all miscellany, as well
as other meta-data that is inherent to a specific log class that
doesn't fit in a predefined field.
Agreed.
...
...
Well, I believe that in SQL, the best we could probably come up with is
a "list of tags field" and use free-text indexing.
Yes, for instance, the Sphinx full-text search engine has a
Multi-Value Attribute (MVA) config attribute which is specifically
designed for efficiently storing a list of n tag IDs for a
given record.
That's what I thought.
I'm going to update the document with these decisions. Thanks for your
feedback, I really appreciate it.
I've updated the patterndb policy document with the latest discussion
points at

http://git.balabit.hu/

I still have some open points:
  * ruleset and rule IDs (UUID vs something else)
  * ruleset organization

I'd appreciate feedback on the current policy.

-- 
Bazsi