[syslog-ng] [announce] patterndb project

Wed Jul 7 13:37:00 CEST 2010

On Mon, 2010-07-05 at 12:05 -0500, Martin Holste wrote:
> > A naive schema based SQL destination would simply create as many tables
> > as there are schemas. A better optimized one would use the NV -> field
> > mapping that you propose, and a NoSQL implementation would just scale to
> > any number of NV pairs without having to rename the fields.
> >
> > This mapping support would also be useful if we want to generate CEF/CEE
> > formatted events.
> >
> 
> Hm, so maybe we need to decouple the actual DB stuff from the XML
> schema and declare it out-of-scope, since it's really up to the
> implementer to figure that out, and the specific implementation will
> likely change for each setup.  I think what's essential is providing
> the list of name-value pairs and whether they are integer or string.
> Maybe there could be a "contrib" section on your site with contributed
> scripts for stamping out the various configurations (e.g. multi-table
> SQL, no-SQL, etc.).

I'd like to create a generic SQL destination, which would magically work
without having to explicitly configure the table schema (i.e. no need to
generate the configuration).

If type information is present then the field names for your condensed
table could be generated on the fly. I think I'd leave this question
open for a while, until we get that generic SQL destination.
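
To illustrate what "generated on the fly" could mean, here is a rough
Python sketch of mapping typed name-value pairs onto generic columns of
a condensed table. The schema names, field names, and the i0/s0 column
naming are all made up for the example; nothing here reflects an actual
syslog-ng interface.

```python
from collections import defaultdict

# Hypothetical schemas: class name -> ordered list of (NV name, type).
# Only the two types mentioned in the thread are assumed to exist.
SCHEMAS = {
    "secevt": [
        ("secevt.user", "string"),
        ("secevt.src_ip", "string"),
        ("secevt.src_port", "integer"),
    ],
    "flowevt": [
        ("flowevt.bytes", "integer"),
        ("flowevt.proto", "string"),
    ],
}

# Column name prefix per type (an assumption of this sketch).
PREFIX = {"integer": "i", "string": "s"}


def column_map(fields):
    """Assign each typed NV pair the next free generic column of its type."""
    counters = defaultdict(int)
    mapping = {}
    for name, ftype in fields:
        prefix = PREFIX[ftype]
        mapping[name] = f"{prefix}{counters[prefix]}"
        counters[prefix] += 1
    return mapping


def to_row(schema, values):
    """Turn the NV pairs of one message into a row for the condensed table."""
    mapping = column_map(SCHEMAS[schema])
    row = {"class": schema}
    for name, value in values.items():
        column = mapping[name]
        # Integer columns get converted, everything else is stored as text.
        row[column] = int(value) if column.startswith("i") else str(value)
    return row
```

With this, to_row("flowevt", {"flowevt.bytes": "1500", "flowevt.proto": "tcp"})
yields a row with "i0" and "s0" filled in, and the same two columns are
reused for the fields of any other schema.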

> 
> > The problem is that I'd like to support the multiple tables idea as
> > well, e.g. store each schema in a separate table. In this case you need
> > a unique id in order to join the tables. Also, if this would be combined
> > with the MSGID field of RFC5424, this could be used to fetch the
> > original raw message easily.
> >
> 
> It looks to me like MSGID is better suited for a tag than being part
> of the ID itself.  From the RFC: "It is intended for filtering
> messages on a relay or collector."  A unique ID across multiple tables
> is not a problem as long as there is one master table where you would
> put the syslog header fields with an auto-increment column to generate
> the ID.  If you absolutely wanted Syslog-NG to generate the ID, I
> suppose you could append a CRC of the $MSG to the epoch timestamp,
> though that isn't foolproof.

Right, I was under the wrong impression about what MSGID is. Not that I
understand or agree with the way it was defined, though.

Anyway, I wouldn't want to store the syslog message in the database only
to get an ID, and the use of this ID would be optional.
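
For reference, the timestamp-plus-CRC idea could look roughly like the
Python sketch below. The function name and the 64-bit layout (epoch
seconds in the high 32 bits, CRC32 of the message in the low 32 bits)
are assumptions of this example, not something syslog-ng does today.

```python
import zlib


def message_id(epoch_seconds, msg):
    """Combine the epoch timestamp and a CRC32 of the message into one integer.

    Hypothetical layout: timestamp in the high 32 bits, CRC32 in the low 32.
    """
    crc = zlib.crc32(msg.encode("utf-8")) & 0xFFFFFFFF
    return (int(epoch_seconds) << 32) | crc
```

As noted in the quoted text, this is not foolproof: two identical
messages received within the same second get the same ID, and distinct
messages can still collide on the CRC.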

> 
> > hmm... hmm, maybe "details" should be above all schemas, e.g. instead of
> > calling it "secevt.details", it should be called "details". It is a
> > single pattern that extracts all the fields after all, so the pattern
> > author can decide which information wouldn't fit into any of the schemas
> > and put that in details.
> >
> 
> Yep, I think details would be a good spot for all miscellany, as well
> as other meta-data that is inherent to a specific log class that
> doesn't fit in a predefined field.

Agreed.

> 
> > Well, I believe that in SQL, the best we could probably come up with is
> > a "list of tags field" and use free-text indexing.
> 
> Yes, for instance, the Sphinx full-text search engine has a
> Multi-Value Attribute (MVA) config attribute which is specifically
> designed for efficiently storing a list of any number of tag IDs for a
> given record.

That's what I thought.

I'm going to update the document with these decisions. Thanks for your
feedback, I really appreciate it.

-- 
Bazsi