[syslog-ng] [announce] patterndb project

Balazs Scheidler bazsi at balabit.hu
Fri Jul 9 13:26:25 CEST 2010


On Wed, 2010-07-07 at 13:37 +0200, Balazs Scheidler wrote:
> On Mon, 2010-07-05 at 12:05 -0500, Martin Holste wrote:
> > > A naive schema based SQL destination would simply create as many tables
> > > as there are schemas. A better optimized one would use the NV -> field
> > > mapping that you propose, and a NoSQL implementation would just scale to
> > > any number of NV pairs without having to rename the fields.
> > >
> > > This mapping support would also be useful if we want to generate CEF/CEE
> > > formatted events.
> > >
> > 
> > Hm, so maybe we need to decouple the actual DB stuff from the XML
> > schema and declare it out-of-scope, since it's really up to the
> > implementer to figure that out, and the specific implementation will
> > likely change for each setup.  I think what's essential is providing
> > the list of name-value pairs and whether they are integer or string.
> > Maybe there could be a "contrib" section on your site with contributed
> > scripts for stamping out the various configurations (e.g. multi-table
> > SQL, no-SQL, etc.).
> 
> I'd like to create a generic SQL destination, which would magically work
> without having to explicitly configure the table schema (e.g. no need to
> generate the configuration).
> 
> If type information is present then the field names for your condensed
> table could be generated on the fly. I think I'd leave this question
> open for a while, until we get that generic SQL destination.
> 
> > 
> > > The problem is that I'd like to support the multiple tables idea as
> > > well, e.g. store each schema in a separate table. In this case you need
> > > a unique id in order to join the tables. Also, if this would be combined
> > > with the MSGID field of RFC5424, this could be used to fetch the
> > > original raw message easily.
> > >
> > 
> > It looks to me like MSGID is better suited for a tag than being part
> > of the ID itself.  From the RFC: "It is intended for filtering
> > messages on a relay or collector."  A unique ID across multiple tables
> > is not a problem as long as there is one master table where you would
> > put the syslog header fields with an auto-increment column to generate
> > the ID.  If you absolutely wanted Syslog-NG to generate the ID, I
> > suppose you could append a CRC of the $MSG to the epoch timestamp,
> > though that isn't foolproof.
> 
> Right, I was under the wrong impression about what MSGID is. Not that I
> understand or agree with the way it was defined though.
> 
> Anyway, I wouldn't want to store the syslog message in the database only
> to get an ID, and the use of this ID would be optional.
> 
> > 
> > > hmm... hmm, maybe "details" should be above all schemas, e.g. instead of
> > > calling it "secevt.details", it should be called "details", it is a
> > > single pattern that extracts all the fields after all, so the pattern
> > > author can decide which information wouldn't fit into any of the schemas
> > > and put that in details.
> > >
> > 
> > Yep, I think details would be a good spot for all miscellany, as well
> > as other meta-data that is inherent to a specific log class that
> > doesn't fit in a predefined field.
> 
> Agreed.
> 
> > 
> > > Well, I believe that in SQL, the best we could probably come up with is
> > > a "list of tags field" and use free-text indexing.
> > 
> > Yes, for instance, the Sphinx full-text search engine has a
> > Multi-Value Attribute (MVA) config attribute which is specifically
> > designed for efficiently storing a list of n-number of tag IDs for a
> > given record.
> 
> That's what I thought.
> 
> I'm going to update the document with these decisions. Thanks for your
> feedback, I really appreciate it.
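
To make the ID suggestion quoted above concrete (a CRC of $MSG appended to
the epoch timestamp), here is a minimal sketch in Python. It is only an
illustration, not how syslog-ng does or will generate IDs: the function
name, the separator and the zero-padded layout are assumptions, and CRC32
from zlib is just one possible CRC.

```python
import zlib


def make_message_id(epoch_seconds: int, msg: str) -> str:
    """Hypothetical helper: epoch timestamp followed by a CRC32 of the message.

    The "<timestamp>-<crc>" layout is an assumption made for this sketch.
    """
    # zlib.crc32 returns an unsigned 32-bit value on Python 3; the mask
    # only makes that explicit.
    crc = zlib.crc32(msg.encode("utf-8")) & 0xFFFFFFFF
    return f"{epoch_seconds:010d}-{crc:08x}"


# Arbitrary example values, not taken from a real log.
first = make_message_id(1278000000, "Accepted password for user from host")
second = make_message_id(1278000000, "Accepted password for user from host")

# Identical messages received within the same second get the same ID,
# which is why the scheme is "not foolproof".
print(first)
print(first == second)
```

The last two lines show the weakness mentioned in the quoted text: two
identical messages arriving in the same second collide, so the ID is only
unique if something else (a sequence number, the host name) is mixed in.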


I've updated the patterndb policy document with the latest discussion
points at

http://git.balabit.hu/

I still have some open points:
  * ruleset and rule IDs (UUID vs something else)
  * ruleset organization

I'd appreciate feedback on the current policy.
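
As a rough illustration of the typed name-value idea from the thread (a
list of NV pairs, each marked integer or string, from which field names
are generated on the fly), the following Python sketch builds a CREATE
TABLE statement for one schema. Everything specific in it is an
assumption: the "secevt.user" and "secevt.port" names, the type keywords,
the SQL column types and the "id" column are made up for the example and
are not part of the policy document.

```python
import re

# Assumed mapping from the two type names mentioned in the thread to
# SQL column types; a real destination may choose differently.
SQL_TYPES = {"integer": "INTEGER", "string": "TEXT"}


def column_name(nv_name: str) -> str:
    """Turn an NV pair name such as 'secevt.user' into a safe column name."""
    return re.sub(r"[^A-Za-z0-9_]", "_", nv_name)


def create_table_sql(table: str, fields: dict) -> str:
    """Build a CREATE TABLE statement from {nv_name: type_name} pairs."""
    # The 'id' column stands in for the unique ID used to join tables.
    columns = ["id INTEGER PRIMARY KEY"]
    for nv_name, type_name in fields.items():
        columns.append(f"{column_name(nv_name)} {SQL_TYPES[type_name]}")
    return f"CREATE TABLE {table} ({', '.join(columns)})"


# Hypothetical schema with one string and one integer field.
print(create_table_sql("secevt", {"secevt.user": "string",
                                  "secevt.port": "integer"}))
```

With type information available, the same field list could just as well
feed a condensed single-table layout or a NoSQL store; only the last step
of the sketch would change.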

-- 
Bazsi



