[syslog-ng] [announce] patterndb project

Sat Jul 10 21:56:58 CEST 2010

Looking good.  One picky thing: the line containing "NV pair names
should only contain alphanumeric characters (a-zA-Z0-9)" should maybe
include the underscore and dot in the regexp to avoid confusion, or at
least the underscore.
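For illustration, here is a minimal Python check of what the naming rule would look like with the underscore and dot allowed. The pattern is my assumption of the intended rule, not what the policy document currently says:

```python
import re

# Assumed naming rule: letters, digits, underscore and dot only.
# This is a sketch of the suggestion above, not the official patterndb rule.
NV_NAME_RE = re.compile(r"[A-Za-z0-9_.]+")

def is_valid_nv_name(name: str) -> bool:
    """Return True if the whole name consists of a-zA-Z0-9, '_' and '.'."""
    # fullmatch() anchors at both ends, so no ^/$ are needed.
    return NV_NAME_RE.fullmatch(name) is not None

print(is_valid_nv_name("secevt.details"))  # True
print(is_valid_nv_name("user name"))       # False (contains a space)
```

Using fullmatch() rather than match() with a trailing "$" also avoids accepting a name that ends in a newline, since "$" matches just before one.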

Also, I think "generic" may not be the term you're looking for when
describing your initial schema design.  To me, "per-schema tables"
better describes the layout, as technically, my method of dumping all
logs into one table is more "generic" in that it's a one-size-fits-all
table setup.

I'm noting that it's a bit difficult to discuss the patterndb schema
and DB layouts because I keep wanting to refer to DB schemas, which is
confusing.  Could we instead call the patterndb schemas "rule sets,"
as per the original patterndb.xml, instead of schemas?  That way we
know that "schema" can only ever refer to DB tables.  It is clearer
to me to say "one type of schema is to have one table per rule set."
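To make the terminology concrete, a rough Python/sqlite3 sketch of the two layouts. The rule set names, NV pair names and the generic i0/s0 column names are made up for the example. Table and column names are interpolated into the SQL, so they are assumed to come from trusted rule set definitions:

```python
import sqlite3
from typing import Dict

def create_per_ruleset_tables(conn: sqlite3.Connection,
                              rulesets: Dict[str, Dict[str, str]]) -> None:
    """One table per rule set; columns come from that rule set's NV pairs."""
    for ruleset, fields in rulesets.items():
        columns = ["id INTEGER PRIMARY KEY"]
        # Double-quote identifiers so names such as "secevt.details" are legal SQL.
        columns += [f'"{name}" {sqltype}' for name, sqltype in fields.items()]
        conn.execute(f'CREATE TABLE "{ruleset}" ({", ".join(columns)})')

def create_generic_table(conn: sqlite3.Connection) -> None:
    """One-size-fits-all table: every rule set shares the same generic columns."""
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, ruleset TEXT, "
                 "i0 INTEGER, i1 INTEGER, s0 TEXT, s1 TEXT)")

conn = sqlite3.connect(":memory:")
create_per_ruleset_tables(conn, {
    "sshd": {"usracct_username": "TEXT", "src_ip": "TEXT"},
    "iptables": {"src_ip": "TEXT", "dst_port": "INTEGER"},
})
create_generic_table(conn)
```

With the per-rule-set layout, adding a rule set means adding a table; with the generic layout the table set never changes, but each rule set needs a mapping from its NV pairs to the shared columns.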

On Fri, Jul 9, 2010 at 6:26 AM, Balazs Scheidler <bazsi at balabit.hu> wrote:
> On Wed, 2010-07-07 at 13:37 +0200, Balazs Scheidler wrote:
>> On Mon, 2010-07-05 at 12:05 -0500, Martin Holste wrote:
>> > > A naive schema based SQL destination would simply create as many tables
>> > > as there are schemas. A better optimized one would use the NV -> field
>> > > mapping that you propose, and a NoSQL implementation would just scale to
>> > > any number of NV pairs without having to rename the fields.
>> > >
>> > > This mapping support would also be useful if we want to generate CEF/CEE
>> > > formatted events.
>> > >
>> >
>> > Hm, so maybe we need to decouple the actual DB stuff from the XML
>> > schema and declare it out-of-scope, since it's really up to the
>> > implementer to figure that out, and the specific implementation will
>> > likely change for each setup.  I think what's essential is providing
>> > the list of name-value pairs and whether they are integer or string.
>> > Maybe there could be a "contrib" section on your site with contributed
>> > scripts for stamping out the various configurations (e.g. multi-table
>> > SQL, no-SQL, etc.).
>>
>> I'd like to create a generic SQL destination, which would magically work
>> without having to explicitly configure the table schema (e.g. no need to
>> generate the configuration).
>>
>> If type information is present then the field names for your condensed
>> table could be generated on the fly. I think I'd leave this question
>> open for a while, until we get that generic SQL destination.
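A minimal sketch of generating the condensed-table field names on the fly from the type information, assuming each NV pair is declared as either "integer" or "string" and that the generic columns are named i0, i1, ... and s0, s1, ... (that naming scheme is hypothetical):

```python
from typing import Dict

def condensed_columns(nv_types: Dict[str, str]) -> Dict[str, str]:
    """Map each NV pair name to a generic typed column of a condensed table.

    nv_types maps an NV pair name to "integer" or "string". Integers are
    assigned i0, i1, ... and strings s0, s1, ... in declaration order.
    """
    prefix = {"integer": "i", "string": "s"}
    counters = {"integer": 0, "string": 0}
    mapping: Dict[str, str] = {}
    for name, typ in nv_types.items():
        if typ not in prefix:
            raise ValueError(f"unsupported type for {name!r}: {typ!r}")
        mapping[name] = f"{prefix[typ]}{counters[typ]}"
        counters[typ] += 1
    return mapping

print(condensed_columns({"src_ip": "string", "dst_port": "integer",
                         "usracct_username": "string"}))
# {'src_ip': 's0', 'dst_port': 'i0', 'usracct_username': 's1'}
```

Because the mapping depends only on the declared order and types, the same rule set always produces the same column assignment, so no per-setup configuration would have to be generated by hand.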
>>
>> >
>> > > The problem is that I'd like to support the multiple tables idea as
>> > > well, e.g. store each schema in a separate table. In this case you need
>> > > a unique id in order to join the tables. Also, if this would be combined
>> > > with the MSGID field of RFC5424, this could be used to fetch the
>> > > original raw message easily.
>> > >
>> >
>> > It looks to me like MSGID is better suited for a tag than being part
>> > of the ID itself.  From the RFC: "It is intended for filtering
>> > messages on a relay or collector."  A unique ID across multiple tables
>> > is not a problem as long as there is one master table where you would
>> > put the syslog header fields with an auto-increment column to generate
>> > the ID.  If you absolutely wanted Syslog-NG to generate the ID, I
>> > suppose you could append a CRC of the $MSG to the epoch timestamp,
>> > though that isn't foolproof.
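One way to read the "CRC appended to the epoch timestamp" idea, sketched in Python. Packing the Unix timestamp into the high bits and a CRC32 of the message into the low 32 bits is my assumption; string concatenation would work just as well:

```python
import time
import zlib
from typing import Optional

def message_id(msg: str, timestamp: Optional[int] = None) -> int:
    """Build an ID from the epoch timestamp and a CRC32 of $MSG.

    The timestamp occupies the high bits and the CRC32 the low 32 bits,
    so IDs sort roughly by time. Not foolproof: identical messages in the
    same second get the same ID, and different messages can collide on CRC32.
    """
    if timestamp is None:
        timestamp = int(time.time())
    crc = zlib.crc32(msg.encode("utf-8")) & 0xFFFFFFFF
    return (timestamp << 32) | crc
```

The auto-increment column in a master table avoids both collision cases, at the cost of a database round trip before the ID is known.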
>>
>> Right, I was under the wrong impression about what MSGID is. Not that I
>> understand or agree with the way it was defined though.
>>
>> Anyway, I wouldn't want to store the syslog message in the database only
>> to get an ID, and the use of this ID would be optional.
>>
>> >
>> > > hmm... hmm, maybe "details" should be above all schemas, e.g. instead of
>> > > calling it "secevt.details", it should be called "details", it is a
>> > > single pattern that extracts all the fields after all, so the pattern
>> > > author can decide which information wouldn't fit into any of the schemas
>> > > and put that in details.
>> > >
>> >
>> > Yep, I think details would be a good spot for all miscellany, as well
>> > as other meta-data that is inherent to a specific log class that
>> > doesn't fit in a predefined field.
>>
>> Agreed.
>>
>> >
>> > > Well, I believe that in SQL, the best we could probably come up with is
>> > > a "list of tags field" and use free-text indexing.
>> >
>> > Yes, for instance, the Sphinx full-text search engine has a
>> > Multi-Value Attribute (MVA) config attribute which is specifically
>> > designed for efficiently storing a list of n-number of tag IDs for a
>> > given record.
>>
>> That's what I thought.
>>
>> I'm going to update the document with these decisions. Thanks for your
>> feedback, I really appreciate it.
>
>
> I've updated the patterndb policy document with the latest discussion
> points at
>
> http://git.balabit.hu/
>
> I still have some open points:
>  * ruleset and rule IDs (UUID vs something else)
>  * ruleset organization
>
> I'd appreciate feedback on the current policy.
>
> --
> Bazsi
>