[syslog-ng] [announce] patterndb project

Sun Jul 4 22:58:23 CEST 2010

Hi,

On Sun, 2010-07-04 at 11:27 -0500, Martin Holste wrote:
> I prefer the dot notation just because it's what I'm used to.
> However, an XML schema could represent this as repeated child
> elements, like:
> <rule><class>Net</class><class>NAT</class><class>Security</class></rule>.
>  A user would see these three classes listed and know that the
> respective required fields exist as name/value pairs within the
> pattern.  Likewise, an author would only be able to put class="Net" if
> his or her pattern does in fact provide name/value extractions for the
> "Net" tuple.  That provides the guidance needed for deciding how to
> classify the patterns.  I'm not sure if there would be any effective
> difference between a "class" element and the existing tag element, so
> maybe it's just a matter of stipulating that contributers need to
> appropriately tag their signatures with the correct classes inherent
> within them.

Maybe I'm missing something, but as I see it, the current "tags" feature,
which is present in patterndb v3 (e.g. syslog-ng OSE 3.1 or later), is
exactly what you describe.

Since an example is worth a thousand words, here is an untested pattern
covering an SSH login event, converting values into the proposed usracct
schema:

<rule id="..." class="system">
   <patterns>
     <pattern>Accepted @STRING:usracct.authmethod@ for @STRING:usracct.username@ from @IPv4:temp.src_ip@ port @NUMBER:temp.src_port@ @STRING:usracct.service@</pattern>
   </patterns>
   <values>
     <value name="usracct.type">login</value>
     <value name="usracct.sessionid">$PID</value>
     <value name="usracct.application">$PROGRAM</value>
     <value name="usracct.device">${temp.src_ip}:${temp.src_port}</value>
   </values>
   <tags>
      <tag>usracct</tag>
   </tags>
</rule>

If I understand you correctly, you were referring to the "class"
attribute of the rule element, and to extending it. I think the "tags"
feature is far superior to classes, so the class attribute could probably
be deprecated. For example, a theoretical v4 format:

<rule id="...">
   <patterns>
     <pattern>Accepted @STRING:usracct.authmethod@ for @STRING:usracct.username@ from @IPv4:temp.src_ip@ port @NUMBER:temp.src_port@ @STRING:usracct.service@</pattern>
   </patterns>
   <values>
     <value name="usracct.type">login</value>
     <value name="usracct.sessionid">$PID</value>
     <value name="usracct.application">$PROGRAM</value>
     <value name="usracct.device">${temp.src_ip}:${temp.src_port}</value>
   </values>
   <tags>
      <tag>usracct</tag>
      <!-- here's the only change, the class attribute became a specially named tag -->
      <tag>class.system</tag>
   </tags>
</rule>
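
To make the class-to-tag change concrete, here is a rough, untested-in-production
sketch of how a v3 rule could be rewritten mechanically. The helper name
class_to_tag, the rule id "example" and the shortened pattern are made up for
illustration; nothing like this exists in syslog-ng itself, and the "v4" layout
is only the theoretical one shown above.

```python
import xml.etree.ElementTree as ET


def class_to_tag(rule_xml: str) -> str:
    """Move a rule's class attribute into a <tag>class.NAME</tag> element.

    Hypothetical helper: it only illustrates the theoretical v4 layout.
    """
    rule = ET.fromstring(rule_xml)
    cls = rule.attrib.pop("class", None)  # drop the attribute if present
    if cls is not None:
        tags = rule.find("tags")
        if tags is None:
            tags = ET.SubElement(rule, "tags")
        wanted = "class." + cls
        # Avoid adding the same tag twice if the rule was already converted.
        if all(t.text != wanted for t in tags.findall("tag")):
            ET.SubElement(tags, "tag").text = wanted
    return ET.tostring(rule, encoding="unicode")


# Shortened example rule; the id and the truncated pattern are placeholders.
V3_RULE = """<rule id="example" class="system">
  <patterns>
    <pattern>Accepted @STRING:usracct.authmethod@ for @STRING:usracct.username@</pattern>
  </patterns>
  <tags><tag>usracct</tag></tags>
</rule>"""

converted = ET.fromstring(class_to_tag(V3_RULE))
print([t.text for t in converted.iter("tag")])
```

The existing tags are kept as they are; the class only shows up as one more
tag, so anything that already filters on tags keeps working.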

But anyway, splitting complex schemas into smaller but combinable
elements is a great idea. Splitting the current "secevt" schema into three
separate schemas (Net, Security and NAT) and letting the user combine them
as needed sounds good.

Example:

<rule id="...">
   <patterns>
     <pattern>... packet filter log, with NAT and verdict </pattern>
   </patterns>
   <values>
     ...
   </values>
   <tags>
      <tag>Net</tag>
      <tag>NAT</tag>
      <tag>Security</tag>
   </tags>
</rule>

But this is already possible with v3.1. The only problem with using three tags
instead of one is how to store the extracted information in a way that it can
be combined later.

The logical method for storing tagged data with a set of NV pairs is to put them
in a properly structured SQL table. E.g. with the three tags above, you'd get
3 tables: one for the Net fields, one for NAT and one for Security. This
makes a problem obvious: it is one message after all, and quite possibly when you
want to create a report you'd need to ask the database the following
question:

* please give me records that have all three tags, with all of their fields combined.

So if these are indeed stored in 3 tables, you have to join them, possibly using
a unique message identifier. For example:

SELECT * FROM Net, NAT, Security WHERE Net.MSGID=NAT.MSGID AND Net.MSGID=Security.MSGID;

And voila, you have your log message. Of course, a non-SQL database could
make this even simpler, or with a handcrafted sql() destination you could
put all these fields in the same table. (My aim is to create a generic SQL
destination, in which case you don't have to care how the tables are laid out.)
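
For anyone who wants to try the three-table layout, here is a small
self-contained sketch using an in-memory SQLite database. The column names
(src_ip, dst_ip, nat_src_ip, verdict) and the sample rows are invented for the
example; only the table names and the MSGID join key come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per tag, all keyed by the same message identifier.
# The columns are placeholders, not the real schema fields.
conn.executescript("""
CREATE TABLE Net      (MSGID INTEGER PRIMARY KEY, src_ip TEXT, dst_ip TEXT);
CREATE TABLE NAT      (MSGID INTEGER PRIMARY KEY, nat_src_ip TEXT);
CREATE TABLE Security (MSGID INTEGER PRIMARY KEY, verdict TEXT);
""")

# Message 1 carries all three tags, message 2 only the Net tag.
conn.execute("INSERT INTO Net VALUES (1, '10.0.0.5', '192.0.2.1')")
conn.execute("INSERT INTO NAT VALUES (1, '198.51.100.7')")
conn.execute("INSERT INTO Security VALUES (1, 'accept')")
conn.execute("INSERT INTO Net VALUES (2, '10.0.0.6', '192.0.2.2')")

# Same join as above, written with explicit JOIN clauses.
rows = conn.execute("""
SELECT Net.MSGID, Net.src_ip, Net.dst_ip, NAT.nat_src_ip, Security.verdict
FROM Net
JOIN NAT      ON Net.MSGID = NAT.MSGID
JOIN Security ON Net.MSGID = Security.MSGID
""").fetchall()

print(rows)
```

Only message 1 comes back, because the inner joins drop every message that is
missing from any of the three tables. That is exactly the "records that have
all three tags" question.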

The only missing bit here is that right now syslog-ng is unable to generate a 
unique message ID on its own, but that's not very difficult to add.

What do you think? Based on this idea, I'm proposing to split the current 
secevt schema into 3 smaller ones: flowevt, natevt and secevt.

Please check the git archive where I've pushed the current version.

Also, if this is something we can agree on, I'll add some information about
this to the "policy" document.

> 
> In fact, it probably wouldn't be hard at all to write a quick script
> to auto-tag signatures as they are submitted, based on the name/value
> pairs provided in the signature.  So the only real thing a contributer
> would need to be aware of would be the official terms to use for the
> names, e.g. standardizing on "srcport" versus "source_port."
> 
> So, that means that the community would be responsible for:
> 1. Creating a standard list of names to use, adhering to the data type
> contained within (strings, ints, etc.).

yes.

> 2. Create a convention for which names are required (and optional) for
> which classes or tags.

yes. But please also note that CEE is doing something similar. In
the absence of anything concrete from them, I'd start with our own set of
name-value pairs, and if CEE does produce something, it'd be a
simple search-and-replace to switch to the "official" names.

> 3. Maintain the officially approved and vetted list of signatures that
> adhere to the above conventions.

yes.
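
The auto-tagging script suggested in the quoted text could look roughly like
the sketch below. It assumes that the part of a name before the first dot is
the schema name and doubles as the tag (as with usracct.* and the usracct tag in
my example), which is a convention we would still have to agree on. The function
name suggest_tags, the rule id and the list of known schemas are placeholders,
and the regular expression only handles simple @PARSER:name@ references.

```python
import re
import xml.etree.ElementTree as ET

# Matches simple parser references such as @STRING:usracct.username@ and
# captures the name part. Escapes and unnamed parsers are not handled.
PARSER_RE = re.compile(r"@[A-Za-z0-9]+:([^:@]+)(?::[^@]*)?@")


def suggest_tags(rule_xml, known_schemas):
    """Return the schema tags implied by the names used in a rule."""
    rule = ET.fromstring(rule_xml)
    names = set()
    # Names extracted by the pattern parsers.
    for pattern in rule.iter("pattern"):
        names.update(PARSER_RE.findall(pattern.text or ""))
    # Names assigned explicitly in the <values> block.
    for value in rule.iter("value"):
        if value.get("name"):
            names.add(value.get("name"))
    # Assumption: the prefix before the first dot names the schema.
    prefixes = {name.split(".", 1)[0] for name in names if "." in name}
    return sorted(prefixes & set(known_schemas))


# Placeholder schema list and rule id, for illustration only.
KNOWN_SCHEMAS = {"usracct", "flowevt", "natevt", "secevt"}
RULE = """<rule id="example">
  <patterns>
    <pattern>Accepted @STRING:usracct.authmethod@ for @STRING:usracct.username@ from @IPv4:temp.src_ip@ port @NUMBER:temp.src_port@ @STRING:usracct.service@</pattern>
  </patterns>
  <values>
    <value name="usracct.type">login</value>
  </values>
</rule>"""

print(suggest_tags(RULE, KNOWN_SCHEMAS))
```

The temp.* names are ignored because "temp" is not in the list of known
schemas, so only the usracct tag is suggested for this rule.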

> 
> This is basically what you've already stated you want to do, right?
> 
> One of the nice things about XML is that you can create schema
> definition files (XSD's) which can validate a given XML file.  So, the
> output of the naming conventions could be an XSD file that can be
> distributed with Syslog-NG so that end users can quickly verify
> signatures before they submit them.

There's one such schema in the syslog-ng source tree, in the directory
doc/xsd right now.

-- 
Bazsi