[syslog-ng] [announce] patterndb project
Martin Holste
mcholste at gmail.com
Tue Jun 29 17:11:53 CEST 2010
This is awesome. As I've written about previously, I've used the
pattern-db enough to know how powerful and efficient it is, and I am
doing all my logging with it. My main use is for log classification
and field parsing, which normalizes logs down to something that can
easily be put in a database. The classification helps with not only
quickly identifying types of logs, but also higher-level ideas like
log retention (so I archive important logs) and permissions (so people
like web developers can have access to certain logs). The field
parsing is great for things like Snort and firewall logs, as well as
web server logs.
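A rough Python sketch of the classify-then-parse idea described above. This is not how pattern-db works internally: a regular expression stands in for a pattern here, and the sample message, class name, and field names are all made up for illustration.

```python
import re

# Hypothetical rule set: one regex standing in for a pattern-db pattern.
# The class name and field names are illustrative, not from a real patterndb.
PATTERNS = [
    ("firewall_deny",
     re.compile(r"DENY (?P<proto>\w+) (?P<srcip>[\d.]+):(?P<srcport>\d+)"
                r" -> (?P<dstip>[\d.]+):(?P<dstport>\d+)")),
]

def classify(program, message):
    """Return (class_name, fields) for the first matching pattern."""
    for class_name, regex in PATTERNS:
        match = regex.search(message)
        if match:
            fields = match.groupdict()
            fields["program"] = program
            return class_name, fields
    # No pattern matched: keep the message classifiable as "unknown".
    return "unknown", {"program": program}

# Made-up sample message
log_class, fields = classify("fw", "DENY tcp 10.0.0.5:4242 -> 192.0.2.7:22")
print(log_class, fields)
```

The class name is what retention and permission decisions could key on, while the extracted fields are what would go into the database.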
If you use a NoSQL-style database, such as MongoDB or CouchDB, you
don't have to worry about fitting fields into a rigid schema since
there is no concept of "columns." That works out great for pattern-db
because you can specify any field/value pairs in the pattern and then
have Mongo write it as-is so that some records will be {_id:1,
program:"snort", srcip:x.x.x.x} and others will be {_id:2,
program:"sendmail", to_address:"person at example.com"}. The key is
that you don't have to know ahead of time what fields you will be
parsing in order to design a db schema. That means when new patterns
are released, the fields can be named anything without breaking your
schema.
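A minimal Python sketch of the schemaless point, with plain dicts standing in for documents in a store such as MongoDB. No real database is involved; the `find` helper, field names, and values are hypothetical placeholders.

```python
import json

# Plain dicts standing in for documents in a schemaless store.
# Field names and values are placeholders for illustration.
records = [
    {"_id": 1, "program": "snort", "srcip": "192.0.2.10"},
    {"_id": 2, "program": "sendmail", "to_address": "person@example.com"},
]

# A later pattern release introduces fields nobody planned for;
# nothing about the existing records has to change.
records.append({"_id": 3, "program": "httpd", "status": "404", "url": "/index.html"})

def find(collection, **criteria):
    """Return documents whose fields equal all of the given criteria."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]

snort_docs = find(records, program="snort")
all_fields = sorted({key for doc in records for key in doc})
print(json.dumps(snort_docs))
print(all_fields)
```

Each record carries only the fields its pattern produced, and queries simply skip records that lack a given field.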
My initial concern with the format of the pattern-db XML is with the
CLSID-style IDs. I understand the advantages of CLSIDs, but it is
very expensive to create database indexes on them because of their
enormous length. I would prefer to have an integer ID in the pattern
XML somewhere. Other opinions?
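For a sense of scale, a quick Python comparison of key sizes. The UUID is randomly generated as a stand-in for a CLSID-style pattern ID, and 12345 is a hypothetical integer ID. Actual index cost depends on the database, so this is only a rough illustration.

```python
import struct
import uuid

# Randomly generated UUID as a stand-in for a CLSID-style pattern ID.
pattern_uuid = uuid.uuid4()

as_text = str(pattern_uuid).encode("ascii")  # canonical hyphenated text form
as_binary = pattern_uuid.bytes               # raw 128-bit value
as_int32 = struct.pack("<I", 12345)          # hypothetical 32-bit integer ID

# Key sizes in bytes: text UUID, binary UUID, 32-bit integer
print(len(as_text), len(as_binary), len(as_int32))  # prints "36 16 4"
```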
On Fri, Jun 25, 2010 at 10:23 AM, Balazs Scheidler <bazsi at balabit.hu> wrote:
> Hi,
>
> By now probably most of you know about patterndb, a powerful framework
> in syslog-ng that lets you extract structured information from log
> messages and perform classification at a high speed:
>
> http://www.balabit.com/dl/html/syslog-ng-ose-v3.1-guide-admin-en.html/concepts_pattern_databases.html
>
> Until now, syslog-ng offered the feature, but no release-quality
> patterns were produced by the syslog-ng developers. Some samples based
> on the logcheck database were created, but otherwise every syslog-ng
> user had to create her samples manually, possibly repeating work
> performed by others.
>
> Since this calls out to be a community project, I'm hereby starting one.
>
> Goals
> =====
>
> Create release-quality pattern databases that can simply be deployed to
> an existing syslog-ng installation. The goal of the patterns is to
> extract structured information from the free-form syslog messages, e.g.
> create name-value pairs based on the syslog message.
>
> Since the key factor when doing something like this is the naming of
> fields, we're going to create our generic naming guidelines that can be
> applied to any application in the industry.
>
> It is not our goal to implement correlation or any other advanced form
> of analysis, although we feel that with the results of this project,
> event correlation and analysis can be performed much more easily than
> without it.
>
> Related projects
> ================
>
> I know there are other efforts in the field, why not simply join them?
>
> CEF - is the log message format for a proprietary log analysis engine,
> primarily meant to be used to hold IP security device logs (firewalls,
> IPSs, virus gateways etc). The patterndb project aims to create patterns
> for a wider range of device logs and be more generic in the approach. On
> the other hand we feel that it might be useful to create a solution for
> converting db-parser output to the CEF format.
>
> CEE - Common Event Expression project by Mitre has a focus on creating a
> nv pair dictionary for all kinds of devices/log messages out there.
> Although I might be missing something, I haven't found any concrete
> results so far, apart from a nice-looking white paper. If the CEE
> delivers something, then patterndb would probably adapt the
> naming/taxonomy structure. But I guess not all devices will start
> logging in the new shiny format, thus the existing devices would need
> their logs converted, so the patterndb work wouldn't be wasted.
>
> Infrastructure
> ==============
>
> Our original patterndb related plans were to create an easy to use web
> based interface for editing patterns, but since that project is
> progressing slowly, I'm calling for a minimalist approach: git based
> version control of simple plain text files. Of course once the nice web
> based interface is finished, we're going to be ready to use it.
>
> First steps
> ===========
>
> I have created a git repository at:
>
> http://git.balabit.hu/bazsi/syslog-ng-patterndb.git
>
> This contains the initial version of the naming policy document and a
> simple schema for SIEM-style and a user login-logout naming schema.
>
> If you are interested please read the file README.txt in the git
> archive, or if you prefer a web browser, use this link:
>
> http://git.balabit.hu/?p=bazsi/syslog-ng-patterndb.git;a=blob;f=README.txt;h=9bbfeaead0c21dcf6171e12e311ae8612f572bfc;hb=6061e22221a72d35238b35f82b04afd436341b5c
>
> Licensing
> =========
>
> I do not have a decision yet, but for sure this is going to use one of
> the open source licenses or Creative Commons. Let me know if you have a
> preference in this area.
>
> Getting involved
> ================
>
> Join the syslog-ng mailing list, and start discussing! If you have
> existing patterns, great. If you don't, it is not too late to join.
>
> http://lists.balabit.hu/mailman/listinfo/syslog-ng
>
>
> --
> Bazsi
>