This is an awesome start, and I'm big into patterndb, so this is really encouraging.

Off the bat, I'd say it would be more helpful if the <values></values> tags were populated with the .dict values that are being extracted, so that you can construct output patterns properly. Along with that, if you have a different name for every .dict value extracted, it becomes labor-intensive to capture them all in your output template.

I prefer a method in which I arbitrarily cap the number of extracted values at six strings and six integers, and label them s0-s5 and i0-i5. That way I only need one template for all patterns. Separating the strings and integers makes database insertion easy, because my tables then look like <header columns> MSG, pattern_class_id, pattern_rule_id, i0 .. i5, s0 .. s5. Searching on a field then becomes possible as long as you know which field belongs to which pattern rule ID. I also prefer to have the rule IDs as integers to keep my DB columns smaller.

Here's an example for Cisco FWSM deny and NAT translation teardown messages that I've been using:

<ruleset name="FWSM" id='2'>
  <pattern>%FWSM</pattern>
  <rules>
    <rule provider="local" class='2' id='2'>
      <patterns>
        <pattern>Deny@QSTRING:i0: @src@QSTRING:s0: :@@IPv4:i1:@/@NUMBER:i2:@ dst@QSTRING:s1: :@@IPv4:i3:@/@NUMBER:i4:@ by access-group @QSTRING:s2:"@</pattern>
      </patterns>
    </rule>
    <rule provider="local" class='3' id='3'>
      <patterns>
        <pattern>Teardown@QSTRING:i0: @connection @NUMBER::@ for@QSTRING:s0: :@@IPv4:i1:@/@NUMBER:i2:@ to@QSTRING:s1: :@@IPv4:i3:@/@NUMBER:i4:@ duration@QSTRING:s2: @bytes @NUMBER:i5:@</pattern>
      </patterns>
    </rule>
  </rules>
</ruleset>

My back-end script does a bit of magic with IPv4 char -> uint parsing for better DB storage. (If anyone at Balabit would like to toss in a little feature for easy outputting as inet_aton/inet_ntoa from socket.h, that would be cool!)
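For anyone who wants to try the same layout, here is a rough Python sketch of the idea. It is not my actual back-end script: the table name "logs", the helper names, the use of SQLite, and the sample message are all made up for illustration. It just shows the IPv4 text -> uint conversion (the same number MySQL's INET_ATON gives you) and the fixed i0-i5 / s0-s5 column scheme.

```python
import ipaddress
import sqlite3

# Fixed column scheme: six integer slots and six string slots.
INT_COLS = [f"i{n}" for n in range(6)]
STR_COLS = [f"s{n}" for n in range(6)]
ALL_COLS = ["msg", "pattern_class_id", "pattern_rule_id"] + INT_COLS + STR_COLS


def ipv4_to_uint(addr):
    """Dotted-quad text -> unsigned 32-bit integer (network byte order)."""
    return int(ipaddress.IPv4Address(addr))


def uint_to_ipv4(value):
    """Unsigned 32-bit integer -> dotted-quad text."""
    return str(ipaddress.IPv4Address(value))


def to_int(raw):
    """Coerce an extracted value to an integer; IPv4 text becomes a uint."""
    if raw is None:
        return None
    try:
        return ipv4_to_uint(raw)
    except ipaddress.AddressValueError:
        return int(raw)


def create_table(conn):
    """Create the hypothetical 'logs' table with the fixed column set."""
    typed = ["msg TEXT", "pattern_class_id INTEGER", "pattern_rule_id INTEGER"]
    typed += [f"{c} INTEGER" for c in INT_COLS]
    typed += [f"{c} TEXT" for c in STR_COLS]
    conn.execute(f"CREATE TABLE logs ({', '.join(typed)})")


def insert_row(conn, msg, class_id, rule_id, values):
    """Insert one parsed message; 'values' maps i0..i5 / s0..s5 to raw text."""
    row = [msg, class_id, rule_id]
    row += [to_int(values.get(c)) for c in INT_COLS]
    row += [values.get(c) for c in STR_COLS]
    marks = ", ".join("?" for _ in ALL_COLS)
    conn.execute(
        f"INSERT INTO logs ({', '.join(ALL_COLS)}) VALUES ({marks})", row
    )


def find_by_class_and_i1(conn, class_id, addr):
    """Look up messages of one pattern class whose i1 slot holds 'addr'."""
    cur = conn.execute(
        "SELECT msg FROM logs WHERE pattern_class_id = ? AND i1 = ?",
        (class_id, ipv4_to_uint(addr)),
    )
    return [r[0] for r in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    # Made-up message and values, only to exercise the sketch.
    insert_row(
        conn,
        "Deny tcp src outside:1.1.1.1/1234 dst inside:10.0.0.5/80",
        2,
        2,
        {"i1": "1.1.1.1", "i2": "1234", "i3": "10.0.0.5", "i4": "80",
         "s0": "outside", "s1": "inside"},
    )
    print(find_by_class_and_i1(conn, 2, "1.1.1.1"))
```

Because every rule writes into the same twelve slots, one INSERT statement covers all patterns. The price is that you need a separate mapping of rule ID to field meaning to interpret a row.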
So, if I'm looking for all denied packets from IP address 1.1.1.1, I would search my DB where class_id=2 and i1=INET_ATON("1.1.1.1").

Have any others been using db-parser values? Any methods to share?

--Martin

On Tue, Dec 15, 2009 at 12:20 PM, ILLES, Marton <illes.marton@balabit.hu> wrote:
Hi,
Last week BalaBit made available some 8000 patterns (covering more than 200 applications) for syslog-ng patterndb (or db_parser as you like to call it). The patterns are available under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 (CC by-NC-SA) license. The patterns in their current form are just snapshots of the ongoing effort of providing good quality patterns for various applications. You can download the snapshot of patterns from our website: http://www.balabit.com/downloads/files/patterndb-snapshot/patterndb-20091209...
The patterns are partially hand-crafted and partially generated automatically from logfiles and from the regexp-based logcheck database. Some of the patterns also contain example messages, which we use to automatically test the patterns and syslog-ng's db_parser. You can merge the xml files using "pdbtool merge".
I would also like to set up a public git repository where anyone interested can follow the patterndb development and submit patterns or fixes. A patterndb website containing all patterndb-related information, links, forums, wikis and other useful documentation is under construction as well. Until then, the syslog-ng mailing list is a good place for questions, ideas and discussions.
As always, feedback is very welcome!
Happy parsing!
Marton
--
Key fingerprint = F78C 25CA 5F88 6FAF EA21 779D 3279 9F9E 1155 670D