[syslog-ng] Pattern extraction
Martin Holste
mcholste at gmail.com
Sat Aug 14 17:02:36 CEST 2010
If you're looking to do never-wrong, full normalization, then yes,
you're looking at thousands of signatures. However, if you're looking
to extract some common fields, it's actually not that much work to
grab things like IP addresses using regexp. Since regexp is slow, I'm
thinking about writing some generic patterns that would match on IPs
using the fast pattern matcher. It would look something like
"@ANYSTRING@@IPv4@@ANYSTRING@", then maybe another pattern to grep out
two IPs, then another for three, etc. I have no idea yet whether that
will work; we'll see how it goes.
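For comparison, the plain-regexp route looks roughly like the sketch below. It is a minimal Python illustration, not syslog-ng's @IPv4@ parser: the candidate regex and the extract_ipv4 helper are hypothetical, and Python's standard ipaddress module is used only to discard out-of-range octets such as 999.1.1.1. Unlike one pattern per number of IPs, a regex scan picks up any number of addresses in a single pass.

```python
import ipaddress
import re

# Four dot-separated runs of 1-3 digits that are not part of a longer
# dotted/numeric token (a trailing sentence period is still allowed).
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")


def extract_ipv4(message: str) -> list[str]:
    """Return every valid IPv4 address in `message`, in order of appearance."""
    found = []
    for match in IPV4_CANDIDATE.finditer(message):
        candidate = match.group(0)
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets > 255
        except ipaddress.AddressValueError:
            continue
        found.append(candidate)
    return found


print(extract_ipv4("Deny tcp src outside:192.0.2.10/4455 dst inside:10.1.1.5/22"))
# → ['192.0.2.10', '10.1.1.5']
```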
I think that the pursuit of perfection in this field will be
discouraging, and may stifle efforts before they begin. I urge you to
take it one pattern at a time. Sure, we may need thousands of
patterns, but there are hundreds if not thousands of people on this
mailing list. Everybody take two patterns ;) And don't forget that the
patternize tool may be able to help by heuristically identifying
fields in messages. Then it just comes down to a human naming the
fields instead of painstakingly writing the patterns themselves.
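To make the "heuristically identifying fields" idea concrete, here is a toy Python sketch. It is not how pdbtool patternize works internally; it simply assumes that messages sharing the same first word and the same number of whitespace-separated tokens are one message type, and marks every token position that varies within such a group as a candidate field. The infer_templates name and the <fieldN> placeholders are hypothetical; a human would still rename them to something like user, src_ip or port.

```python
from collections import defaultdict


def infer_templates(messages: list[str]) -> dict[str, int]:
    """Toy field finder: returns {template: number of messages it covers}."""
    # Crude assumption: same token count + same first token => same message type.
    groups = defaultdict(list)  # (token count, first token) -> list of token lists
    for msg in messages:
        tokens = msg.split()
        if tokens:
            groups[(len(tokens), tokens[0])].append(tokens)

    templates = {}
    for rows in groups.values():
        parts = []
        field_no = 0
        for column in zip(*rows):  # walk token positions across the group
            if len(set(column)) == 1:
                parts.append(column[0])  # constant everywhere: literal text
            else:
                field_no += 1
                parts.append(f"<field{field_no}>")  # varies: candidate field
        templates[" ".join(parts)] = len(rows)
    return templates


sample = [
    "Accepted password for alice from 192.0.2.1 port 5022",
    "Accepted password for bob from 192.0.2.7 port 6100",
    "Connection closed by 192.0.2.9",
]
for template, count in infer_templates(sample).items():
    print(count, template)
```

A message type seen only once, like the last line, stays fully literal, and unrelated messages that happen to share a first word and length get merged; real tools need far more data and smarter clustering.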
Something else to consider: even if you're only extracting the RFC
syslog headers, as long as you can do full-text search over the log
messages you can still do some basic OLAP-style dimensional analysis.
Let's say you're going through router logs looking for an
OSPF adjacency change. You search for "LOADING to FULL" and then
group by host. You've just magically discovered all of the routers
that flapped during whatever incident caused the adjacency change.
Obviously this is very basic, but don't underestimate the immediate
value of being able to quickly pinpoint which hosts had which events
occur. I would say that 70% of the total value you'd get from having
all messages perfectly parsed is already attained just by being able
to do free text searches and group by host.
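The search-then-group-by-host workflow needs very little code. The Python sketch below assumes each record has already been split into the syslog HOST header and the free-text message; LogRecord, hosts_matching and the sample OSPF lines are illustrative, not taken from a real capture.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LogRecord(NamedTuple):
    host: str     # value of the syslog HOST header
    message: str  # free-text message body


def hosts_matching(records: Iterable[LogRecord], phrase: str) -> Counter:
    """Free-text search for `phrase`, then group the hits by host."""
    return Counter(rec.host for rec in records if phrase in rec.message)


# Illustrative records, not a real capture.
logs = [
    LogRecord("rtr1", "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Gi0/1 from LOADING to FULL, Loading Done"),
    LogRecord("rtr2", "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.1 on Gi0/2 from LOADING to FULL, Loading Done"),
    LogRecord("rtr1", "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.3 on Gi0/3 from LOADING to FULL, Loading Done"),
    LogRecord("sw3", "%LINK-3-UPDOWN: Interface Gi0/4, changed state to up"),
]

print(hosts_matching(logs, "LOADING to FULL").most_common())
# → [('rtr1', 2), ('rtr2', 1)]
```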
Lastly, not all logs are created equal! I wrote parsers for Cisco
firewall connection teardowns and firewall denies, and now more than
half of my logs are neatly parsed. That's because the vast majority
of Cisco firewall logs at notification level are connection
build/teardown messages (something like four logs per flow per
device). Now if I'm looking
for something weird, I can easily take the majority of the hay out of
the haystack by excluding the already classified logs in my search.
It even helps with reporting, because a big jump in the number of
unclassified messages shows up on the radar.
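A rough Python sketch of taking the hay out of the haystack: classify what you can, keep only the leftovers for searching, and track their share for reporting. The two regexes are hypothetical, simplified stand-ins for real teardown/deny patterns (in syslog-ng these would be patterndb rules), and the sample lines are illustrative messages modeled on Cisco ASA output.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical, simplified rules for two high-volume Cisco firewall messages.
KNOWN_PATTERNS = {
    "fw_teardown": re.compile(r"Teardown (?:TCP|UDP) connection"),
    "fw_deny": re.compile(r"\bDeny (?:tcp|udp|icmp)\b"),
}


def classify(message: str) -> Optional[str]:
    """Return the name of the first rule that matches, or None if unclassified."""
    for name, pattern in KNOWN_PATTERNS.items():
        if pattern.search(message):
            return name
    return None


def split_haystack(messages: Iterable[str]) -> Tuple[List[str], float]:
    """Return the unclassified messages and the fraction of all messages they represent."""
    messages = list(messages)
    unclassified = [m for m in messages if classify(m) is None]
    share = len(unclassified) / len(messages) if messages else 0.0
    return unclassified, share


# Illustrative lines modeled on Cisco ASA output, not a real capture.
sample = [
    "%ASA-6-302014: Teardown TCP connection 101 for outside:192.0.2.10/443 to inside:10.0.0.5/51000 duration 0:00:05 bytes 4312 TCP FINs",
    "%ASA-6-302016: Teardown UDP connection 102 for outside:192.0.2.53/53 to inside:10.0.0.5/53001 duration 0:00:00 bytes 120",
    '%ASA-4-106023: Deny tcp src outside:198.51.100.7/4455 dst inside:10.0.0.8/22 by access-group "outside_in"',
    "%ASA-1-105005: (Primary) Lost Failover communications with mate on interface inside",
]
leftovers, share = split_haystack(sample)
print(f"{share:.0%} unclassified:", leftovers)
```

Tracking the returned share over time is one way to get the "big jump in unclassified messages" signal for reporting.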
So to sum up, the benefit of creating log patterns is far from
linear: a few patterns for the highest-volume messages go a long way.
Not having a pattern for every possible log isn't really a big deal,
but having patterns for the most common logs is.
On Fri, Aug 13, 2010 at 8:00 PM, Anton Chuvakin <anton at chuvakin.org> wrote:
>> So, I must extract hundreds of patterns manually. :(
>
> Not really hundreds, try tens of thousands. If you sit and watch a
> busy syslog server for, say, 5 years, some say you'd see a few
> thousand or more unique messages. Personally, I have not tried it,
> but I trust the source.
>
>
>> Regards
>>
>> --- On Fri, 13/8/10, Anton Chuvakin <anton at chuvakin.org> wrote:
>>
>> From: Anton Chuvakin <anton at chuvakin.org>
>> Subject: Re: [syslog-ng] Pattern extraction
>> To: "Syslog-ng users' and developers' mailing list" <syslog-ng at lists.balabit.hu>
>> Date: Friday, 13 August, 2010, 7:18 PM
>>
>> > I don't know how I can extract patterns from logs. Must I check every
>> > log type separately, use pattern recognition methods, or use a
>> > pattern database (if one exists for all applications and devices)?
>>
>> Well, this is not just you - it is "you and the rest of the world."
>> The standard way is pretty much to manually (or with tools - but still
>> mostly manually) write regular expressions for every distinct log
>> message type.
>>
>> --
>> Dr. Anton Chuvakin
>> Site: http://www.chuvakin.org
>> Blog: http://www.securitywarrior.org
>> LinkedIn: http://www.linkedin.com/in/chuvakin
>> Consulting: http://www.securitywarriorconsulting.com
>> Twitter: @anton_chuvakin
>> Google Voice: +1-510-771-7106
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.campin.net/syslog-ng/faq.html
>>