[syslog-ng] filtering vs. keeping all logs
Evan Rempel
erempel at uvic.ca
Thu Apr 28 18:59:27 CEST 2016
Logs are used for so many things: auditing, security, post-incident analysis, live alerting (SIEM) and others. For this reason I believe that all raw log data should be saved.
Adding to the discussion about metadata...
We add metadata from a variety of sources.
1. The syslog line itself. We parse EVERY log message to identify specific data and context. For example, a login identifier is often used in an e-mail address, but in the context of an e-mail address, it is NOT a login identifier. This enables data mining on login identifiers without having to further filter out e-mail messages. We populate hundreds of metadata elements this way: tape volumes, database instances, login, uid, gid, disk drive names, logical volume names, FRU components in hardware monitoring. The list is huge.
2. Incident details. During the parsing of EVERY log message, specific messages are identified as messages that should be alerted on. Metadata is added that contains the incident description, a URL to resolution documentation, the severity of the incident and details on minimizing false positives. For example, a repeating log message may only be an incident if it repeats at a defined rate over a defined duration. All of this data is used to produce alerts via SMS, e-mail and the ticketing system.
3. Inventory management system. We add metadata for tiers of service. We have test, dev, preprod and prod. We also add business application names such as database instance (SID), Facilities management, workflow, MSExchange, listserver etc.
4. Business responsibility matrix. For each host/application there is a group that is responsible for the service. This metadata is added so that when an alert needs to be sent, the alerting subsystem can determine where to send it. It does this based on this responsibility matrix and the data from #2.
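
Items 1, 2 and 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the setup described in this message: the regular expressions, the "user=" key, the "I/O error" trigger string, the host and application names, the OWNERS table and the RateRule and enrich names are all hypothetical. In practice this kind of matching would more likely be done inside syslog-ng itself.

```python
import re
from collections import deque

# Hypothetical patterns; real log formats vary widely.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A "user=" value counts as a login only if it is not part of an address.
LOGIN_RE = re.compile(r"\buser=([\w.-]+)(?![\w.@-])")

# Hypothetical responsibility matrix: (host, application) -> owning group.
OWNERS = {("db01", "oracle"): "dba-team", ("mail01", "postfix"): "mail-team"}


class RateRule:
    """Fires only when a message repeats `count` times within `window` seconds."""

    def __init__(self, count, window):
        self.count = count
        self.window = window
        self.seen = deque()

    def hit(self, timestamp):
        self.seen.append(timestamp)
        # Drop occurrences that have fallen out of the time window.
        while timestamp - self.seen[0] > self.window:
            self.seen.popleft()
        return len(self.seen) >= self.count


def enrich(host, app, message, timestamp, rule):
    """Return a metadata dict for one log message."""
    meta = {"host": host, "app": app}
    email = EMAIL_RE.search(message)
    if email:
        meta["email"] = email.group(0)
    login = LOGIN_RE.search(message)
    if login:
        meta["login"] = login.group(1)
    # Assumed trigger string; an incident needs both a match and the rate.
    if "I/O error" in message and rule.hit(timestamp):
        meta["incident"] = "repeated I/O errors"
        meta["notify"] = OWNERS.get((host, app), "unassigned")
    return meta
```

With a rule such as RateRule(3, 60), a single "I/O error" line produces no incident metadata; only the third occurrence within 60 seconds adds the "incident" and "notify" keys, and the group to notify comes from the (host, application) lookup.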
All of this metadata gets placed into Elasticsearch so we can start to mine the data by asking questions like:
- show all of the activity by user XXX in service Y in the preproduction tier on Linux hosts.
- show all of the incidents for host HHH that group GGG is responsible for fixing.
- which service is responsible for the large increase in error class syslog lines, and in which tier of service did they occur.
The metadata is the power that drives this, and without real-time, high-performance pattern matching it just can't be done.
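
As an illustration, the first question in the list above might translate into an Elasticsearch query body along these lines. This is a minimal sketch: the field names ("login", "service", "tier", "os", "@timestamp") and the activity_query helper are assumptions, since the actual index mapping is not described here, and the term filters assume those fields are indexed as exact (not analyzed) values.

```python
import json


def activity_query(login, service, tier, os_family):
    """Build an Elasticsearch query body for one user's activity."""
    return {
        "query": {
            "bool": {
                # Filter context: exact matches, no relevance scoring.
                "filter": [
                    {"term": {"login": login}},
                    {"term": {"service": service}},
                    {"term": {"tier": tier}},
                    {"term": {"os": os_family}},
                ]
            }
        },
        # Oldest events first, so the activity reads as a timeline.
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


if __name__ == "__main__":
    body = activity_query("XXX", "Y", "preprod", "linux")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Because every condition is a metadata field added at parse time, no free-text matching against the raw message is needed.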
Evan.
On 04/28/2016 06:23 AM, Scot Needy wrote:
> We save all log data and compress/dedup hourly. For an enterprise of about 5000 servers this averages about 200GB.
> Some PCI compartments are special and have backup and retention policies for compliance.
>
> Archiving raw log data also gives us data to re-parse should the patterns need to be updated.
>
>
>
>> On Apr 28, 2016, at 7:23 AM, Czanik, Péter <peter.czanik at balabit.com> wrote:
>>
>> Hi,
>>
>> I was asking because up until now I recall only a single syslog-ng user who told me that he saves all log messages. On the other hand, I keep receiving (marketing) e-mails saying that no logs should be discarded and everything should be saved. And sometimes I receive the same feedback from the Big Data world: we have enough disk space, so why do any filtering? So I'd be interested to learn from real-world experience whether filtering is really old-fashioned, or whether there is any situation (compliance requirement, endless storage, etc.) in which you really save all log messages.
>>
>> Bye,
>>
>> Peter Czanik (CzP) <peter.czanik at balabit.com>
>> Balabit / syslog-ng upstream
>> http://czanik.blogs.balabit.com/
>> https://twitter.com/PCzanik
>>
>> On Thu, Apr 28, 2016 at 11:11 AM, Fabien Wernli <wernli at in2p3.fr> wrote:
>>
>> On Thu, Apr 28, 2016 at 11:06:07AM +0200, Czanik, Péter wrote:
>> > One of the major strengths of syslog-ng is message filtering, which
>> > facilitates message routing and discarding useless log messages. OTOH I
>> > often read that we now have all the technologies and storage to keep all
>> > logs. What do you think?
>>
>> I would go further: we now have the means to add relevant metadata to all the events,
>> which in turn allows us to do targeted archiving.
>>
>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
--
Evan Rempel erempel at uvic.ca
Senior Systems Administrator 250.721.7691
Data Centre Services, University Systems, University of Victoria