Hi,
One of the major strengths of syslog-ng is message filtering, which facilitates message routing and discarding useless log messages. OTOH, I often read that we now have all the technology and storage needed to keep all logs. What do you think?
Bye,
Peter Czanik (CzP) <peter.czanik@balabit.com>
Balabit / syslog-ng upstream
http://czanik.blogs.balabit.com/
https://twitter.com/PCzanik
On Thu, Apr 28, 2016 at 11:06:07AM +0200, Czanik, Péter wrote:
> One of the major strengths of syslog-ng is message filtering, which facilitates message routing and discarding useless log messages. OTOH, I often read that we now have all the technology and storage needed to keep all logs. What do you think?
I would go further: we now have the means to add relevant metadata to all the events, which in turn allows us to do targeted archiving.
Hi,
I was asking because, up until now, I can recall only a single syslog-ng user who told me that he saves all log messages. On the other hand, I keep receiving (marketing) e-mails saying that no logs should be discarded and everything should be saved. And sometimes I receive the same feedback from the Big Data world: we have enough disk space, so why do any filtering? So I'd be interested to learn from real-world experience whether filtering is really old-fashioned, or whether there are situations (compliance requirements, endless storage, etc.) where you really do save all log messages.
Bye,
Peter Czanik (CzP) <peter.czanik@balabit.com>
Balabit / syslog-ng upstream
http://czanik.blogs.balabit.com/
https://twitter.com/PCzanik
On Thu, Apr 28, 2016 at 11:11 AM, Fabien Wernli <wernli@in2p3.fr> wrote:
> I would go further: we now have the means to add relevant metadata to all the events, which in turn allows us to do targeted archiving.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
We save all log data and compress/dedup hourly. For an enterprise of about 5000 servers this averages about 200GB. Some PCI compartments are special and have backup and retention policies for compliance.
Archiving raw log data also gives us data to re-parse should the patterns need to be updated.
On Apr 28, 2016, at 7:23 AM, Czanik, Péter <peter.czanik@balabit.com> wrote:
> Hi,
> I was asking because, up until now, I can recall only a single syslog-ng user who told me that he saves all log messages. On the other hand, I keep receiving (marketing) e-mails saying that no logs should be discarded and everything should be saved. And sometimes I receive the same feedback from the Big Data world: we have enough disk space, so why do any filtering? So I'd be interested to learn from real-world experience whether filtering is really old-fashioned, or whether there are situations (compliance requirements, endless storage, etc.) where you really do save all log messages.
Logs are used for so many things: auditing, security, post-incident analysis, live alerting (SIEM) and others. It is for this reason that I believe that all raw log data should be saved.
Adding to the discussion about metadata...
We add metadata from a variety of sources.
1. The syslog line itself. We parse EVERY log message to identify specific data and context. For example, a login identifier is often used in an e-mail address, but in the context of an e-mail address it is NOT a login identifier. This enables data mining on login identifiers without having to further filter out e-mail messages. We populate hundreds of metadata elements this way: tape volumes, database instances, login, uid, gid, disk drive names, logical volume names, FRU components in hardware monitoring. The list is huge.
2. Incident details. During the parsing of EVERY log message, specific messages are identified as messages that should be alerted on. Metadata is added that contains the incident description, a URL to resolution documentation, the severity of the incident and details on minimizing false positives. For example, a repeating log message may only be an incident if it repeats at a defined rate over a defined duration. All of this data is used to produce alerts to SMS, e-mail and the ticketing system.
3. Inventory management system. We add metadata for tiers of service: test, dev, preprod and prod. We also add business application names such as database instance (SID), facilities management, workflow, MSExchange, listserver, etc.
4. Business responsibility matrix. For each host/application there is a group that is responsible for the service. This metadata is added so that when alerts need to be sent, the alerting subsystem can determine where to send them. It does this based on the responsibility matrix and the data from #2.
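As an illustration of items 2 and 4 in the list above, here is a minimal Python sketch of "only an incident if it repeats at a defined rate over a defined duration", followed by a lookup in a responsibility matrix. It is not the setup described in this message: the class name, the threshold values, the host and group names and the fallback group are all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RateIncidentDetector:
    """Flag an incident when a message repeats `threshold` times within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen = {}  # message key -> deque of timestamps still inside the window

    def observe(self, key, timestamp):
        """Record one occurrence; return True if the rate condition is met."""
        times = self._seen.setdefault(key, deque())
        times.append(timestamp)
        # Drop occurrences that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.threshold


# Hypothetical responsibility matrix: (host, application) -> responsible group.
RESPONSIBILITY = {
    ("db01", "oracle"): "dba-team",
    ("mail01", "exchange"): "messaging-team",
}


def route_alert(host, application, default="operations"):
    """Pick the group to notify, falling back to a default group (an assumption)."""
    return RESPONSIBILITY.get((host, application), default)


detector = RateIncidentDetector(threshold=3, window_seconds=60)
events = [0, 10, 20, 200]  # seconds at which the same message was seen
results = [detector.observe(("db01", "disk error"), t) for t in events]
print(results)                        # [False, False, True, False]
print(route_alert("db01", "oracle"))  # dba-team
```

The third occurrence trips the detector because three repeats fall inside 60 seconds; the fourth, arriving much later, starts a fresh window.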
All of this metadata gets placed into Elasticsearch so we can start to mine the data by asking questions like:
- show all of the activity by user XXX in service Y in the preproduction tier on Linux hosts.
- show all of the incidents for host HHH that group GGG is responsible for fixing.
- which service is responsible for the large increase in error-class syslog lines, and in which tier of service did they occur.
The metadata is the power that drives this, and without real-time, high-performance pattern matching it just can't be done.
Evan.
On 04/28/2016 06:23 AM, Scot Needy wrote:
> We save all log data and compress/dedup hourly. For an enterprise of about 5000 servers this averages about 200GB. Some PCI compartments are special and have backup and retention policies for compliance.
> Archiving raw log data also gives us data to re-parse should the patterns need to be updated.
-- Evan Rempel erempel@uvic.ca Senior Systems Administrator 250.721.7691 Data Centre Services, University Systems, University of Victoria
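The first question in Evan's message ("show all of the activity by user XXX in service Y in the preproduction tier on Linux hosts") could be expressed roughly as an Elasticsearch bool query with one exact-match filter per metadata field. The sketch below only builds the query body as a plain Python dictionary. The field names (`login`, `service`, `tier`, `os`) are assumptions and not the names used in the setup described above, and the term filters assume those fields are indexed as exact, non-analyzed values.

```python
import json


def build_activity_query(user, service, tier, os_family):
    """Build an Elasticsearch bool query that filters on exact metadata values."""
    # Hypothetical field names; adjust to whatever the parser actually emits.
    filters = {
        "login": user,
        "service": service,
        "tier": tier,
        "os": os_family,
    }
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        }
    }


query = build_activity_query("XXX", "Y", "preprod", "linux")
print(json.dumps(query, indent=2))
```

Filters rather than scored matches fit this kind of question, since each metadata field either has the requested value or does not.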
Hi,
First of all: thank you for your feedback.
This is very interesting, as it is pretty much the opposite of what I hear and read in most discussions. I am often asked how to throw away cron / dhcp / dns / kernel / debug / etc. messages to save bandwidth or disk space, and sometimes even how to narrow down what is saved from authentication logs (which sounds crazy to my security-minded ears...).
I wonder what the reason for this contradiction is. Is it the size of the organization? (Assumption: a larger org has more resources to save everything.) Or is it compliance (PCI, etc.)? Or both?
Bye,
Peter Czanik (CzP) <peter.czanik@balabit.com>
Balabit / syslog-ng upstream
http://czanik.blogs.balabit.com/
https://twitter.com/PCzanik
On Thu, Apr 28, 2016 at 6:59 PM, Evan Rempel <erempel@uvic.ca> wrote:
> Logs are used for so many things: auditing, security, post-incident analysis, live alerting (SIEM) and others. It is for this reason that I believe that all raw log data should be saved.
IT teams that I speak with often use filtering to decrease the log volume, but not in the context of archiving or storing the logs. These teams are typically using some analysis product to process their logs, and the product has a licensing cost based on the log volume. Splunk and Loggly, for example, fall into this category. So the use case is to filter out log messages that are known to be of no interest, and feed the rest of the stream to the product for processing.
The other use case is where logs are centralized from a large geographical distribution and the cost of the bandwidth to aggregate the logs needs to be minimized.
Larger companies may make enough money that they just pay for the log processing, perhaps at a capacity where some ceiling or cap on price has been reached. They may have development resources to build a custom log processing architecture, so that the processing costs only CPU time. Larger companies might also have their own networks, so they are not affected by bandwidth costs.
Just my $0.02
Evan.
On 04/29/2016 01:33 AM, Czanik, Péter wrote:
> Hi,
> First of all: thank you for your feedback.
> This is very interesting, as it is pretty much the opposite of what I hear and read in most discussions. I am often asked how to throw away cron / dhcp / dns / kernel / debug / etc. messages to save bandwidth or disk space, and sometimes even how to narrow down what is saved from authentication logs (which sounds crazy to my security-minded ears...).
> I wonder what the reason for this contradiction is. Is it the size of the organization? (Assumption: a larger org has more resources to save everything.) Or is it compliance (PCI, etc.)? Or both?
In my previous job (where I was much more active on this list), we kept detailed URL and firewall event logs for just 4 days: long enough to address technical issues even on a long weekend, and no longer.
I work with some very large organizations, and even the F-100 don't have the resources to keep everything forever.
There is also the concept in certain organizations that you only retain data for as long as it is useful and no longer, optimizing the retention policy to discard debug logs quickly, keep "audit trail" logs for exactly 366 days for regulatory compliance, etc.
There's also the issue of "Discovery": if you are keeping everything and then you are sued, you need to put a freeze on the data you have and preserve it for delivery to your adversary. Better not to have/keep the data in the first place if it has limited utility.
As mentioned, licensing costs definitely come into play. Some clients use syslog-ng as a "prefilter" to discard low-value events before forwarding (spoofing the source) to Splunk or QRadar. This is particularly useful when you have appliance-like devices with little or no ability to filter what logs they generate and transmit.
Kevin
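A per-class retention policy like the one Kevin describes might be sketched as a small lookup table plus an expiry check. This is only an illustration: the class names are hypothetical, the 4-day and 366-day values come from the message above, and the 1-day value for debug logs and the default for unknown classes are assumptions standing in for "discard quickly".

```python
from datetime import datetime, timedelta

# Retention per log class, in days. 4 and 366 are taken from the message
# above; the debug value is an assumption for "discard quickly".
RETENTION_DAYS = {
    "debug": 1,
    "url": 4,
    "firewall": 4,
    "audit": 366,
}


def is_expired(log_class, written_at, now, default_days=4):
    """Return True if a log of this class written at `written_at` may be discarded."""
    # default_days for unknown classes is a hypothetical choice.
    days = RETENTION_DAYS.get(log_class, default_days)
    return now - written_at > timedelta(days=days)


written = datetime(2016, 4, 28)
now = datetime(2016, 5, 5)
print(is_expired("firewall", written, now))  # True: 7 days old, 4-day retention
print(is_expired("audit", written, now))     # False: kept for 366 days
```

A cleanup job would call a check like this per archived file or index and delete only what has expired, so data with limited utility does not accumulate.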
participants (5)
- Czanik, Péter
- Evan Rempel
- Fabien Wernli
- Kevin Kadow
- Scot Needy