<div dir="ltr"><div><div><div>Hi,<br><br></div>First of all: thank you for your feedback.<br><br>This is very interesting, as it is pretty much the opposite of what I hear / read in most discussions. I am often asked how to throw away cron / dhcp / dns / kernel / debug / etc. messages to save bandwidth / disk space, and sometimes even how to narrow down what is saved from authentication logs (which sounds crazy to my security-minded ears...).<br><br></div>I wonder what the reason for this contradiction is. Is it the size of the organization? (Assumption: a larger org has more resources to save everything.) Or is it compliance? (PCI, etc.) Or both?<br><br></div>Bye,<br></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature">Peter Czanik (CzP) <<a href="mailto:peter.czanik@balabit.com" target="_blank">peter.czanik@balabit.com</a>><br>Balabit / syslog-ng upstream<br><a href="http://czanik.blogs.balabit.com/" target="_blank">http://czanik.blogs.balabit.com/</a><br><a href="https://twitter.com/PCzanik" target="_blank">https://twitter.com/PCzanik</a></div></div>
<br><div class="gmail_quote">On Thu, Apr 28, 2016 at 6:59 PM, Evan Rempel <span dir="ltr"><<a href="mailto:erempel@uvic.ca" target="_blank">erempel@uvic.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Logs are used for so many things:
auditing, security, post-incident analysis, live alerting (SIEM)
and others. It is for this reason that I believe all raw log
data should be saved.<br>
<br>
Adding to the discussion about metadata...<br>
<br>
We add metadata from a variety of sources.<br>
<br>
1. The syslog line itself. We parse EVERY log message to identify
specific data and context. For example, a login identifier is
often used in an e-mail address, but in the context of an e-mail
address, it is NOT a login identifier. This enables data mining on
login identifiers without having to further filter out e-mail
messages. We populate hundreds of metadata elements this way: tape
volumes, database instances, login, uid, gid, disk drive names,
logical volume names, FRU components in hardware monitoring. The
list is huge.<br>
<br>
2. Incident details. During the parsing of EVERY log message,
specific messages are identified as messages that should be
alerted on. Metadata is added that contains incident description,
URL to resolution documentation, severity of the incident and
details on minimizing false positives. For example, a repeating
log message may only be an incident if it repeats at a defined
rate over a defined duration. All of this data is used to produce
alerts via SMS, e-mail and the ticketing system.<br>
<br>
3. Inventory management system. We add metadata for tiers of
service. We have test, dev, preprod and prod. We also add business
application names such as database instance (SID), Facilities
management, workflow, MSExchange, listserver etc.<br>
<br>
4. Business responsibility matrix. For each host/application there
is a group that is responsible for the service. This metadata is
added so that when alerts need to be sent the alerting subsystem
can determine where to send the alert. It does this based on this
responsibility matrix and data from #2.<br>
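<br>
As an illustration only (not the setup described above), the rate check from #2 and the lookup from #4 could be sketched in Python roughly like this. The names RateRule, RESPONSIBILITY and route_alert, as well as the hosts and group names, are hypothetical and only serve the example.<br>
<pre>
```python
from collections import deque


class RateRule:
    """Flag an incident only when a message repeats often enough within a time window."""

    def __init__(self, min_count, window_seconds):
        self.min_count = min_count
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def observe(self, timestamp):
        """Record one matching message; return True once the rate threshold is reached."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while timestamp - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.min_count


# Hypothetical responsibility matrix: (host, application) maps to the owning group.
RESPONSIBILITY = {
    ("db01", "oracle"): "dba-team",
    ("mail01", "MSExchange"): "messaging-team",
}


def route_alert(host, application, default="operations"):
    """Look up which group should receive an alert for this host/application."""
    return RESPONSIBILITY.get((host, application), default)
```
</pre>
With RateRule(min_count=3, window_seconds=60), a message only becomes an incident once it has been seen three times within 60 seconds, which is one simple way to cut down on false positives.<br>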
<br>
<br>
All of this metadata gets placed into elasticsearch so we can
start to mine the data by asking questions like:<br>
<br>
- show all of the activity by user XXX in service Y in the
preproduction tier on linux hosts.<br>
- show all of the incidents for host HHH that group GGG is
responsible for fixing.<br>
- which service is responsible for the large increase in error
class syslog lines, and in which tier of service did they occur.<br>
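<br>
The first question above could be expressed as an Elasticsearch bool/filter query along these lines. This is a sketch under assumptions: the field names (login, service, tier, os) are hypothetical, since the thread does not name the real ones, and the term clauses assume those fields are indexed as exact-match values.<br>
<pre>
```python
import json


def build_activity_query(user, service, tier, os_family):
    """Build an Elasticsearch query body that matches all four metadata fields exactly."""
    # Field names below are hypothetical; the thread does not name the real ones.
    filters = [
        {"term": {"login": user}},
        {"term": {"service": service}},
        {"term": {"tier": tier}},
        {"term": {"os": os_family}},
    ]
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    # Print the request body that would be sent to the search endpoint.
    print(json.dumps(build_activity_query("XXX", "Y", "preprod", "linux"), indent=2))
```
</pre>
Because every clause is a plain filter on a metadata field, such a query only works if the metadata was attached to each event at parse time.<br>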
<br>
The metadata is the power that drives this, and without real-time,
high-performance pattern matching it just can't be done.<br>
<br>
Evan.<div><div class="h5"><br>
<br>
<br>
On 04/28/2016 06:23 AM, Scot Needy wrote:<br>
</div></div></div><div><div class="h5">
<blockquote type="cite">
<div>We save all log data and compress/dedup hourly. For
an enterprise of about 5000 servers this averages about 200GB. </div>
<div>Some PCI compartments are special and have backup and
retention policies for compliance. </div>
<div><br>
</div>
<div>Archiving raw log data also gives us data to
re-parse should the patterns need to be updated. </div>
<div><br>
</div>
<div><br>
</div>
<br>
<div>
<blockquote type="cite">
<div>On Apr 28, 2016, at 7:23 AM, Czanik, Péter <<a href="mailto:peter.czanik@balabit.com" target="_blank">peter.czanik@balabit.com</a>>
wrote:</div>
<br>
<div>
<div dir="ltr">
<div>
<div>Hi,<br>
</div>
<br>
I was asking because, up until now, I recall only a single
syslog-ng user who told me that he saves all log
messages. On the other hand, I keep receiving (marketing)
e-mails saying that no logs should be discarded and everything
should be saved. And sometimes I receive the same
feedback from the Big Data world: we have enough disk
space, so why do any filtering? So I'd be interested to
learn from real-world experience whether filtering is
really old-fashioned, or whether there are situations
(compliance requirement, endless storage, etc.) in which you
really save all log messages.<br>
<br>
</div>
Bye,<br>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div>Peter Czanik (CzP) <<a href="mailto:peter.czanik@balabit.com" target="_blank">peter.czanik@balabit.com</a>><br>
Balabit / syslog-ng upstream<br>
<a href="http://czanik.blogs.balabit.com/" target="_blank">http://czanik.blogs.balabit.com/</a><br>
<a href="https://twitter.com/PCzanik" target="_blank">https://twitter.com/PCzanik</a></div>
</div>
<br>
<div class="gmail_quote">On Thu, Apr 28, 2016 at 11:11 AM,
Fabien Wernli <span dir="ltr"><<a href="mailto:wernli@in2p3.fr" target="_blank">wernli@in2p3.fr</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Thu, Apr 28, 2016 at 11:06:07AM +0200,
Czanik, Péter wrote:<br>
> One of the major strengths of syslog-ng is
message filtering, which<br>
> facilitates message routing and discarding
useless log messages. OTOH I<br>
> often read that we now have all the
technologies and storage to keep all<br>
> logs. What do you think?<br>
<br>
</span>I would go further: we now have the means to
add relevant metadata to all the events,<br>
which in turn allows us to do targeted archiving.<br>
<br>
<br>
______________________________________________________________________________<br>
Member info: <a href="https://lists.balabit.hu/mailman/listinfo/syslog-ng" rel="noreferrer" target="_blank">https://lists.balabit.hu/mailman/listinfo/syslog-ng</a><br>
Documentation: <a href="http://www.balabit.com/support/documentation/?product=syslog-ng" rel="noreferrer" target="_blank">http://www.balabit.com/support/documentation/?product=syslog-ng</a><br>
FAQ: <a href="http://www.balabit.com/wiki/syslog-ng-faq" rel="noreferrer" target="_blank">http://www.balabit.com/wiki/syslog-ng-faq</a><br>
<br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</div>
</blockquote>
</div>
<br>
<br>
</blockquote>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888"><pre cols="500">--
Evan Rempel <a href="mailto:erempel@uvic.ca" target="_blank">erempel@uvic.ca</a>
Senior Systems Administrator <a href="tel:250.721.7691" value="+12507217691" target="_blank">250.721.7691</a>
Data Centre Services, University Systems, University of Victoria
</pre>
</font></span></div>
<br>
<br></blockquote></div><br></div>