<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">IT teams that I speak with often use
filtering to decrease the log volume, but not in the context of
archiving or storing the logs. These teams typically use an
analysis product whose licensing cost is based on log volume;
Splunk and Loggly fall into this category. So the use case is to
filter out log messages that are known to be of no interest and
feed the rest of the stream to the product for processing.<br>
<br>
The other use case is where logs are centralized from a
geographically distributed environment, and the cost of the
bandwidth needed to aggregate them has to be minimized.<br>
<br>
Larger companies may have enough money to simply pay for log
processing, perhaps at a volume where a ceiling or cap on the
price has been reached. They may also have the development
resources to build a custom log processing architecture, so that
processing costs only CPU time. Larger companies might also have
their own networks, so they are not affected by costs associated
with bandwidth.<br>
<br>
Just my $0.02<br>
<br>
Evan.<br>
<br>
<br>
On 04/29/2016 01:33 AM, Czanik, Péter wrote:<br>
</div>
<blockquote
cite="mid:CANcUavt7UrepUn5W_sNgtO=ZtmCdoev3Ui3ZxUPap-uxxrAN4w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi,<br>
<br>
</div>
First of all: thank you for your feedback.<br>
<br>
This is very interesting, as it is pretty much the opposite
of what I hear / read in most discussions. I am often asked how
to throw away cron / dhcp / dns / kernel / debug / etc.
messages to save bandwidth / disk space, and sometimes even
how to narrow down what is saved from authentication logs (which
sounds crazy to my security-minded ears...).<br>
<br>
</div>
I wonder what the reason for this contradiction is. Is it the
size of the organization? (Assumption: a larger org has more
resources to save everything.) Or is it compliance (PCI, etc.)?
Or both?<br>
<br>
</div>
Bye,<br>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div class="gmail_signature">Peter Czanik (CzP) &lt;<a
moz-do-not-send="true"
href="mailto:peter.czanik@balabit.com" target="_blank">peter.czanik@balabit.com</a>&gt;<br>
Balabit / syslog-ng upstream<br>
<a moz-do-not-send="true"
href="http://czanik.blogs.balabit.com/" target="_blank">http://czanik.blogs.balabit.com/</a><br>
<a moz-do-not-send="true" href="https://twitter.com/PCzanik"
target="_blank">https://twitter.com/PCzanik</a></div>
</div>
<br>
<div class="gmail_quote">On Thu, Apr 28, 2016 at 6:59 PM, Evan
Rempel <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:erempel@uvic.ca" target="_blank">erempel@uvic.ca</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Logs are used for many things: auditing, security,
post-incident analysis, live alerting (SIEM) and others.
For this reason, I believe that all raw log
data should be saved.<br>
<br>
Adding to the discussion about metadata...<br>
<br>
We add metadata from a variety of sources.<br>
<br>
1. The syslog line itself. We parse EVERY log message to
identify specific data and its context. For example, a login
identifier often appears in an e-mail address, but in the
context of an e-mail address, it is NOT a login
identifier. This enables data mining on login
identifiers without having to further filter out e-mail
messages. We populate hundreds of metadata elements this
way: tape volumes, database instances, login, uid, gid,
disk drive names, logical volume names, FRU components
in hardware monitoring. The list is huge.<br>
<br>
2. Incident details. While parsing EVERY log message, we
identify specific messages that should trigger an alert.
For these, we add metadata containing the incident
description, a URL to resolution documentation, the
severity of the incident and details on minimizing false
positives. For example, a repeating log message may only
be an incident if it repeats at a defined rate over a
defined duration. All of this data is used to send
alerts via SMS, e-mail and the ticketing
system.<br>
<br>
3. Inventory management system. We add metadata for
service tiers: test, dev, preprod and prod.
We also add business application names such as database
instance (SID), Facilities management, workflow,
MSExchange, listserver, etc.<br>
<br>
4. Business responsibility matrix. For each
host/application there is a group that is responsible
for the service. This metadata is added so that, when
an alert needs to be sent, the alerting subsystem can
determine where to send it, based on this
responsibility matrix and the data from #2.<br>
<br>
<br>
All of this metadata gets placed into Elasticsearch so
we can start to mine the data by asking questions like:<br>
<br>
- show all of the activity by user XXX in service Y in
the preproduction tier on linux hosts.<br>
- show all of the incidents for host HHH that group GGG
is responsible for fixing.<br>
- which service is responsible for the large increase in
error-class syslog lines, and in which service tier
did they occur.<br>
<br>
The metadata is what drives this, and without
real-time, high-performance pattern matching it just
can't be done.<br>
<br>
Evan.
<div>
<div class="h5"><br>
<br>
<br>
On 04/28/2016 06:23 AM, Scot Needy wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">
<div>We save all log data and compress/dedup it
hourly. For an enterprise of about 5000 servers,
this averages about 200GB. </div>
<div>Some PCI compartments are special and have their own
backup and retention policies for compliance. </div>
<div><br>
</div>
<div>Archiving raw log data also gives us data to
re-parse should the patterns need to be updated. </div>
<div><br>
</div>
<div><br>
</div>
<br>
<div>
<blockquote type="cite">
<div>On Apr 28, 2016, at 7:23 AM, Czanik, Péter
<<a moz-do-not-send="true"
href="mailto:peter.czanik@balabit.com"
target="_blank">peter.czanik@balabit.com</a>>
wrote:</div>
<br>
<div>
<div dir="ltr">
<div>
<div>Hi,<br>
</div>
<br>
I was asking because, up until now, I
recall only a single syslog-ng user who told
me that he saves all log messages. On the
other hand, I keep receiving (marketing)
e-mails saying that no logs should be
discarded and everything should be saved.
Sometimes I also receive the same feedback
from the Big Data world: we have enough disk
space, so why do any filtering? So I'd be
interested to learn from real-world
experience whether filtering is really
old-fashioned, or whether there are
situations (compliance requirements, endless
storage, etc.) where you really save all log messages.<br>
<br>
</div>
Bye,<br>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div>Peter Czanik (CzP) &lt;<a
moz-do-not-send="true"
href="mailto:peter.czanik@balabit.com"
target="_blank">peter.czanik@balabit.com</a>&gt;<br>
Balabit / syslog-ng upstream<br>
<a moz-do-not-send="true"
href="http://czanik.blogs.balabit.com/"
target="_blank">http://czanik.blogs.balabit.com/</a><br>
<a moz-do-not-send="true"
href="https://twitter.com/PCzanik"
target="_blank">https://twitter.com/PCzanik</a></div>
</div>
<br>
<div class="gmail_quote">On Thu, Apr 28,
2016 at 11:11 AM, Fabien Wernli <span
dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:wernli@in2p3.fr"
target="_blank">wernli@in2p3.fr</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"><span>On
Thu, Apr 28, 2016 at 11:06:07AM +0200,
Czanik, Péter wrote:<br>
> One of the major strengths of
syslog-ng is message filtering, which<br>
> facilitates message routing and
discarding useless log messages. OTOH
I<br>
> often read that we now have all
the technologies and storage to keep
all<br>
> logs. What do you think?<br>
<br>
</span>I would go further: we now have
the means to add relevant metadata to
all the events,<br>
which in turn allows us to do targeted
archiving.<br>
<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
<br>
</body>
</html>