<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">IT teams that I speak with often use
      filtering to decrease the log volume, but not in the context of
      archiving or storing the logs. These teams typically use some
      analysis product to process their logs, and the product has a
      licensing cost based on the log volume; Splunk and Loggly, for
      example, fall into this category. So the use case is to filter
      out log messages that are known to be of no interest and feed the
      rest of the stream to the product for processing.<br>
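      <br>
      A minimal sketch of such a pre-filter, assuming messages arrive
      already parsed into a program name and text. The
      <code>LogMessage</code> type, the <code>NOISE_PROGRAMS</code> list
      and <code>forward_filter</code> are hypothetical illustrations, not
      syslog-ng features: messages from programs known to be of no
      interest are dropped, and only the rest would be forwarded to the
      volume-licensed product.<br>
      <pre>
```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical "known to be of no interest" list; every site keeps its own.
NOISE_PROGRAMS = frozenset({"cron", "dhcpd", "named"})


@dataclass(frozen=True)
class LogMessage:
    program: str
    text: str


def forward_filter(messages: Iterable[LogMessage]) -> Iterator[LogMessage]:
    """Yield only the messages worth forwarding (and paying for)."""
    for msg in messages:
        if msg.program not in NOISE_PROGRAMS:
            yield msg


if __name__ == "__main__":
    stream = [
        LogMessage("cron", "CMD (run-parts /etc/cron.hourly)"),
        LogMessage("sshd", "Failed password for invalid user admin"),
        LogMessage("dhcpd", "DHCPACK on 10.0.0.5"),
    ]
    kept = list(forward_filter(stream))
    print(f"forwarded {len(kept)} of {len(stream)} messages")  # forwarded 1 of 3 messages
```
      </pre>
      The raw stream can still be archived in full elsewhere; the filter
      only limits what reaches the licensed product.<br>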

      <br>

      The other use case is where logs are centralized from many
      geographically dispersed sites and the bandwidth cost of
      aggregating them needs to be minimized.<br>

      <br>

      Larger companies may make enough money that they simply pay for
      the log processing, perhaps at a volume where some price ceiling
      or cap has been reached. They may also have the development
      resources to build a custom log processing architecture, so that
      processing costs only the CPU time. Larger companies might also
      run their own networks, so they are not affected by bandwidth
      costs.<br>

      <br>

      Just my $0.02<br>

      <br>

      Evan.<br>

      <br>

      <br>

      On 04/29/2016 01:33 AM, Czanik, Péter wrote:<br>

    </div>

    <blockquote

cite="mid:CANcUavt7UrepUn5W_sNgtO=ZtmCdoev3Ui3ZxUPap-uxxrAN4w@mail.gmail.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr">

        <div>

          <div>

            <div>Hi,<br>

              <br>

            </div>

            First of all: thank you for your feedback.<br>

            <br>

            This is very interesting, as it is pretty much the opposite
            of what I hear / read in most discussions. I am often asked
            how to throw away cron / dhcp / dns / kernel / debug / etc.
            messages to save bandwidth / disk space, and sometimes even
            to narrow down what is saved from authentication logs (which
            sounds crazy to my security-minded ears...).<br>

            <br>

          </div>

          I wonder what the reason for this contradiction is. Is it the
          size of the organization (assumption: a larger org has more
          resources to save everything)? Is it compliance (PCI, etc.)?
          Or both?<br>

          <br>

        </div>

        Bye,<br>

      </div>

      <div class="gmail_extra"><br clear="all">

        <div>

          <div class="gmail_signature">Peter Czanik (CzP) &lt;<a

              moz-do-not-send="true"

              href="mailto:peter.czanik@balabit.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:peter.czanik@balabit.com">peter.czanik@balabit.com</a></a>&gt;<br>

            Balabit / syslog-ng upstream<br>

            <a moz-do-not-send="true"

              href="http://czanik.blogs.balabit.com/" target="_blank">http://czanik.blogs.balabit.com/</a><br>

            <a moz-do-not-send="true" href="https://twitter.com/PCzanik"

              target="_blank">https://twitter.com/PCzanik</a></div>

        </div>

        <br>

        <div class="gmail_quote">On Thu, Apr 28, 2016 at 6:59 PM, Evan

          Rempel <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:erempel@uvic.ca" target="_blank">erempel@uvic.ca</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div text="#000000" bgcolor="#FFFFFF">

              <div>Logs are used for many things: auditing, security,
                post-incident analysis, live alerting (SIEM) and more.
                For this reason, I believe that all raw log data should
                be saved.<br>

                <br>

                Adding to the discussion about metadata...<br>

                <br>

                We add metadata from a variety of sources.<br>

                <br>

                1. The syslog line itself. We parse EVERY log message to
                identify specific data and context. For example, a login
                identifier often appears in an e-mail address, but in
                the context of an e-mail address it is NOT a login
                identifier. This enables data mining on login
                identifiers without having to further filter out e-mail
                messages. We populate hundreds of metadata elements this
                way: tape volumes, database instances, login, uid, gid,
                disk drive names, logical volume names, FRU components
                in hardware monitoring. The list is huge.<br>
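                <br>
                A minimal sketch of that context rule, assuming the
                message carries key=value pairs. The
                <code>LOGIN_KEYS</code> set and <code>tag_message</code>
                are hypothetical, not the parser described here: the
                same identifier is tagged as a login when it is a bare
                value of a login key, and as an e-mail address when it
                carries an @ sign.<br>
                <pre>
```python
import re

# Hypothetical keys whose values name a login identifier.
LOGIN_KEYS = {"user", "login", "ruser"}
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def tag_message(message: str) -> dict[str, list[str]]:
    """Extract login and e-mail metadata from key=value pairs."""
    tags: dict[str, list[str]] = {"login": [], "email": []}
    for key, value in KV_PATTERN.findall(message):
        if "@" in value:
            # Same identifier, different context: an e-mail address, NOT a login.
            tags["email"].append(value)
        elif key in LOGIN_KEYS:
            tags["login"].append(value)
    return tags


if __name__ == "__main__":
    print(tag_message("sshd[42]: auth ok user=alice from=alice@example.org"))
    # {'login': ['alice'], 'email': ['alice@example.org']}
```
                </pre>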

                <br>

                2. Incident details. While parsing EVERY log message, we
                identify specific messages that should be alerted on.
                Metadata is added that contains the incident
                description, a URL to resolution documentation, the
                severity of the incident and details on minimizing false
                positives. For example, a repeating log message may only
                be an incident if it repeats at a defined rate over a
                defined duration. All of this data is used to produce
                alerts via SMS, e-mail and the ticketing system.<br>
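                <br>
                The "defined rate over a defined duration" rule can be
                sketched as a per-message sliding window. This is an
                illustration only; <code>RepeatRateDetector</code>,
                <code>threshold</code> and <code>window</code> are
                hypothetical names: an occurrence counts as an incident
                only once the same message has been seen at least
                <code>threshold</code> times within the last
                <code>window</code> seconds.<br>
                <pre>
```python
from collections import deque


class RepeatRateDetector:
    """Flag a message as an incident only when it repeats often enough."""

    def __init__(self, threshold: int, window: float) -> None:
        self.threshold = threshold  # hypothetical: repeats needed
        self.window = window        # hypothetical: sliding window, seconds
        self._times: dict[str, deque] = {}

    def observe(self, key: str, timestamp: float) -> bool:
        """Record one occurrence; return True if the rate condition is met."""
        times = self._times.setdefault(key, deque())
        times.append(timestamp)
        # Forget occurrences that have slid out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


if __name__ == "__main__":
    det = RepeatRateDetector(threshold=3, window=60.0)
    print([det.observe("sdb: I/O error", t) for t in (0, 10, 20, 200)])
    # [False, False, True, False]
```
                </pre>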

                <br>

                3. Inventory management system. We add metadata for
                tiers of service: test, dev, preprod and prod. We also
                add business application names such as database
                instance (SID), facilities management, workflow,
                MSExchange, listserver, etc.<br>

                <br>

                4. Business responsibility matrix. For each
                host/application there is a group that is responsible
                for the service. This metadata is added so that, when
                alerts need to be sent, the alerting subsystem can
                determine where to send them, based on this
                responsibility matrix and the incident data from #2.<br>

                <br>

                <br>

                All of this metadata gets placed into Elasticsearch so
                we can mine the data by asking questions like:<br>

                <br>

                - show all of the activity by user XXX in service Y in
                the preproduction tier on Linux hosts.<br>
                - show all of the incidents for host HHH that group GGG
                is responsible for fixing.<br>
                - which service is responsible for the large increase in
                error-class syslog lines, and in which tier of service
                did they occur?<br>
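                <br>
                As an illustration, the first question could be
                expressed as an Elasticsearch <code>bool</code> query
                with <code>term</code> clauses in filter context. This
                assumes the metadata is indexed as keyword fields named
                user, service, tier and os; those field names and the
                <code>activity_query</code> helper are hypothetical.<br>
                <pre>
```python
import json


def activity_query(user: str, service: str, tier: str, os_name: str) -> dict:
    """Build an Elasticsearch bool query: activity by user in a service/tier/OS.

    Each condition is a term clause in filter context, so it only narrows the
    result set and does not affect relevance scoring.
    """
    # Hypothetical keyword field names; they depend on the index mapping.
    conditions = {"user": user, "service": service, "tier": tier, "os": os_name}
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in conditions.items()]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(activity_query("XXX", "Y", "preprod", "linux"), indent=2))
```
                </pre>
                Filter context suits this kind of exact-match metadata
                question, since no relevance ranking is needed.<br>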

                <br>

                The metadata is the power that drives this, and without
                real-time, high-performance pattern matching it just
                can't be done.<br>

                <br>

                Evan.

                <div>

                  <div class="h5"><br>

                    <br>

                    <br>

                    On 04/28/2016 06:23 AM, Scot Needy wrote:<br>

                  </div>

                </div>

              </div>

              <div>

                <div class="h5">

                  <blockquote type="cite">

                    <div>We save all log data and compress/dedup
                      hourly. For an enterprise of about 5000 servers
                      this averages about 200GB.</div>
                    <div>Some PCI compartments are special and have
                      their own backup and retention policies for
                      compliance.</div>

                    <div><br>

                    </div>

                    <div>Archiving raw log data also gives us data to

                      re-parse should the patterns need to be updated.  </div>

                    <div><br>

                    </div>

                    <div><br>

                    </div>

                    <br>

                    <div>

                      <blockquote type="cite">

                        <div>On Apr 28, 2016, at 7:23 AM, Czanik, Péter

                          &lt;<a moz-do-not-send="true"

                            href="mailto:peter.czanik@balabit.com"

                            target="_blank">peter.czanik@balabit.com</a>&gt;

                          wrote:</div>

                        <br>

                        <div>

                          <div dir="ltr">

                            <div>

                              <div>Hi,<br>

                              </div>

                              <br>

                              I was asking because, up until now, I
                              recall only a single syslog-ng user who
                              told me that he saves all log messages. On
                              the other hand, I keep receiving
                              (marketing) e-mails saying that no logs
                              should be discarded and everything should
                              be saved. Sometimes I receive the same
                              feedback from the Big Data world: we have
                              enough disk space, so why do any
                              filtering? So I'd be interested to learn
                              from real-world experience whether
                              filtering is really old-fashioned, or
                              whether there are situations (compliance
                              requirements, endless storage, etc.) in
                              which you really save all log messages.<br>

                              <br>

                            </div>

                            Bye,<br>

                          </div>

                          <div class="gmail_extra"><br clear="all">

                            <div>

                              <div>Peter Czanik (CzP) &lt;<a
                                  moz-do-not-send="true"
                                  href="mailto:peter.czanik@balabit.com"
                                  target="_blank">peter.czanik@balabit.com</a>&gt;<br>

                                Balabit / syslog-ng upstream<br>

                                <a moz-do-not-send="true"

                                  href="http://czanik.blogs.balabit.com/"

                                  target="_blank">http://czanik.blogs.balabit.com/</a><br>

                                <a moz-do-not-send="true"

                                  href="https://twitter.com/PCzanik"

                                  target="_blank">https://twitter.com/PCzanik</a></div>

                            </div>

                            <br>

                            <div class="gmail_quote">On Thu, Apr 28,

                              2016 at 11:11 AM, Fabien Wernli <span

                                dir="ltr">&lt;<a moz-do-not-send="true"

                                  href="mailto:wernli@in2p3.fr"

                                  target="_blank">wernli@in2p3.fr</a>&gt;</span>

                              wrote:<br>

                              <blockquote class="gmail_quote"

                                style="margin:0 0 0 .8ex;border-left:1px

                                #ccc solid;padding-left:1ex"><span>On

                                  Thu, Apr 28, 2016 at 11:06:07AM +0200,

                                  Czanik, Péter wrote:<br>

                                  &gt; One of the major strengths of

                                  syslog-ng is message filtering, which<br>

                                  &gt; facilitates message routing and

                                  discarding useless log messages. OTOH

                                  I<br>

                                  &gt; often read that we now have all

                                  the technologies and storage to keep

                                  all<br>

                                  &gt; logs. What do you think?<br>

                                  <br>

                                </span>I would go further: we now have

                                the means to add relevant metadata to

                                all the events,<br>

                                which in turn allows us to do targeted

                                archiving.<br>

                                <br>

                                <span class="HOEnZb"></span></blockquote>

                            </div>

                          </div>

                        </div>

                      </blockquote>

                    </div>

                  </blockquote>

                </div>

              </div>

            </div>

          </blockquote>

        </div>

      </div>

    </blockquote>

    <br>

    <br>


  </body>

</html>