[syslog-ng] RFC: Applying transformations to a whole log message

Martin Holste mcholste at gmail.com
Thu May 10 17:11:55 CEST 2012


This is definitely something that's needed, but I'm a bit concerned
about the complexity.  I want to propose another idea, just off the
top of my head:  What if something like the program() destination
could be used to do the message transformations, so that your favorite
script or C program can be used inline as a log preprocessor as well
as a destination?  The reason I think this could be helpful is that
you could then re-use utility scripts and code you already have lying
around without having to learn the new system.  Granted, in a lot of
cases the proposed built-in system would be fairly straightforward,
but for advanced usage, like tying in with external databases, it
could be very helpful to be able to offload the transformation to an
arbitrary script or program.  I think the challenge would be latency
and potential queue clogging, but that can be managed.
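
To make the idea concrete, here is a minimal sketch of such a
preprocessor in Python.  It assumes a line-oriented protocol: one
message per line on stdin, the transformed message on stdout.  Having
syslog-ng read the result back inline is the hypothetical part, and
transform() is only a placeholder for whatever an existing script
already does.

```python
"""Hypothetical inline preprocessor: one message per line in, one out."""
import sys


def transform(line: str) -> str:
    # Placeholder for logic an existing utility script already has,
    # for example a lookup in an external database.
    return line.replace("Linux", "Emacs")


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    for line in stdin:
        stdout.write(transform(line.rstrip("\n")) + "\n")
        # Flush after every message so the caller is never left
        # waiting on a partially filled buffer.
        stdout.flush()
```

To run it as a filter, end the file with a call to main().  Flushing
per message is the simplest way to keep latency down; batching would
help throughput but makes the queueing problem worse.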

On Thu, May 10, 2012 at 4:08 AM, Gergely Nagy <algernon at balabit.hu> wrote:
> Hi!
>
> In the GeoIP thread[1], I started to play with the idea of introducing
> another way to modify messages.
>
> So far, we have rewrite, which can set new values associated with a
> message, or change existing ones - one at a time.
>
> We also have template functions, which one can use to control how a
> specific value will be formatted. Again, pretty much one at a time.
>
> What syslog-ng lacks right now, is a way to apply a transformation to
> a message as a whole, a transformation that will take effect right
> there, right then, instead of making a modified copy like value-pairs()
> does. (value-pairs() also suffers from the problem that to be useful, it
> needs explicit support elsewhere: among the template functions, or
> within the destination driver).
>
> What I wish for, is to be able to apply any number of transformation
> functions to a whole LogMessage. Whether the transformations rewire the
> key names, or change values, I'd love to be able to just tell syslog-ng,
> that "here, take this message, go out and prosper, make it better,
> whatever it takes!" - and it would do just that.
>
> To give a few examples, I'd love to be able to do any and all of the
> following:
>
> * Ask syslog-ng to take a message, and look up every IP address
>  associated with it (for simplicity's sake, let's assume every such
>  address is stored within a key that ends with "_IP", and no other keys
>  end with the same suffix) in a GeoIP database, and put the result in
>  keys that have a "geo." prefix, followed by the original key name.
>
> * Ask syslog-ng to take a message, and rename the keys according to
>  various rules I set - similar to how value-pairs()' rekeying works,
>  possibly following the same syntax. For example, I want to take all
>  ".json.*" keys, remove the prefix, and uppercase the names. Then, I
>  want to replace all leading dots with an underscore in whatever keys
>  remain.
>
> * Ask syslog-ng to remove keys completely. I don't care about the DATE
>  field, because I receive CEE-enabled messages only, and they come with
>  a high-precision date field anyway, called a "timestamp".
>
> * I also want to drop every key where the value matches a certain
>  pattern. Or perhaps not drop them, but anonymize the value.
>
>  For example, I might not like the word "plasson", so much so, that
>  whatever key contains it, I never want to see it.
>
>  I also want to pull a prank, because it just happens to be April 1st,
>  so I want to replace every occurrence of "Linux" with "Emacs" within a
>  LogMessage, in every single key.
>
> * Since we're applying transformations, might as well do what rewrite
>  does too, and be able to set stuff - we already do subst.
>
>  For example, I want to anonymize all the IP addresses. I don't mind
>  the country-codes exposed, but I don't want the IPs in my logs.
>
> * I want to be able to compose all of the above, chain them together, so
>  one gets executed after the other, and in the end, the LogMessage will
>  end up with their combined result.
>
> For the above, I propose the following syntax:
>
> ,----
> | map m_do_stuff {
> |  geoip("*_IP", target-prefix("geo."))
> |  rekey(".json.*",
> |        shift(6) uppercase())
> |  rekey(".*", replace(".", "_"))
> |  filter-out(key("DATE"))
> |  filter-out(value("plasson", type(substring)))
> |  subst(value("Linux", "Emacs", type(substring)))
> |  set(key("*_IP", "<anonymized>"))
> | };
> `----
>
> Of course, this differs a bit from the syntax used in rewrite, and to be
> honest, intentionally so. I could never learn to love rewrite's way of
> set("new-value", value("key-name")). Nevertheless, the syntax can be
> changed to be similar to rewrite, the functionality would remain the
> same even then.
>
> And how to use this?
>
> log { source(s_something); map(m_do_stuff); ... }
>
> And in case we want to tie it to a condition, then:
>  map(m_do_stuff, condition(filter(f_filter_condition)))
>
> I don't think I'd want to support specifying maps in-line, but I suppose
> that could be done as well.
>
> Basically, this would be rewrite on steroids, with the ability to modify
> keys and values in bulk as well. The major advantage this would have is
> the ability to work not only on a single key, but apply transformations
> to any number of key-value pairs, changing either of them or both.
>
> If architected well, it could even be fast, on par with rewrite if it
> has to do similar things. I mean, the following two should be equally
> fast:
>
> ,----
> | map m_set_host {
> |   set(key("HOST", "myhost"));
> | };
> `----
>
> ,----
> | rewrite r_set_host {
> |   set("myhost", value("HOST"));
> | };
> `----
>
> For this to work, and for optimisations to be made possible, the
> implementation will have to be clever, and able to take shortcuts. I
> have a few ideas about that too, but that's a topic for a later time:
> let's see first if the idea is deemed useful, and if the syntax I came
> up with makes sense to anyone else.
>
> What do YOU think? Would you have a use for a way to configure bulk
> transformations? If so, what other transformations would you find
> interesting?
>
> --
> |8]
>
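
For reference, the chaining in m_do_stuff can be modelled outside
syslog-ng.  The sketch below is Python, not proposed syntax: it treats
a message as a plain dict of string values, uses shell-style globs for
the key patterns, and returns copies rather than modifying the message
in place.  All function names are made up for illustration, and the
GeoIP lookup is a stub.  One assumption worth flagging: "geo.SRC_IP"
also matches "*_IP", so the last step skips keys under the "geo."
prefix in order to keep the country codes.

```python
from fnmatch import fnmatchcase


def geoip(pattern, target_prefix, lookup):
    def step(msg):
        out = dict(msg)
        for key, value in msg.items():
            if fnmatchcase(key, pattern):
                out[target_prefix + key] = lookup(value)
        return out
    return step


def rekey(pattern, rename):
    # Note: colliding new names silently overwrite each other here.
    def step(msg):
        return {(rename(k) if fnmatchcase(k, pattern) else k): v
                for k, v in msg.items()}
    return step


def filter_out(predicate):
    def step(msg):
        return {k: v for k, v in msg.items() if not predicate(k, v)}
    return step


def subst(old, new):
    def step(msg):
        return {k: v.replace(old, new) for k, v in msg.items()}
    return step


def set_value(pattern, value, exclude_prefix=""):
    def matches(key):
        if exclude_prefix and key.startswith(exclude_prefix):
            return False
        return fnmatchcase(key, pattern)

    def step(msg):
        return {k: (value if matches(k) else v) for k, v in msg.items()}
    return step


def compose(*steps):
    # Run the steps in order, feeding each result into the next one.
    def run(msg):
        for step in steps:
            msg = step(msg)
        return msg
    return run


FAKE_GEOIP = {"192.0.2.1": "HU"}  # stub standing in for a real database

m_do_stuff = compose(
    geoip("*_IP", "geo.", lambda ip: FAKE_GEOIP.get(ip, "??")),
    rekey(".json.*", lambda k: k[6:].upper()),    # shift(6) uppercase()
    rekey(".*", lambda k: "_" + k[1:]),           # leading dot -> "_"
    filter_out(lambda k, v: k == "DATE"),
    filter_out(lambda k, v: "plasson" in v),
    subst("Linux", "Emacs"),
    set_value("*_IP", "<anonymized>", exclude_prefix="geo."),
)
```

Calling m_do_stuff(message) applies the seven steps in the order they
are listed, which matters: the GeoIP step has to see the addresses
before the last step overwrites them.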

