[syslog-ng] RFC: Applying transformations to a whole log message
Gergely Nagy
algernon at balabit.hu
Thu May 10 11:08:13 CEST 2012
Hi!
In the GeoIP thread[1], I started to play with the idea of introducing
another way to modify messages.
So far, we have rewrite, which can set new values associated with a
message, or change existing ones - one at a time.
We also have template functions, which one can use to control how a
specific value will be formatted. Again, pretty much one at a time.
What syslog-ng lacks right now is a way to apply a transformation to
a message as a whole, a transformation that will take effect right
there, right then, instead of making a modified copy like value-pairs()
does. (value-pairs() also suffers from the problem that to be useful, it
needs explicit support elsewhere: among the template functions, or
within the destination driver).
What I wish for is to be able to apply any number of transformation
functions to a whole LogMessage. Whether the transformations rewire the
key names, or change values, I'd love to be able to just tell syslog-ng
"here, take this message, go out and prosper, make it better,
whatever it takes!" - and it would do just that.
To give a few examples, I'd love to be able to do any and all of the
following:
* Ask syslog-ng to take a message, and look up every IP address
associated with it (for simplicity's sake, let's assume every such
address is stored within a key that ends with "_IP", and no other keys
end with the same suffix) in a GeoIP database, and put the result in
keys that have a "geo." prefix, followed by the original key name.
* Ask syslog-ng to take a message, and rename the keys according to
various rules I set - similar to how value-pairs()' rekeying works,
possibly following the same syntax. For example, I want to take all
".json.*" keys, remove the prefix, and uppercase the names. Then, I
want to replace all leading dots with an underscore in whatever keys
remain.
* Ask syslog-ng to remove keys completely. I don't care about the DATE
field, because I receive CEE-enabled messages only, and they come with
a high-precision date field anyway, called a "timestamp".
* I also want to drop every key where the value matches a certain
pattern. Or perhaps not drop them, but anonymize the value.
For example, I might not like the word "plasson", so much so, that
whatever key contains it, I never want to see it.
I also want to pull a prank, because it just happens to be April 1st,
so I want to replace every occurrence of "Linux" with "Emacs" within a
LogMessage, in every single key.
* Since we're applying transformations, might as well do what rewrite
does too, and be able to set stuff - we already do subst.
For example, I want to anonymize all the IP addresses. I don't mind
the country-codes exposed, but I don't want the IPs in my logs.
* I want to be able to compose all of the above, chain them together, so
one gets executed after the other, and in the end, the LogMessage will
end up with their combined result.
For the above, I propose the following syntax:
,----
| map m_do_stuff {
|   geoip("*_IP", target-prefix("geo."));
|   rekey(".json.*",
|         shift(6) uppercase());
|   rekey(".*", replace(".", "_"));
|   filter-out(key("DATE"));
|   filter-out(value("plasson", type(substring)));
|   subst(value("Linux", "Emacs", type(substring)));
|   set(key("*_IP", "<anonymized>"));
| };
`----
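To make the intended semantics a bit more concrete, here is a rough
Python model of what m_do_stuff would do, treating a LogMessage as a
plain dict of strings that gets modified in place. This is only a
sketch under assumptions, not syslog-ng code: every function name, the
FAKE_GEOIP table, the use of fnmatch for glob matching and the
skip_prefix parameter are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for a real GeoIP database: address -> country code.
FAKE_GEOIP = {"192.0.2.1": "HU", "198.51.100.7": "US"}


def matching(msg, pattern):
    """Return a snapshot of the keys matching a glob, so msg can be mutated."""
    return [key for key in msg if fnmatchcase(key, pattern)]


def geoip(msg, pattern, target_prefix):
    """Store the lookup result for each matching key under prefix + key."""
    for key in matching(msg, pattern):
        msg[target_prefix + key] = FAKE_GEOIP.get(msg[key], "unknown")


def rekey(msg, pattern, rename):
    """Rename every matching key with the given callable."""
    for key in matching(msg, pattern):
        msg[rename(key)] = msg.pop(key)


def filter_out_key(msg, pattern):
    """Drop every key matching the glob."""
    for key in matching(msg, pattern):
        del msg[key]


def filter_out_value(msg, substring):
    """Drop every key whose value contains the substring."""
    for key in [k for k, v in msg.items() if substring in v]:
        del msg[key]


def subst(msg, old, new):
    """Replace a substring in every value."""
    for key in msg:
        msg[key] = msg[key].replace(old, new)


def set_value(msg, pattern, value, skip_prefix=None):
    """Overwrite matching keys; skip_prefix is an assumption of this sketch."""
    for key in matching(msg, pattern):
        if skip_prefix is None or not key.startswith(skip_prefix):
            msg[key] = value


def m_do_stuff(msg):
    """Apply the steps in order, each one seeing the previous one's result."""
    geoip(msg, "*_IP", target_prefix="geo.")
    rekey(msg, ".json.*", lambda k: k[6:].upper())  # shift(6) uppercase()
    rekey(msg, ".*", lambda k: "_" + k[1:])         # leading "." -> "_"
    filter_out_key(msg, "DATE")
    filter_out_value(msg, "plasson")
    subst(msg, "Linux", "Emacs")
    set_value(msg, "*_IP", "<anonymized>", skip_prefix="geo.")
    return msg
```

One thing the model makes visible: with plain glob matching, "*_IP"
also matches "geo.SRC_IP", so the last step would overwrite the
country codes too. The sketch sidesteps that with skip_prefix; how the
real thing should handle it is an open question.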
Of course, this differs a bit from the syntax used in rewrite, and to be
honest, intentionally so. I could never learn to love rewrite's way of
set("new-value", value("key-name")). Nevertheless, the syntax can be
changed to be similar to rewrite; the functionality would remain the
same even then.
And how to use this?
log { source(s_something); map(m_do_stuff); ... };
And in case we want to tie it to a condition, then:
map(m_do_stuff, condition(filter(f_filter_condition)))
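In other words, the condition only gates whether the map runs. A tiny
Python model of that, again with a dict standing in for the
LogMessage; apply_map and both stand-in functions are hypothetical,
and the sketch assumes that a message failing the condition passes
through unchanged rather than being dropped:

```python
def apply_map(msg, transform, condition=None):
    """Run transform on msg in place, but only if the condition holds."""
    if condition is None or condition(msg):
        transform(msg)
    return msg


def m_do_stuff(msg):
    """Hypothetical stand-in for the real map: a single subst step."""
    msg["MESSAGE"] = msg["MESSAGE"].replace("Linux", "Emacs")


def f_filter_condition(msg):
    """Hypothetical stand-in for a filter: match kernel messages only."""
    return msg.get("PROGRAM") == "kernel"
```

Calling apply_map(msg, m_do_stuff, f_filter_condition) then rewrites
kernel messages and leaves everything else alone.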
I don't think I'd want to support specifying maps in-line, but I suppose
that could be done as well.
Basically, this would be rewrite on steroids, with the ability to modify
keys and values in bulk as well. The major advantage this would have is
the ability to work not only on a single key, but apply transformations
to any number of key-value pairs, changing either of them or both.
If architected well, it could even be fast, on par with rewrite if it
has to do similar things. I mean, the following two should be equally
fast:
,----
| map m_set_host {
|   set(key("HOST", "myhost"));
| };
`----
,----
| rewrite r_set_host {
|   set("myhost", value("HOST"));
| };
`----
For this to work, and for optimisations to be made possible, the
implementation will have to be clever, and able to take shortcuts. I
have a few ideas about that too, but that's a topic for a later time:
let's see first if the idea is deemed useful, and if the syntax I came
up with makes sense to anyone else.
What do YOU think? Would you have a use for a way to configure bulk
transformations? If so, what other transformations would you find
interesting?
--
|8]