Re: [syslog-ng] Feedback for GSoC project - Riak Destination for Syslog-ng
Hi! I'm the mentor for the "Riak destination for syslog-ng" project. Please allow me to answer the questions below:
"Fred" == Fred Dushin <fdushin@basho.com> writes:
Fred> As far as I understand, you're talking about a mapping from keys to
Fred> sets, but I'm unclear on a few things.

The idea is to map a set of log messages to a Riak Set, where both the
key used for the set and its contents are configurable by the user.
There are no plans for a default at this time.

There are many ways to configure a syslog-ng=>Riak setup with a
destination like the one planned. One is to turn each log message
(after parsing) into a Riak Map, and push those maps into a Riak Set.
Another way is to format the parsed log messages (with all the
extracted fields, if any) into JSON, and push those into a set.

So, for example, given the following syslog line:

    May 6 14:42:18 eowyn avahi-daemon[27812]: Invalid response packet from host fe80::5d0f:d53a:7b6:3680.

We'd end up with a JSON document like this:

    {"timestamp": "2015-05-06T14:42:18+02:00",
     "host": "eowyn",
     "program": "avahi-daemon",
     "pid": 27812,
     "message": "Invalid response packet from host fe80::5d0f:d53a:7b6:3680.",
     "avahi-daemon": {
       "type": "warning",
       "message": "Invalid response packet",
       "host": "fe80::5d0f:d53a:7b6:3680"
     }
    }

We could either add that to a Riak Set as-is, or turn it into a Riak
Map first.

Fred> What are the keys you are thinking about? Time stamps? If
Fred> timestamps, these are presumably the timestamps of the syslog
Fred> event?

Whatever the user configures. They may be time stamps (rounded, for
predictable keys), or a combination of program name + current date
(day granularity).

Fred> Just a word of warning, if so. You might find a lot of
Fred> variation in timestamp formats and granularity. Perhaps you
Fred> can get something reliable out of syslog-ng,

We get something sensible out of syslog-ng. But in the end, it is up
to the user to configure the template used for keys. There may - and
probably will - be examples, but no default.
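
To make the key options above concrete, here is a minimal Python sketch
of the two approaches mentioned: a program name + date key, and a
rounded timestamp key. It is only an illustration under assumptions: it
does not talk to Riak at all (a plain dict of Python sets stands in for
Riak Sets), and the function names set_key, rounded_key and set_element
are hypothetical, not part of syslog-ng or of any Riak client.

```python
import json
from datetime import datetime, timedelta, timezone


def set_key(program: str, stamp: datetime) -> str:
    """Build a key shaped like the $PROGRAM/$YEAR-$MONTH-$DAY template."""
    return f"{program}/{stamp:%Y-%m-%d}"


def rounded_key(stamp: datetime, bucket_seconds: int = 60) -> str:
    """Round the timestamp down to a bucket so keys are predictable."""
    epoch = int(stamp.timestamp())
    return str(epoch - epoch % bucket_seconds)


def set_element(message: dict) -> str:
    """Serialize a parsed message to JSON; sorted keys keep it stable."""
    return json.dumps(message, sort_keys=True)


# The example log line from above, already parsed (assumed input).
stamp = datetime(2015, 5, 6, 14, 42, 18, tzinfo=timezone(timedelta(hours=2)))
message = {
    "timestamp": stamp.isoformat(),
    "host": "eowyn",
    "program": "avahi-daemon",
    "pid": 27812,
    "message": "Invalid response packet from host fe80::5d0f:d53a:7b6:3680.",
}

# Stand-in for Riak: each key maps to a set of JSON strings.
buckets: dict[str, set[str]] = {}
key = set_key(message["program"], stamp)
buckets.setdefault(key, set()).add(set_element(message))

# Adding the same element twice leaves the set unchanged.
buckets[key].add(set_element(message))
```

Because the stored value is a set, pushing the same serialized message
twice is harmless, which is part of what makes sets a reasonable fit
here.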
Fred> but that won't help you in the case where syslog-ng is
Fred> functioning as a syslog relay, and you want to preserve the
Fred> timestamp of the originator, which you should, if you want to
Fred> preserve integrity of the logs (e.g, for compliance).

In the case of syslog-ng, we actually have access to a few kinds of
timestamps: the timestamp from the log message (if any), the timestamp
of receipt, and the current time. The granularity of timestamps is
configurable to some extent.

Fred> Or are you talking about a key being a (course grained)
Fred> timestamp, say, an integral value in UTC seconds, for example?
Fred> And the value(s) being all logs in that interval? Is that your
Fred> motivation for sets?

That's one way, yes. One could also use something like
$PROGRAM/$YEAR-$MONTH-$DAY as the key, if the program doesn't produce
more than a megabyte of logs a day.

So with the example above, the key for that log line would be
avahi-daemon/2015-05-06, and the message would be an element of the
set stored under that key.

Fred> How much of the syslog payload are you planning to parse?

The destination itself is not going to do any parsing. Other parts of
syslog-ng do that, and it is up to the user to set up a pipeline that
feeds the destination. The source may be syslog, HTTP logs, the
Journal, or any of the other sources syslog-ng supports. How much
parsing is done, and what gets extracted, is of no concern to the
destination plugin.

Fred> Another interesting problem is that the STRUCTURED-DATA element of
Fred> 5424 uses OIDs to discriminate different data types that are encoded
Fred> in the header. And while there is a kind of loosely coupled authority
Fred> for OIDs, there is no infrastructure for determining a parsing
Fred> strategy for these fields. They could really be anything, in the worst
Fred> case.

As far as I remember, syslog-ng treats all STRUCTURED-DATA elements as
strings.
There are tools within syslog-ng that allow converting them to other
data types, but that must be done explicitly.

Fred> But regardless of the deeply structured data, you could get some very
Fred> interesting traction by just taking standard headers and indexing them
Fred> through Yokozuna. Certainly, indexing the body of a syslog message is
Fred> a great idea, as these messages are generally unstructured and fodder
Fred> for lucene. This is something that Logstash/ElasticSearch can do
Fred> pretty effectively today, and it would be cool to see the same in Riak
Fred> + some syslog provider.

Yep! When I proposed the idea, using Yokozuna was something I had in
mind. Combine the parsing abilities of syslog-ng, Riak for archival
purposes, and Yokozuna for searching. That sounds like a match made in
heaven.

Fred> Finally, it would be really nice if you could structure your plugin in
Fred> such a way that they could eventually be ported to rsyslog [2]. The
Fred> rsyslogd daemon is deployed by default on certain Linux favors and
Fred> enjoys fairly widespread distribution. You might be able to get it
Fred> supported in that community, as well.

Part of the project is writing a small library to send data to Riak
from C, with just enough functionality for syslog-ng's needs. That
library could be used by rsyslog, too (just as the MongoDB library
originally written for syslog-ng's purposes is used by rsyslog too).
But sharing more code than that is not practical: the two daemons work
in very different ways.

--
|8]
participants (1)
-
Gergely Nagy