Hi Fabien -

Correct - I am trying your Perl module.

What I would like to do is:

1) have the syslog-ng servers run patterndb to parse different log types (makes that pattern matching scale over multiple servers)
2) send directly to an ES cluster

I was thinking maybe a "broker" like redis or RabbitMQ might add buffering for performance, but I am hoping it would not be necessary.

What I have working is this:

destination d_redis {
    redis (
        host("localhost")
        command("LPUSH", "logstash", "$(format-json proxy_time=${PROXY.TIME} proxy_s_ip=${PROXY.S_IP} proxy_c_ip=${PROXY.C_IP} proxy_cs_mthd=${PROXY.CS_METHOD} proxy_s_action=${PROXY.S_ACTION} proxy_cs_host=${PROXY.CS_HOST} proxy_cs_uri_port=${PROXY.CS_URI_PORT} proxy_cs_username=${PROXY.CS_USERNAME} proxy_user_agent=${PROXY.USER_AGENT} proxy_cs_categories=${PROXY.CS_CATEGORIES})\n")
    );
};

with logstash simply pulling from redis and feeding ES.

Are you saying I would not need to use the format-json bit? If so - how would I select/name the desired fields that were parsed with patterndb?

As far as overall performance - I really think it is a combination of disk I/O and memory starvation. I see a spike in "majflt/s" around the time the performance goes down, hitting around 100 - 200. I also see a *lot* of reads *and* writes, which could be the paging...

Anyway - I think I could scale (out) the ES across multiple nodes once I get the syslog-ng indexing to json part working well.

Could you help me grok how to specify the fields to your Perl mod? (otherwise I might have to read the source :-(

Thanks!!
Jim

---- Fabien Wernli <wernli@in2p3.fr> wrote:
Hi,
On Wed, Oct 22, 2014 at 09:28:23PM -0400, Jim Hendrick wrote:
First of all - I'm glad to see more of us working on this.
I second that. We should have a common repository to share our efforts. Since I know the incubator team is very busy, we could also help them make the right decisions.
scripts. I have done some basic testing and it looks like the Lua one has more features, but I am having library issues with it so I may try to use the Perl module and try to add some of these features (e.g. template() is missing in the current Elasticsearch.pm so using that to format-json seems out of the question at the moment)
If you're referring to my implementation [1], the reason template() is missing is that you don't actually need it: the perl module passes a perl structure with all the key-values from `scope()` to the queue callback.
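To illustrate what that means for selecting and naming fields: once the destination code receives the name-value pairs as a native structure, picking and renaming fields is ordinary dictionary work rather than template work. The sketch below is in Python rather than Perl and is not taken from the module. The names `FIELD_MAP` and `select_fields` and the sample message are hypothetical; only the PROXY.* names come from the format-json example earlier in the thread.

```python
# Hypothetical mapping from patterndb name-value pairs to the field
# names wanted in the index (names taken from the format-json example).
FIELD_MAP = {
    "PROXY.TIME": "proxy_time",
    "PROXY.S_IP": "proxy_s_ip",
    "PROXY.C_IP": "proxy_c_ip",
    "PROXY.CS_METHOD": "proxy_cs_mthd",
    "PROXY.CS_HOST": "proxy_cs_host",
}


def select_fields(pairs, field_map=FIELD_MAP):
    """Keep only the mapped name-value pairs and rename them."""
    return {new: pairs[old] for old, new in field_map.items() if old in pairs}


if __name__ == "__main__":
    # Made-up message, shaped like what a destination callback might receive.
    sample = {
        "HOST": "proxy01",
        "PROXY.TIME": "2014-10-22 21:28:23",
        "PROXY.C_IP": "10.0.0.5",
        "PROXY.CS_METHOD": "GET",
    }
    print(select_fields(sample))
```

Pairs that are not in the mapping (HOST above) are dropped, and mapped names that are absent from a message are skipped instead of raising an error.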
As for the performance, I start to get drops at around 5k/s, and I have a 6-node ES cluster with pretty decent hardware. I suspect the bottleneck to be my syslog_ng server which is a virtual machine.
My opinions/findings so far:
1) the lua destination is very nice, but lua IMHO lacks a decent Elasticsearch lib, and you have to format name-value pairs as json
2) the perl dest is nice as it gets the name-value pairs natively as perl structures, and CPAN has an awesome ES module [2]
   we're using it in production
3) python seems great too, and python has from what I hear a nice ES module
   it also gets the name-value pairs as a python dictionary
   it would be great if someone could test it
4) the last "official" option is using the SCL block from the incubator, which basically is a shell program destination, so I didn't even consider it for obvious performance reasons
5) other upcoming option: java destination in the works (which would obviously benefit from ES' native libs)
Admittedly ES already takes json as input, so whether it's the destination handling the serialization or syslog's json parser is probably not so much of an issue, as long as it doesn't need to be munged in your destination code.
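For a sense of how little serialization work falls on the destination side, here is a minimal Python sketch that turns a batch of name-value dictionaries into a request body for Elasticsearch's bulk API (one action line followed by one document line, newline-delimited). The function name `bulk_body`, the index name and the sample documents are assumptions made for illustration. The optional `_type` entry only applies to older ES releases and is omitted by default.

```python
import json


def bulk_body(docs, index, doc_type=None):
    """Serialize dicts into a newline-delimited bulk-API request body."""
    action = {"_index": index}
    if doc_type is not None:
        # Only meaningful on older Elasticsearch versions.
        action["_type"] = doc_type
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": action}))  # action line
        lines.append(json.dumps(doc))                # document line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        {"proxy_c_ip": "10.0.0.5", "proxy_cs_mthd": "GET"},
        {"proxy_c_ip": "10.0.0.6", "proxy_cs_mthd": "POST"},
    ]
    print(bulk_body(docs, "syslog-test"), end="")
```

Batching several messages into one request like this is one way to cut per-message overhead. Whether it helps with the drop rates mentioned above would need to be measured.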
Cheers
[1] https://github.com/faxm0dem/syslog_ng-elasticsearch
[2] http://search.cpan.org/~drtech/Search-Elasticsearch