[syslog-ng] Elasticsearch destination

Wed Oct 29 14:50:10 CET 2014

Thank you sir!

At least this is not unique to my testing (not sure that's actually
*good* news :-)

I will try to reproduce some comparable baselines using a couple of the
setups I have tried:

1) proxy-syslog --> syslog-ng --> redis --> logstash+grok --> logstash --> elasticsearch
    This essentially followed a basic set of instructions, just to make
sure I could reproduce their results.

2) proxy-syslog --> syslog-ng+patterndb+format-json --> redis --> logstash --> elasticsearch
    This moved the pattern matching and the conversion to JSON out to the
edge, leaving redis and logstash in place since they worked well at
feeding elasticsearch.

3) proxy-syslog --> syslog-ng+patterndb+Elasticsearch.pm --> elasticsearch
    This seemed the simplest & most promising.

I have not tried all three with the same load, so I cannot say
definitively that one is better, but my subjective impression is that #3
was actually the slowest. I suspect the problem lies in how the data gets
to elasticsearch, but I do not know whether it is the perl module itself
or something on the elasticsearch side (indexing, etc.).
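One way to narrow this down is to check whether events reach elasticsearch one request at a time or in batches through its `_bulk` API. The Python sketch below shows what batching looks like at the wire level (one action line plus one source line per document, newline-terminated). It is an illustration under assumptions, not a description of what Elasticsearch.pm does internally; the function name and index name are hypothetical, and `max_size=256` simply mirrors the setting mentioned further down.

```python
import json
from typing import Iterable, Iterator, Optional


def bulk_bodies(events: Iterable[dict], index: str,
                doc_type: Optional[str] = None,
                max_size: int = 256) -> Iterator[str]:
    """Yield _bulk request bodies holding at most `max_size` documents each."""
    meta = {"_index": index}
    if doc_type is not None:
        meta["_type"] = doc_type  # mapping types exist in ES 1.x; later versions drop them
    action = json.dumps({"index": meta})
    lines = []
    for event in events:
        lines.append(action)             # action/metadata line
        lines.append(json.dumps(event))  # document source line
        if len(lines) // 2 >= max_size:
            yield "\n".join(lines) + "\n"  # _bulk requires a trailing newline
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"    # flush the final partial batch
```

Each body would then be sent as a single POST to `/_bulk`, so the HTTP and request overhead is paid once per batch rather than once per event.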

My overall thought is (still) that parsing at each syslog-ng server with
no middleman should be fastest, since adding more syslog-ng servers also
spreads the pattern matching load across them.

I am still not sure a broker (redis, rabbitmq, etc.) will help: as long
as elasticsearch can accept the data fast enough, there should be little
for it to buffer.
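A back-of-envelope sketch of that reasoning, assuming constant rates and an initially empty queue: a broker absorbs the difference between the arrival rate and the drain rate, but it does not raise the drain rate. With the figures from the original report quoted below (about 5000 EPS in, about 1000 EPS sustained into elasticsearch), the queue would grow by roughly 4000 events every second. The function name is hypothetical.

```python
def broker_backlog(seconds: float, in_eps: float, out_eps: float) -> float:
    """Events waiting in a broker after `seconds` at constant rates,
    starting from an empty queue. If consumers keep up, nothing accumulates."""
    return max(0.0, (in_eps - out_eps) * seconds)


# Rates from the original report: ~5000 EPS in, ~1000 EPS sustained into ES.
backlog = broker_backlog(3600, 5000, 1000)
print(f"backlog after one hour: {backlog:,.0f} events")  # 14,400,000
```

A broker can still smooth out bursts or elasticsearch restarts, but only if the average input rate stays below what elasticsearch can index.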

Thanks for the feedback - I will certainly post whatever I come up with
in the next day or so.

Jim

On 10/29/2014 09:29 AM, Fabien Wernli wrote:
> Hi Jim,
>
> On Tue, Oct 28, 2014 at 04:36:19PM -0400, Jim Hendrick wrote:
>> Now the issue is performance. I am sending roughly 5000 EPS to the
>> syslog-ng instance running patterndb, but can only "sustain" less than
>> 1000 EPS to elasticsearch (oddly, ES seems to start receiving at ~5000 EPS,
>> and within an hour or less, drops to ~1000)
> I've got a similar workload, and seeing drops too.
> When the rate is below 2k EPS, syslog-ng usually copes; above that, I
> see drops. Enabling flow-control seems to help from the syslog-ng
> perspective (no drops in `syslog-ng-ctl stats`), but when I look at the
> protocol counters in the Linux kernel, the drops show up as "InErrors"
> (I'm using UDP). I'm a little lost when trying to interpret the effect
> of the syslog-ng tunables.
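To watch those kernel counters directly, a minimal Linux-only Python sketch could sample the `Udp:` lines of `/proc/net/snmp` twice and report per-second rates. The function names are hypothetical; the counters are system-wide, not per socket. `RcvbufErrors`, where the kernel exposes it, counts datagrams dropped because a socket receive buffer was full, which is the usual cause when the reader cannot keep up.

```python
import time


def udp_counters(snmp_text: str) -> dict:
    """Parse the 'Udp:' header/value line pair from /proc/net/snmp text."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # excludes the separate "UdpLite:" rows
    header, values = rows[0], rows[1]
    return dict(zip(header, (int(v) for v in values)))


def read_udp_counters(path: str = "/proc/net/snmp") -> dict:
    with open(path) as f:
        return udp_counters(f.read())


def udp_rates(interval: float = 10.0) -> dict:
    """Per-second rates of selected UDP counters over `interval` seconds."""
    before = read_udp_counters()
    time.sleep(interval)
    after = read_udp_counters()
    fields = ("InDatagrams", "InErrors", "RcvbufErrors")
    # .get(): older kernels may not expose RcvbufErrors
    return {k: (after.get(k, 0) - before.get(k, 0)) / interval for k in fields}
```

Running `print(udp_rates())` on the syslog-ng host while load is applied gives kernel-side drop rates that can be compared with the counters from `syslog-ng-ctl stats`.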
>
>> I have tried a number of things, including running a second ES node and
>> letting syslog-ng "round robin" with no luck at all.
> We're doing that by specifying the `nodes` key in Elasticsearch.pm:
> according to its documentation [1] this should make Search::Elasticsearch
> load-balance across the nodes. This seems to work as intended, judging by
> the bandwidth between syslog-ng and each of the ES nodes.
>
> When looking at the statistics of my nodes, they seem to be hitting no
> bottleneck whatsoever:
>
> * load is between 0 and 2 (8 cores total)
> * writes average around 50/s with peaks around 150/s (6+P RAID 10k SAS)
> * reads are ridiculously low
> * heap usage is around 75% (of 24g)
> * interface rx ~500k/s
> * elasticsearch index rate ~500/s
>
>> ES tuning has included locking 16G of memory per ES instance, and setting
>> indices.memory.index_buffer_size: 50%
> We're using 'index_buffer_size: 30%' and 'ES_HEAP_SIZE=24g' on our 6 ES
> nodes, and max_size is set to 256 in syslog-ng/Elasticsearch.pm.
>
> What we're currently doing (or planning to do) to investigate:
>
> 1. micro-benchmark the CPAN module to see if we can go above 2k/s
> 2. improve the statistics gathered by collectd-elasticsearch [2]
> 3. write a dummy ES server which only does some
>    accounting but throws data away, in order to do some benchmarking.
> 4. compare python, lua and perl implementations
> 5. tune various syslog-ng parameters
> 6. use some MQ implementation between ES and syslog-ng
> 7. use TCP instead of UDP for incoming syslog
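Item 3 could start as small as the following Python sketch, offered as an assumption rather than part of the plan above: a throwaway HTTP server that counts actions in `_bulk` request bodies and discards them, so the output rate of syslog-ng can be measured without elasticsearch in the loop. `SinkHandler`, `count_bulk_actions` and `serve` are hypothetical names, and the canned reply is only a guess at what a client needs; Search::Elasticsearch may expect a fuller response (for example one `items` entry per action, or an answer to a `GET /` ping), which this sketch does not provide.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def count_bulk_actions(body: bytes) -> int:
    """Count actions in a _bulk body. index/create/update actions are followed
    by one source line; delete actions are not."""
    lines = [ln for ln in body.decode("utf-8").split("\n") if ln.strip()]
    count, i = 0, 0
    while i < len(lines):
        op = next(iter(json.loads(lines[i])))  # e.g. "index" or "delete"
        count += 1
        i += 1 if op == "delete" else 2
    return count


class SinkHandler(BaseHTTPRequestHandler):
    received = 0              # total actions accepted since startup
    lock = threading.Lock()   # requests are handled on separate threads

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)  # chunked uploads are not handled
        path = self.path.split("?")[0].rstrip("/")
        n = count_bulk_actions(body) if path.endswith("_bulk") else 0
        with SinkHandler.lock:
            SinkHandler.received += n
        # Minimal canned reply; real clients may expect more than this.
        reply = json.dumps({"took": 0, "errors": False, "items": []}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    do_PUT = do_POST  # _bulk accepts PUT as well as POST

    def log_message(self, fmt, *args):
        pass  # per-request logging would itself slow the sink down


def serve(host: str = "127.0.0.1", port: int = 9200) -> None:
    """Run the sink until interrupted; 9200 mimics the usual ES HTTP port."""
    ThreadingHTTPServer((host, port), SinkHandler).serve_forever()
```

Sampling `SinkHandler.received` at fixed intervals while syslog-ng points at the sink gives an accepted-actions-per-second figure with elasticsearch removed from the picture.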
>
> I realize this won't help you much, but it may be of interest so we can
> pool our research. I'll be meeting with some syslog-ng experts very soon,
> and I am convinced I'll come back with many options to improve the
> situation.
>
> Cheers
>
> [1] http://search.cpan.org/~drtech/Search-Elasticsearch-1.14/lib/Search/Elasticsearch.pm#nodes
> [2] https://github.com/phobos182/collectd-elasticsearch
>