[syslog-ng] Elasticsearch destination

Wed Oct 29 14:29:38 CET 2014

Hi Jim,

On Tue, Oct 28, 2014 at 04:36:19PM -0400, Jim Hendrick wrote:
> Now the issue is performance. I am sending roughly ~5000 EPS to the
> syslog-ng instance running patterndb, but only able to "sustain" less than
> 1000 to elasticsearch (oddly, ES seems to start receiving at ~5000 EPS, and
> within an hour or less, drops to ~1000)

I have a similar workload and am seeing drops too.
Below 2k EPS, syslog-ng usually copes; above that, I see drops. Enabling
flow-control seems to help from syslog-ng's perspective (no drops reported
by `syslog-ng-ctl stats`), but the Linux kernel's UDP protocol counters then
show the drops as "InErrors" (our incoming syslog is UDP). I'm a little lost
when it comes to interpreting the effect of the various syslog-ng tunables.
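To put numbers on the kernel-side drops, the UDP counters can be sampled
directly. A plausible explanation for the pattern above: UDP has no
back-pressure, so once flow-control makes syslog-ng stop reading, the socket
receive buffer fills and the kernel discards further datagrams, counting them
in InErrors (on Linux, also in RcvbufErrors). The helper names below are
hypothetical; the parser only assumes the two-line `Udp:` layout of
/proc/net/snmp.

```python
"""Read the Linux kernel's UDP protocol counters from /proc/net/snmp."""

from pathlib import Path


def parse_udp_counters(snmp_text: str) -> dict:
    """Return the 'Udp:' counters found in the text of /proc/net/snmp.

    The file lists each protocol as two lines sharing a prefix: first the
    counter names, then their values.
    """
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # 'UdpLite:' lines do not match
    if len(rows) < 2:
        raise ValueError("no 'Udp:' header/value pair found")
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def read_udp_counters(path: str = "/proc/net/snmp") -> dict:
    """Parse the live counters (only works where /proc/net/snmp exists)."""
    return parse_udp_counters(Path(path).read_text())


def counter_deltas(before: dict, after: dict) -> dict:
    """Per-counter increase between two snapshots taken some time apart."""
    return {name: after[name] - before[name] for name in after if name in before}
```

Taking one snapshot with read_udp_counters() before a load test and one
after, counter_deltas(before, after)["InErrors"] shows how many datagrams
the kernel discarded during the run, independently of what syslog-ng reports.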

> I have tried a number of things, including running a second ES node and
> letting syslog-ng "round robin" with no luck at all.

We're doing that by setting the `nodes` key in Elasticsearch.pm: according
to the Search::Elasticsearch documentation [1], this should make the module
load-balance requests across the listed nodes. Judging by the bandwidth
between syslog-ng and each of the ES nodes, this works as intended.
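The load-balancing idea behind the `nodes` list can be pictured as a simple
round-robin picker. This is only an illustration: it is not
Search::Elasticsearch's connection-pool code, and the class name and node
addresses are hypothetical. With an even rotation each node receives the same
share of bulk requests, which is the kind of spread the bandwidth check above
is meant to confirm.

```python
"""Spread requests over several ES nodes in turn (round-robin)."""

from collections import Counter
from itertools import cycle


class RoundRobinNodes:
    """Hand out node addresses one after another, wrapping at the end."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._next = cycle(nodes)

    def pick(self) -> str:
        """Return the node that should receive the next bulk request."""
        return next(self._next)


def spread(picker: RoundRobinNodes, requests: int) -> Counter:
    """Count how many of `requests` consecutive requests land on each node."""
    return Counter(picker.pick() for _ in range(requests))
```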

Looking at the statistics of my nodes, none of them seems to be hitting a
bottleneck:

* load is between 0 and 2 (8 cores total)
* writes average around 50/s, with peaks around 150 (6+P RAID, 10k SAS)
* reads are ridiculously low
* heap usage is around 75% (of 24 GB)
* interface rx ~500k/s
* Elasticsearch indexing rate ~500/s

> ES tuning has included locking 16G of memory per ES instance, and setting
> indices.memory.index_buffer_size: 50%

We're using `index_buffer_size: 30%` and `ES_HEAP_SIZE=24g` on our 6 ES
nodes, and max_size is set to 256 in syslog-ng/Elasticsearch.pm.
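For a feel of what batching means on the wire, here is a sketch that groups
events into newline-delimited bodies for the Elasticsearch bulk API. It
assumes max_size caps the number of messages per bulk request, which we have
not checked in Elasticsearch.pm; the function name and index name are
hypothetical, and the exact fields accepted in the action line vary between
ES versions (older releases also want a document type). At ~5000 EPS and 256
documents per request, this works out to roughly 20 bulk requests per second
(5000 / 256 ≈ 19.5).

```python
"""Group log events into Elasticsearch bulk-API request bodies."""

import json
from typing import Iterable, Iterator


def bulk_bodies(events: Iterable[dict], index: str,
                max_docs: int = 256) -> Iterator[str]:
    """Yield newline-delimited bulk bodies holding at most max_docs events.

    Each event becomes two lines: an action line naming the target index
    (assumed shape, see above), then the event itself as the document source.
    """
    if max_docs < 1:
        raise ValueError("max_docs must be positive")
    action = json.dumps({"index": {"_index": index}})
    lines = []
    count = 0
    for event in events:
        lines.append(action)
        lines.append(json.dumps(event))
        count += 1
        if count == max_docs:
            # The bulk API expects the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:  # flush the final, partially filled batch
        yield "\n".join(lines) + "\n"
```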

What we're currently doing (or planning to do) to investigate:

1. micro-benchmark the CPAN module to see if we can go above 2k/s
2. improve the statistics gathered by collectd-elasticsearch [2]
3. write a dummy ES server that only does accounting and throws the data
   away, for benchmarking purposes
4. compare Python, Lua and Perl implementations
5. tune various syslog-ng parameters
6. put a message queue (MQ) between syslog-ng and ES
7. use TCP instead of UDP for incoming syslog

I realize this won't help you much right now, but it may be of interest so
that we can pool our research. I'll be meeting some syslog-ng experts very
soon, and I'm confident I'll come back with several options for improving the
situation.

Cheers

[1] http://search.cpan.org/~drtech/Search-Elasticsearch-1.14/lib/Search/Elasticsearch.pm#nodes
[2] https://github.com/phobos182/collectd-elasticsearch