Thanks! I will look into setting that up (hopefully today, but it may be the first of next week).

Yesterday I was able to get ~4k/sec with format-json and a redis destination, using logstash between redis and elasticsearch. In that case, logstash was pretty clearly the bottleneck: I was consistently pushing ~4000-4500/sec through syslog-ng, but only ~3800-4000 were making it to elasticsearch. I saw this most clearly when I shut down syslog-ng and it took the rest of the system several minutes to process what was cached in redis.

I am using Ubuntu, but within a corporate network (lab systems, but getting modules, etc. is still not always trivial). Let me see if I can set up the profiling.

(And as far as experience goes - I am *very* new to the ELK pieces and learning as I go. It is still quite possible I can do some major tuning in that area. That is one of the reasons I am trying to have syslog-ng do as much as possible, so I can drop the "L" and only use "E" and "K" :-)

Thanks again all!

Jim

On 10/30/2014 11:55 PM, Balazs Scheidler wrote:
Hi,
If the 3rd option is the slowest, then this seems to be related to the syslog-ng perl module or Elasticsearch.pm.
I've just checked: the syslog-ng perl module does a value-pairs evaluation and sends the results to the perl function as a hash. This is not the speediest thing (it'd be better to export the underlying C structure as an object to Perl), but it should still cope with much more than 2k/sec.
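Roughly, the handoff on the Perl side amounts to the sketch below. This is from memory, not the actual module code - the sub name, field names and sample call are only illustrative:

    # Illustrative sketch of the per-message handoff (not the real module API).
    use strict;
    use warnings;

    # syslog-ng evaluates value-pairs() for each message and calls a Perl
    # sub with the evaluated name-value pairs (shown here as a hash ref).
    sub send_message {
        my ($msg) = @_;
        # Typical keys: HOST, PROGRAM, MESSAGE, plus whatever fields your
        # patterndb rules extracted (e.g. $msg->{'usracct.username'}).
        printf "%s %s: %s\n", $msg->{HOST}, $msg->{PROGRAM}, $msg->{MESSAGE};
    }

    # Effectively, this happens once per message:
    send_message({ HOST => 'web1', PROGRAM => 'sshd', MESSAGE => 'Accepted password for jim' });

So every message involves building that hash and crossing the C/Perl boundary once, which costs something, but by itself that shouldn't pin you around 2k/sec.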
I'm wondering what I could do to help. I'm following this thread, but as I lack the ES experience I don't have the same environment that you do.
If you could use some kind of profiling (like perf for instance) and had the associated debug symbols in at least syslog-ng (and preferably also in perl), we should easily pinpoint the issue. Setting up perf and symbols is easy if your distro supports it, but is a big hassle if it doesn't.
My experience with perf is on Ubuntu, but I heard it's better in Fedora. Which distro are you using?
This is an outline of what you'd have to do in order to perform the profiling:

- don't strip syslog-ng (nor syslog-ng-incubator) after compilation and use -g in CFLAGS; syslog-ng doesn't do this in its build script, but .rpm/.deb packaging usually does - you can verify this by running file <path-to-binary>
- install symbols for the syslog-ng dependencies (these are the dbgsym packages in Ubuntu, https://wiki.ubuntu.com/DebuggingProgramCrash#Debug_Symbol_Packages)
- run perf record -g "syslog-ng command line"
- reproduce the load
- run perf report
You'll see which parts use the most CPU in the result. Or you can send it here for analysis.
HTH
Bazsi
On Wed, Oct 29, 2014 at 2:50 PM, Jim Hendrick <jrhendri@roadrunner.com> wrote:
Thank you sir!
At least this is not unique to my testing (not sure that's actually *good* news :-)
I will try to reproduce some comparable baselines using a couple of setups I have tried:
1) proxy-syslog --> syslog-ng --> redis --> logstash+grok --> logstash --> elasticsearch
   This was essentially following a basic set of instructions, just to make sure I could reproduce them.

2) proxy-syslog --> syslog-ng+patterndb+format-json --> redis --> logstash --> elasticsearch
   This moved the pattern matching and conversion to json out to the edge, leaving redis & logstash since they worked well at feeding elasticsearch.

3) proxy-syslog --> syslog-ng+patterndb+Elasticsearch.pm --> elasticsearch
   This seemed the simplest & most promising.
I have not tried all three with the same load, so I cannot definitively say one is better, but my subjective feel is that #3 was actually the slowest. I suspect it has something to do with the way the data is being sent to elasticsearch, but I do not know whether the issue is in the perl module itself or on the elasticsearch side (indexing, etc.).
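One thing I plan to look at is whether documents go out as one HTTP request per message or batched. I am still learning the Perl client, but as I understand it, batching with its bulk helper would look roughly like the sketch below (node address, index name and batch size are just placeholders of mine, not necessarily what Elasticsearch.pm actually uses):

    # Rough sketch: buffer documents and flush them as bulk requests
    # instead of one round trip per message. All names/sizes are placeholders.
    use strict;
    use warnings;
    use Search::Elasticsearch;

    my $es = Search::Elasticsearch->new( nodes => ['localhost:9200'] );

    my $bulk = $es->bulk_helper(
        index     => 'syslog',
        type      => 'message',
        max_count => 500,        # flush after this many documents
    );

    # Called once per message with the name-value hash from syslog-ng:
    sub send_message {
        my ($msg) = @_;
        $bulk->index( { source => $msg } );
    }

    # Flush anything left over at shutdown:
    sub flush_remaining {
        $bulk->flush;
    }

If it turns out Elasticsearch.pm already does this, then the overhead is probably somewhere else.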
My overall thought is (still) that parsing at each syslog-ng server with no middleman should be fastest, since as you scale to more syslog-ng servers you are distributing the pattern matching load.
I am still not sure if a broker (redis, rabbitmq, etc.) will help as long as elasticsearch can accept the data fast enough.
Thanks for the feedback - I will certainly post whatever I come up with in the next day or so.
Jim
On 10/29/2014 09:29 AM, Fabien Wernli wrote:
> Hi Jim,
>
> On Tue, Oct 28, 2014 at 04:36:19PM -0400, Jim Hendrick wrote:
>> Now the issue is performance. I am sending roughly ~5000 EPS to the
>> syslog-ng instance running patterndb, but only able to "sustain" less than
>> 1000 to elasticsearch (oddly, ES seems to start receiving at ~5000 EPS, and
>> within an hour or less, drops to ~1000)
> I've got a similar workload, and seeing drops too.
> When EPS is below 2k/s, usually syslog-ng copes. When it goes above, I can
> see drops. Enabling flow-control seems to help from the syslog-ng
> perspective (no drops in `syslog-ng-ctl stats`) but when I look at protocol
> counters in the Linux kernel, the drops can be seen as "InErrors" (I'm using
> UDP). I'm a little lost when trying to interpret the effect of syslog-ng
> tuneables.
>
>> I have tried a number of things, including running a second ES node and
>> letting syslog-ng "round robin" with no luck at all.
> We're doing that by specifying the `nodes` key in Elasticsearch.pm:
> according to its documentation [1] this should ensure Search::Elasticsearch
> makes use of load-balancing. This seems to work as intended, when checking
> the bandwidth between syslog-ng and all ES nodes.
>
> When looking at the statistics of my nodes, they seem to be hitting no
> bottleneck whatsoever:
>
> * load is between 0 and 2 (8 cores total)
> * writes average around 50/s with peaks around 150 (6+P RAID 10k SAS)
> * reads are ridiculous
> * heap usage is around 75% (of 24g)
> * interface rx ~500k/s
> * elasticsearch index rate ~500/s
>
>> ES tuning has included locking 16G of memory per ES instance, and setting
>> indices.memory.index_buffer_size: 50%
> We're using 'index_buffer_size: 30%' and 'ES_HEAP_SIZE=24g' on our 6 ES
> nodes. max_size is 256 in syslog-ng/Elasticsearch.pm
>
> What we're currently doing (or planning) to try to investigate:
>
> 1. micro-benchmark the CPAN module to see if we can go above 2k/s
> 2. improve the statistics gathered by collectd-elasticsearch [2]
> 3. write a dummy ES server which only does some accounting but throws data
>    away, in order to do some benchmarking
> 4. compare python, lua and perl implementations
> 5. tune various syslog-ng parameters
> 6. use some MQ implementation between ES and syslog-ng
> 7. use TCP instead of UDP for incoming syslog
>
> I realize this won't help you much, but may be of interest so we can channel
> our common research. I'll be meeting with some syslog-ng experts very soon,
> and I am convinced I'll come back with many options to improve the
> situation.
>
> Cheers
>
> [1] http://search.cpan.org/~drtech/Search-Elasticsearch-1.14/lib/Search/Elasticsearch.pm#nodes
> [2] https://github.com/phobos182/collectd-elasticsearch
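For what it's worth, my reading of the load-balancing Fabien mentions is that it boils down to something like this on the Search::Elasticsearch side (the node addresses are placeholders, and I may well be missing details):

    # Sketch only - addresses are placeholders. With the 'Static'
    # connection pool, Search::Elasticsearch round-robins requests
    # across the listed nodes.
    use strict;
    use warnings;
    use Search::Elasticsearch;

    my $es = Search::Elasticsearch->new(
        nodes    => [ 'es-node1:9200', 'es-node2:9200' ],
        cxn_pool => 'Static',
    );

That matches what I was hoping my "round robin" attempt would do, so I will double-check my config against the documentation in [1] above.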
--
Bazsi
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq