[syslog-ng] Syslog-ng 3.0.2 statistics

Aaron Robel megawott at gmail.com
Wed Jun 17 20:18:20 CEST 2009


More tests...
I ran tcpdump on both relay-01 and the archive box.  There were zero
discrepancies between the tcpdumps.  This tells me that the "virtual
network" is good.

Here is the latest message discrepancy:
relay-01 - 3745 mps (this is according to the processed destination, my
archive box)
archive - 1900 mps (this is according to the processed source)

Another question about syslog processing: does syslog-ng record processed
stats for the source based on what it wrote to the file destinations, or
simply on how many messages it receives on the source?  If it's simply how
many messages it has received, then all my filters and destinations can be
ruled out.  I was concerned that having 150 filters and 150 destinations
within the syslog-ng config might hit a limitation.  What I've done is
separate out every network device to its own file to make searches and
our web front end (phplogcon) perform better.
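
For reference, the stripped-down test Martin suggests below (one source, one
file destination, no filters) might look roughly like this.  It is only a
sketch: the source name s_network is taken from my stats output, while the
listen address, the destination name d_test and the file path are
placeholders made up for illustration.

```
@version: 3.0

# single UDP source on the standard syslog port
source s_network { udp(ip(0.0.0.0) port(514)); };

# one flat file, no per-device split (hypothetical path)
destination d_test { file("/var/log/test/all.log"); };

# no filters: everything received goes straight to the file
log { source(s_network); destination(d_test); };
```

Comparing the processed count for s_network under a config like this with
the count under the full 150-filter config should show whether the filters
and destinations play any part in the gap.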


On Wed, Jun 17, 2009 at 10:24 AM, Aaron Robel <megawott at gmail.com> wrote:

> So, I did a couple of tests.
>
> I started by watching realtime logs flow in on both the relay and archive.
> This showed that, sure enough, we are not getting all our messages to the
> back end.
>
> I then removed the following options:
> time_sleep(10);
> log_fetch_limit(250);
> log_fifo_size(2000);
> flush_lines(2000);
> flush_timeout(200);
>
> Then I performed the test again.  The results were much better, but we are
> still missing about 1 out of every 6 or 8 messages.  CPU, as expected, has
> also dramatically increased from 10% to 60% utilization.
>
> I thought my next step would be to compare tcpdumps on both boxes to rule
> out the network, then to progress onto more dramatic options.  Any other
> ideas on what may be happening are greatly appreciated.
>
> Just when I thought this project was about to be wrapped up, it drags me
> back in...
>
> On Wed, Jun 17, 2009 at 10:05 AM, Martin Holste <mcholste at gmail.com> wrote:
>
>> I highly doubt that the UDP is being dropped on the "network" (quoted
>> since it's all in a VM), but you can always check by running iptraf on the
>> receiving interfaces to get a ballpark figure of how many UDP packets are
>> coming in on port 514.  To find out if Syslog-NG is the bottleneck, try a test
>> config that is as simple as possible, e.g. configure with just one source
>> and one file destination and see what the stats do then.  If possible, you
>> could also try sending all of the logs to a stock syslogd daemon (see a
>> previous thread about this) which is faster for simple file writing
>> operations.  The truth may be that a VM is not a good environment for
>> high-performance log collection, and that turning all those VMs into one
>> physical box might outperform your VM cluster.  Please keep me posted--I'm
>> interested in how this plays out.
>>
>> --Martin
>>
>>
>> On Wed, Jun 17, 2009 at 11:26 AM, Aaron Robel <megawott at gmail.com> wrote:
>>
>>> You make a good point. I initially thought the same thing and did some
>>> checking on the bandwidth usage and we aren't saturating any of the links or
>>> even getting close.  I also didn't see any errors or drops on the
>>> interfaces.  The big question for me is how this all plays out in the
>>> virtualized environment: could I be running into a limitation there?
>>> (Rhetorical question.)  All of these hosts live physically on the same
>>> piece of hardware and on the same VLAN.  I'll keep poking around in that
>>> arena to see if anything turns up.  I may also try TCP to the archive
>>> host; I just worry about the performance implications.
>>>
>>> Do you see anything else in my options config that looks amiss?
>>>
>>> Thanks for the suggestion, Joe.
>>> Hardware stats:
>>> relays:
>>>   2 x 3 gig procs
>>>   4 gig mem
>>>   1 TB disk
>>>
>>> archive:
>>>   4 x 3 gig procs
>>>   6 gig mem
>>>   5.5 TB disk
>>>
>>> Network bandwidth stats:
>>> relay 01:  in-850KBps out-300KBps (I'm assuming the discrepancy here is
>>> due to the fifo and flush settings.)
>>> relay 02:  in-60KBps out-55KBps
>>> relay 03:  in-nil out-nil
>>>
>>> Archive:
>>> network utilization: 600KBps
>>>
>>> On Wed, Jun 17, 2009 at 8:58 AM, Fegan, Joe <Joe.Fegan at hp.com> wrote:
>>>
>>>> Knee-jerk reaction: are you using udp? You probably know that udp is
>>>> a connection-less, fire-and-forget protocol, so if the packet gets lost
>>>> neither the sender nor the intended recipient will know (or care).
>>>>
>>>>  ------------------------------
>>>> From: syslog-ng-bounces at lists.balabit.hu
>>>> [mailto:syslog-ng-bounces at lists.balabit.hu] On Behalf Of Aaron Robel
>>>> Sent: 17 June 2009 16:20
>>>> To: syslog-ng at lists.balabit.hu
>>>> Subject: [syslog-ng] Syslog-ng 3.0.2 statistics
>>>>
>>>>   Hello,
>>>>
>>>> My apologies in advance, this is my first posting and I'm quite the
>>>> rookie when it comes to Linux and Syslog-ng. I keep wondering why this
>>>> is my project.
>>>>
>>>> I have a 4-server syslog deployment with 3 front-end "relay" boxes and 1
>>>> back-end archive box, all within a virtualized SLES environment.
>>>>
>>>> Recently I noticed that the relays together are averaging about 2500
>>>> messages per second (mps).  The majority of the messages are coming from
>>>> a single relay, about 2000 mps. Yet the archive box is only averaging
>>>> about 400 mps.
>>>>
>>>> Since we are running 3.0.2 I decided to turn up the stats_level to 1.
>>>> I don't see any drops on the roughly 150 file destinations that I've
>>>> built.
>>>>
>>>> What do stamp, processed, stored, etc. mean?  I couldn't find any
>>>> detailed documentation about the different statistics.
>>>>
>>>> Why am I getting such a large discrepancy between "stamp" and
>>>> "processed" in the log stats?
>>>>
>>>> Finally, since I'm sending the email anyway, does anyone see an issue
>>>> with the way I've got the flow control set up in the global options?
>>>>
>>>> Here are my stats in question off my archive box:
>>>> processed='src.udp(s_network#0)=22020892',
>>>> stamp='src.udp(s_network#0)=1245249328'
>>>>
>>>> Here are the global options from the archive box:
>>>> options {
>>>>         time_sleep(10);
>>>>         log_fetch_limit(250);
>>>>         log_fifo_size(2000);
>>>>         use_dns(no);
>>>>         keep_timestamp(yes);
>>>>         dns_cache(no);
>>>>         long_hostnames(off);
>>>>         flush_lines(2000);
>>>>         flush_timeout(200);
>>>>         perm(0644);
>>>>         stats_freq(1800);
>>>>         stats_level(1);
>>>>         time_reopen(10);
>>>>         create_dirs(yes);
>>>>         dir_perm(755);
>>>> };
>>>>  Thanks!
>>>>
>>>>
>>>>
>>>> ______________________________________________________________________________
>>>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>>>> Documentation:
>>>> http://www.balabit.com/support/documentation/?product=syslog-ng
>>>> FAQ: http://www.campin.net/syslog-ng/faq.html
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Aaron Robel
>>>
>>
>>
>
>
> --
> Aaron Robel
>



-- 
Aaron Robel