[syslog-ng] Program destination seems to block

Evan Rempel erempel at uvic.ca
Sat May 15 00:06:06 CEST 2010


OK, I finally got back to this.

The lsof of the forground syslog-ng process shows

localhost% sudo /usr/sbin/lsof -p 18279
COMMAND     PID USER   FD   TYPE             DEVICE    SIZE      NODE NAME
syslog-ng 18279 root  cwd    DIR              253,6    4096     32793 /usr/local/etc/syslog-ng
syslog-ng 18279 root  rtd    DIR              253,0    1024         2 /
syslog-ng 18279 root  txt    REG              253,6  321731    233449 /usr/local/sbin/syslog-ng
syslog-ng 18279 root  mem    REG              253,0   95464     49455 /lib64/libselinux.so.1
syslog-ng 18279 root  mem    REG              253,0  247496     49454 /lib64/libsepol.so.1
syslog-ng 18279 root  mem    REG              253,5   85928    394112 /usr/lib64/libz.so.1.2.3
syslog-ng 18279 root  mem    REG              253,0  647608     49451 /lib64/libglib-2.0.so.0.1200.3
syslog-ng 18279 root  mem    REG              253,0 1366176     49458 /lib64/libcrypto.so.0.9.8e
syslog-ng 18279 root  mem    REG              253,5  190976    397026 /usr/lib64/libgssapi_krb5.so.2.2
syslog-ng 18279 root  mem    REG              253,0   10000     49465 /lib64/libcom_err.so.2.1
syslog-ng 18279 root  mem    REG              253,5  153464    397024 /usr/lib64/libk5crypto.so.3.1
syslog-ng 18279 root  mem    REG              253,5  613896    397025 /usr/lib64/libkrb5.so.3.3
syslog-ng 18279 root  mem    REG              253,5   35728    397023 /usr/lib64/libkrb5support.so.0.1
syslog-ng 18279 root  mem    REG              253,0    9728     49474 /lib64/libkeyutils-1.2.so
syslog-ng 18279 root  mem    REG              253,0  137256     53855 /lib64/ld-2.5.so
syslog-ng 18279 root  mem    REG              253,5   25464    393467 /usr/lib64/gconv/gconv-modules.cache
syslog-ng 18279 root  mem    REG              253,4  493726    327686 /var/db/nscd/hosts
syslog-ng 18279 root  mem    REG              253,0   50288     49190 /lib64/librt-2.5.so
syslog-ng 18279 root  mem    REG              253,0  111480     49172 /lib64/libnsl-2.5.so
syslog-ng 18279 root  mem    REG              253,6   21478    136301 /usr/local/lib/libevtlog.so.0.0.0
syslog-ng 18279 root  mem    REG              253,0  306632     49331 /lib64/libssl.so.0.9.8e
syslog-ng 18279 root  mem    REG              253,0   20424     49168 /lib64/libdl-2.5.so
syslog-ng 18279 root  mem    REG              253,0 1712216     49160 /lib64/libc-2.5.so
syslog-ng 18279 root  mem    REG              253,0  142696     49186 /lib64/libpthread-2.5.so
syslog-ng 18279 root  mem    REG              253,0   89800     49188 /lib64/libresolv-2.5.so
syslog-ng 18279 root    0u   CHR              136,1                 3 /dev/pts/1
syslog-ng 18279 root    1u   CHR              136,1                 3 /dev/pts/1
syslog-ng 18279 root    2u   CHR              136,1                 3 /dev/pts/1
syslog-ng 18279 root    3u  FIFO              253,4            491523 /var/log/syslog.pipes/erempel
syslog-ng 18279 root    4u  IPv4          121995943               TCP localhost:34513->localhost.localdomain:1514 
(ESTABLISHED)
syslog-ng 18279 root    5w  FIFO                0,6         121995942 pipe
syslog-ng 18279 root    6u  unix 0xffff81043fd3f100         121995946 /var/syslog-ng.ctl

And the strace and configuration file are attached.

I'm not an strace expert, but it looks like the source is polled, but once the output can not be written to, the
source is not read, even though the poll shows data is available.

Thanks for looking at this.

Balazs Scheidler wrote:
> On Wed, 2010-04-14 at 14:04 -0700, Evan Rempel wrote:
>> Balazs Scheidler wrote:
>>> On Wed, 2010-04-14 at 13:18 -0700, Evan Rempel wrote:
>>>> A little background.
>>>>
>>>> There is a "server" syslog-ng process that accepts messages from the network and
>>>> sends the messages to a variety of destinations. For this report, I am only interested
>>>> in one destination that happens to be a pipe.
>>>>
>>>> There is a "slave" syslog-ng process that reads from the pipe that the "server" writes to,
>>>> and writes to a program destination.
>>>>
>>>> The program reads the standard in, and does "something".
>>>> All works well.
>>>>
>>>> At some point our application (we know why and don't want to discuss it) application stops
>>>> reading standard in for a while (1,000,000 lines over an hour). We expect that the memory
>>>> footprint of syslog-ng "slave" to grow during this time but it does not. Instead, the
>>>> memory footprint of the syslog-ng "server" grows. When our application starts reading
>>>> its standard in again, the memory footprint of the syslog "slave" grows very quickly,
>>>> and all messages reach the destination.
>>>>
>>>> I think that the syslog-ng "slave" get blocked on the program destination in a way that
>>>> prevents it from reading its source, resulting in the upstream syslog-ng "server" having
>>>> to buffer all of the messages.
>>>>
>>>> There is no flow control anywhere, and both syslog-ng instances have log_fifo_size(8000000)
>>>> for all of the destinations.
>>>>
>>>> Anyone have any suggestions?
>>> Hmm, either the slave syslog-ng really blocks (but I don't know any
>>> similar bugs right now), or flow control is enabled.
>>>
>>> There was a bug that caused flow-control to be enabled, if any of the
>>> flags was used. Do you have fallback enabled?
>> No, the only flags in the "slave" is no-parse.
>>
>>
>>> Can you post the exact versions of syslog-ng you are using?
>> 3.0.5 OSE
>>
>>
>>> Also, you could confirm if the slave instance really blocks, or it has
>>> just stalled one of its sources. You could do that by attaching to the
>>> slave process using strace for a little while.
>>>
>>> 1) first check what fd is being used between master/slave (lsof -p
>>> <pid>)
>>> 2) then check via strace if that fd is being polled for POLLIN or not
>>>
>>> If it is not polled, then flow-control is somehow enabled, if syslog-ng
>>> is not polling but waiting somewhere, then it might be blocked as you
>>> suggest.
>>>
>>> Anyway, the list of open file descriptors and the strace dump could help
>>> in tracking down both cases.
>>
>> Not to be lazy here, but I am not going to get to this for weeks, really busy
>> at my site right now.
>>
>> Here is a program that can turn on/off the reading of standard in by using
>> kill -USR1 <PID>
>> kill -USR2 <PID>
>>
>>
>> ---------------
>> #!/usr/bin/perl
>>
>> my $read = 0;
>>
>> $SIG{USR1} = sub { $read = 1 };
>> $SIG{USR2} = sub { $read = 0 };
>>
>> while ( not eof(STDIN)) {
>>    if ( $read == 1 ) {
>>      $line = <STDIN>;
>>      print $line;
>>    } else {
>>      sleep 1;
>>    }
>> }
>> ---------------
>>
>> So hopefully you can reproduce this this really easy at your end.
>>
>> If I don't hear back, in a few weeks I'll get to this.
> 
> I've tried it with 3.1.1, 3.0.6 and finally 3.0.5 and all behaved
> properly. Memory size grew until it couldn't write to the test program,
> of course only until the FIFO size filled up.
> 
> Once I let the program go with a SIGUSR1 it started sending messages,
> the drop count stayed the same. Flow-control was not enabled.
> 
> 


-- 
Evan Rempel                               erempel at uvic.ca
Senior Systems Administrator                 250.721.7691
Unix Services, University Systems, University of Victoria
-------------- next part --------------
A non-text attachment was scrubbed...
Name: syslog.trc.gz
Type: application/x-gzip
Size: 656230 bytes
Desc: not available
Url : http://lists.balabit.hu/pipermail/syslog-ng/attachments/20100514/39fb2861/attachment-0001.bin 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: syslog-ng.erempel.conf
Url: http://lists.balabit.hu/pipermail/syslog-ng/attachments/20100514/39fb2861/attachment-0001.txt 


More information about the syslog-ng mailing list