[syslog-ng] Program destination seems to block

Evan Rempel erempel at uvic.ca
Thu Apr 15 21:48:44 CEST 2010


Thanks for that quick test.
I will have a better look at my end. (like testing with this example program).

Balazs Scheidler wrote:
> On Wed, 2010-04-14 at 14:04 -0700, Evan Rempel wrote:
>> Balazs Scheidler wrote:
>>> On Wed, 2010-04-14 at 13:18 -0700, Evan Rempel wrote:
>>>> A little background.
>>>>
>>>> There is a "server" syslog-ng process that accepts messages from the network and
>>>> sends the messages to a variety of destinations. For this report, I am only interested
>>>> in one destination that happens to be a pipe.
>>>>
>>>> There is a "slave" syslog-ng process that reads from the pipe that the "server" writes to,
>>>> and writes to a program destination.
>>>>
>>>> The program reads the standard in, and does "something".
>>>> All works well.
>>>>
>>>> At some point our application (we know why and don't want to discuss it) application stops
>>>> reading standard in for a while (1,000,000 lines over an hour). We expect that the memory
>>>> footprint of syslog-ng "slave" to grow during this time but it does not. Instead, the
>>>> memory footprint of the syslog-ng "server" grows. When our application starts reading
>>>> its standard in again, the memory footprint of the syslog "slave" grows very quickly,
>>>> and all messages reach the destination.
>>>>
>>>> I think that the syslog-ng "slave" get blocked on the program destination in a way that
>>>> prevents it from reading its source, resulting in the upstream syslog-ng "server" having
>>>> to buffer all of the messages.
>>>>
>>>> There is no flow control anywhere, and both syslog-ng instances have log_fifo_size(8000000)
>>>> for all of the destinations.
>>>>
>>>> Anyone have any suggestions?
>>> Hmm, either the slave syslog-ng really blocks (but I don't know any
>>> similar bugs right now), or flow control is enabled.
>>>
>>> There was a bug that caused flow-control to be enabled, if any of the
>>> flags was used. Do you have fallback enabled?
>> No, the only flags in the "slave" is no-parse.
>>
>>
>>> Can you post the exact versions of syslog-ng you are using?
>> 3.0.5 OSE
>>
>>
>>> Also, you could confirm if the slave instance really blocks, or it has
>>> just stalled one of its sources. You could do that by attaching to the
>>> slave process using strace for a little while.
>>>
>>> 1) first check what fd is being used between master/slave (lsof -p
>>> <pid>)
>>> 2) then check via strace if that fd is being polled for POLLIN or not
>>>
>>> If it is not polled, then flow-control is somehow enabled, if syslog-ng
>>> is not polling but waiting somewhere, then it might be blocked as you
>>> suggest.
>>>
>>> Anyway, the list of open file descriptors and the strace dump could help
>>> in tracking down both cases.
>>
>> Not to be lazy here, but I am not going to get to this for weeks, really busy
>> at my site right now.
>>
>> Here is a program that can turn on/off the reading of standard in by using
>> kill -USR1 <PID>
>> kill -USR2 <PID>
>>
>>
>> ---------------
>> #!/usr/bin/perl
>>
>> my $read = 0;
>>
>> $SIG{USR1} = sub { $read = 1 };
>> $SIG{USR2} = sub { $read = 0 };
>>
>> while ( not eof(STDIN)) {
>>    if ( $read == 1 ) {
>>      $line = <STDIN>;
>>      print $line;
>>    } else {
>>      sleep 1;
>>    }
>> }
>> ---------------
>>
>> So hopefully you can reproduce this this really easy at your end.
>>
>> If I don't hear back, in a few weeks I'll get to this.
> 
> I've tried it with 3.1.1, 3.0.6 and finally 3.0.5 and all behaved
> properly. Memory size grew until it couldn't write to the test program,
> of course only until the FIFO size filled up.
> 
> Once I let the program go with a SIGUSR1 it started sending messages,
> the drop count stayed the same. Flow-control was not enabled.
> 
> 


-- 
Evan Rempel                               erempel at uvic.ca
Senior Systems Administrator                 250.721.7691
Unix Services, University Systems, University of Victoria


More information about the syslog-ng mailing list