[syslog-ng] Problems with failed connections and time_reopen()?

Matt Wise matt at nextdoor.com
Tue May 7 22:51:11 CEST 2013


I've done some more testing and have now narrowed the problem down to our Amazon ELB. Because the OSS version of syslog-ng does not support failing over from one destination (hostA) to another (hostB) when one fails, we are using an ELB in front of our syslog servers.

With no ELB in place, our syslog-ng client detects the network drop immediately and begins trying to reconnect. With the ELB in the path, it never detects the dropped connection, and I don't understand why. I've tested manually with openssl, connecting to our remote endpoint both through the ELB and directly, and I don't see any difference in the way the connections are torn down. Any thoughts here?
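
As a rough illustration of the CLOSE_WAIT symptom described in the quoted message below, here is a minimal Python sketch of generic TCP behavior on a loopback socket: once the peer closes, a client that only writes can still have its next send accepted by the kernel, and it only learns about the close if it checks the read side. That something similar is happening here is an assumption; the sketch says nothing about syslog-ng's internals or about how the ELB handles connections. The names peer_has_closed and demo are hypothetical.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed its end (a read would hit EOF)."""
    sock.setblocking(False)
    try:
        # MSG_PEEK inspects the receive queue without consuming data;
        # an empty result means the peer's FIN has arrived.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        # Nothing to read and no FIN yet: the connection still looks alive.
        return False
    finally:
        sock.setblocking(True)


def demo():
    # Hypothetical stand-in for the remote endpoint, bound to a free local port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        before = peer_has_closed(client)
        # Simulate the remote service going away.
        server_side.close()
        # Wait (up to 2 seconds) for the FIN to reach the client side.
        select.select([client], [], [], 2.0)
        after = peer_has_closed(client)
        # The client socket is now in CLOSE_WAIT, yet this write is still
        # accepted locally; an error would only surface on a later call.
        sent = client.send(b"log line\n")
        return before, after, sent
    finally:
        client.close()
        listener.close()
```

Calling demo() returns the read-side check before and after the peer closes, plus the number of bytes the final send accepted. A client that never performs such a read-side check has no reason to reconnect until a later write fails.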

--matt

On May 6, 2013, at 9:53 AM, Matt Wise <matt at nextdoor.com> wrote:

> We're running Syslog-NG 3.3.4 in our mixed Ubuntu 10/12 environment. We use SSL for all of our syslog-to-syslog connections, and have logging going to two different data pipelines.
> 
>  Data Dest #1: SyslogNG Client ----(SSL)----> SyslogNG Server ------> Logstash File-read-in-service
>  Data Dest #2: SyslogNG Client ----(SSL)----> Stunnel Service ------> Flume Syslog Service
> 
> The data streams work fine most of the time, but if we restart either the remote syslog-ng server or the stunnel service, the syslog-ng clients don't try to reconnect to the endpoints for a LONG time (or ever). I end up seeing the connection on the client go into the CLOSE_WAIT state, and syslog-ng keeps thinking that it's sending log events through the connection, so it never seems to try to reconnect.
> 
> I've tried setting time_reopen() to 0, 1 and 5... no luck or change in behavior.
> 
> Any thoughts?
> 
> --Matt
> 


