[syslog-ng] Problems with failed connections and time_reopen()?

Matt Wise matt at nextdoor.com
Wed May 8 17:52:08 CEST 2013


In both test cases, I initiated the failure by restarting the syslog endpoint (which is actually a flume agent). When running through the ELB, the syslog-ng client never catches the connection failure and continues to try to send data through a TCP connection that's in the CLOSE_WAIT state. When not using the ELB, the syslog-ng client notices immediately that the connection has failed and begins to reconnect in earnest.
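
For illustration only (this is not syslog-ng code, and it does not reproduce the ELB setup): a minimal Python sketch of what CLOSE_WAIT means for a sending application. Once the peer has sent its FIN, the local socket sits in CLOSE_WAIT until the application closes it, and a writer that never reads can stay unaware of that, since the first write after the FIN can still be accepted by the local kernel. The helper name peer_has_closed() and the loopback demo are assumptions made up for this example.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has already closed its side of the connection.

    Hypothetical helper for illustration. It polls without blocking and peeks
    at the receive buffer, so pending application data is not consumed.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # no data and no FIN pending
    try:
        # An empty result from recv() means EOF: the peer sent FIN and this
        # socket is now in CLOSE_WAIT until we close it ourselves.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True  # the peer reset the connection instead


def demo():
    """Open a loopback connection, close the server side, check the client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    before = peer_has_closed(client)

    server_side.close()  # the server sends FIN, like a restarted endpoint
    select.select([client], [], [], 2.0)  # wait up to 2s for the FIN to arrive
    after = peer_has_closed(client)

    client.close()
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

A sender that only ever writes would need a check like this (or to watch for read events on the socket) to notice the remote close promptly. Whether that is what differs between the two test cases here is not something this sketch can show.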

--Matt

On May 7, 2013, at 9:29 PM, Balazs Scheidler <bazsi77 at gmail.com> wrote:

> In both cases the client initiated the close operation, not the load balancer or the server. Where does the connection stall, then?
> 
> On May 7, 2013 11:17 PM, "Matt Wise" <matt at nextdoor.com> wrote:
> Here's the dump THROUGH the ELB:
> 
>> 21:11:26.208951 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [S], seq 267618391, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
>> 21:11:26.290452 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [S.], seq 848900027, ack 267618392, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 8], length 0
>> 21:11:26.290509 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 1, win 115, length 0
>> 21:11:26.291460 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq 1:227, ack 1, win 115, length 226
>> 21:11:26.375765 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack 227, win 62, length 0
>> 21:11:26.401850 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], seq 1:1461, ack 227, win 62, length 1460
>> 21:11:26.401871 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], seq 1461:2921, ack 227, win 62, length 1460
>> 21:11:26.401898 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq 2921:3515, ack 227, win 62, length 594
>> 21:11:26.402343 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 1461, win 137, length 0
>> 21:11:26.402356 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 2921, win 160, length 0
>> 21:11:26.402361 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 3515, win 183, length 0
>> 21:11:26.484345 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], seq 227:3147, ack 3515, win 183, length 2920
>> 21:11:26.484365 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq 3147:3690, ack 3515, win 183, length 543
>> 21:11:26.566175 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack 3147, win 85, length 0 
>> 21:11:26.569031 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], seq 3515:4975, ack 3690, win 96, length 1460
>> 21:11:26.569046 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq 4975:5221, ack 3690, win 96, length 246
>> 21:11:26.569222 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 4975, win 206, length 0
>> 21:11:26.569234 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 5221, win 229, length 0
>> 21:11:28.478081 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq 3690:3727, ack 5221, win 229, length 37
>> 21:11:28.603557 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack 3727, win 96, length 0 
>> 21:11:50.707433 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq 5221:5258, ack 3727, win 96, length 37
>> 21:11:50.707460 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 5258, win 229, length 0
>> 21:11:50.707577 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq 3727:3764, ack 5258, win 229, length 37
>> 21:11:50.707599 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [F.], seq 3764, ack 5258, win 229, length 0
>> 21:11:50.789084 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack 3764, win 96, length 0 
>> 21:11:50.789847 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [F.], seq 5258, ack 3765, win 96, length 0
>> 21:11:50.789868 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 5259, win 229, length 0
> 
> Here's a direct connection:
> 
>> 21:15:14.495542 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [S], seq 379756253, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
>> 21:15:14.576380 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [S.], seq 521570022, ack 379756254, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
>> 21:15:14.576409 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 1, win 115, length 0
>> 21:15:14.576940 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq 1:227, ack 1, win 115, length 226
>> 21:15:14.657397 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack 227, win 123, length 0
>> 21:15:14.683465 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], seq 1:1461, ack 227, win 123, length 1460
>> 21:15:14.683481 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], seq 1461:2921, ack 227, win 123, length 1460
>> 21:15:14.683485 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [P.], seq 2921:3515, ack 227, win 123, length 594
>> 21:15:14.683683 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 1461, win 137, length 0
>> 21:15:14.683696 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 2921, win 160, length 0
>> 21:15:14.683702 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 3515, win 183, length 0
>> 21:15:14.766227 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], seq 227:3147, ack 3515, win 183, length 2920
>> 21:15:14.766243 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq 3147:3690, ack 3515, win 183, length 543
>> 21:15:14.846942 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack 3147, win 169, length 0
>> 21:15:14.849068 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], seq 3515:4975, ack 3690, win 191, length 1460
>> 21:15:14.849082 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [P.], seq 4975:5221, ack 3690, win 191, length 246
>> 21:15:14.849251 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 4975, win 206, length 0
>> 21:15:14.849262 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 5221, win 229, length 0
>> 21:15:18.394716 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq 3690:3727, ack 5221, win 229, length 37
>> 21:15:18.511442 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack 3727, win 191, length 0
>> 21:15:52.957532 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [P.], seq 5221:5258, ack 3727, win 191, length 37
>> 21:15:52.957587 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 5258, win 229, length 0
>> 21:15:52.957716 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq 3727:3764, ack 5258, win 229, length 37
>> 21:15:52.957742 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [F.], seq 3764, ack 5258, win 229, length 0
>> 21:15:53.039203 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack 3764, win 191, length 0
>> 21:15:53.039468 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [F.], seq 5258, ack 3764, win 191, length 0
>> 21:15:53.039484 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 5259, win 229, length 0
>> 21:15:53.039492 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack 3765, win 191, length 0
> 
> 
> Any thoughts? By the way, I'm trying out 3.3.9, but I'm running into other issues.
> 
> On May 7, 2013, at 1:55 PM, Balazs Scheidler <bazsi77 at gmail.com> wrote:
> 
>> 
>> On May 7, 2013 10:51 PM, "Matt Wise" <matt at nextdoor.com> wrote:
>> >
>> > I've done some more testing and now have narrowed the problem down to our Amazon ELB. Because the OSS version of Syslog-ng does not support failing over destinations from hostA to hostB when one fails, we are using an ELB in front of our syslog servers.
>> >
>> > When we have no ELB in place, our syslog-ng client detects the network drop immediately and begins to try to reconnect. When the ELB is in the way, it never detects the network connection drop. I don't understand why. I've tested a bit manually, using openssl to connect to our remote endpoint both through the ELB and directly, and I don't see any difference in the way network connections are killed off. Any thoughts here?
>> >
>> 
>> Hmm interesting. The difference might be how connections are terminated. Can you check that using tcpdump?
>> 
>> > --matt
>> >
>> > On May 6, 2013, at 9:53 AM, Matt Wise <matt at nextdoor.com> wrote:
>> >
>> > > We're running Syslog-NG 3.3.4 in our mixed Ubuntu 10/12 environment. We use SSL for all of our syslog-to-syslog connections, and have logging going to two different data pipelines.
>> > >
>> > >  Data Dest #1: SyslogNG Client ----(SSL)----> SyslogNG Server ------> Logstash File-read-in-service
>> > >  Data Dest #2: SyslogNG Client ----(SSL)----> Stunnel Service ------> Flume Syslog Service
>> > >
>> > > The data streams work fine most of the time, but if we restart either the remote syslog-ng server or the stunnel service, it seems that the syslog-ng clients don't try to reconnect to the endpoints for a LONG time (or ever). I end up seeing the connection on the client go into a CLOSE_WAIT state, and syslog-ng keeps thinking that it's sending log events through the connection, so it seems to never try to reconnect.
>> > >
>> > > I've tried setting time_reopen() to 0, 1 and 5... no luck or change in behavior.
>> > >
>> > > Any thoughts?
>> > >
>> > > --Matt
>> > >
>> >
>> > ______________________________________________________________________________
>> > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> > Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> > FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>> >
