[syslog-ng] Problems with failed connections and time_reopen()?

Balazs Scheidler bazsi77 at gmail.com
Wed May 8 06:29:31 CEST 2013


In both cases the client initiated the close, not the load balancer or
the server. Where does the connection stall, then?
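
For anyone who wants to check this against their own capture, here is a
rough sketch of how the "who closed first" question can be answered
mechanically: scan the tcpdump text for the first segment whose flags
include F (FIN) and report its source. The function name and the regular
expression are my own illustration, not part of tcpdump or syslog-ng, and
the sketch assumes each packet is on a single line (the wrapped lines in
the quoted mail would need to be joined first).

```python
import re

# Matches the start of a tcpdump TCP line: timestamp, source, destination, flags.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)


def first_fin_sender(dump_lines):
    """Return (timestamp, source) of the first segment carrying FIN, or None."""
    for line in dump_lines:
        m = LINE_RE.match(line.strip())
        if m and "F" in m.group("flags"):
            return m.group("ts"), m.group("src")
    return None


# Tail of the capture taken through the ELB, one packet per line.
elb_dump = [
    "21:11:50.707433 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq 5221:5258, ack 3727, win 96, length 37",
    "21:11:50.707577 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq 3727:3764, ack 5258, win 229, length 37",
    "21:11:50.707599 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [F.], seq 3764, ack 5258, win 229, length 0",
    "21:11:50.789847 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [F.], seq 5258, ack 3765, win 96, length 0",
]

print(first_fin_sender(elb_dump))
# prints ('21:11:50.707599', 'CLIENT.foo.com.43414')
```

The first FIN comes from the CLIENT side, which is what the statement
above is based on.
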
On May 7, 2013 11:17 PM, "Matt Wise" <matt at nextdoor.com> wrote:

> Here's the dump THROUGH the ELB:
>
> 21:11:26.208951 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [S], seq
> 267618391, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7],
> length 0
>
> 21:11:26.290452 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [S.], seq
> 848900027, ack 267618392, win 14600, options [mss
> 1460,nop,nop,sackOK,nop,wscale 8], length 0
>
> 21:11:26.290509 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack 1,
> win 115, length 0
>
> 21:11:26.291460 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq
> 1:227, ack 1, win 115, length 226
>
> 21:11:26.375765 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack
> 227, win 62, length 0
>
> 21:11:26.401850 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], seq
> 1:1461, ack 227, win 62, length 1460
>
> 21:11:26.401871 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], seq
> 1461:2921, ack 227, win 62, length 1460
>
> 21:11:26.401898 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq
> 2921:3515, ack 227, win 62, length 594
>
> 21:11:26.402343 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 1461, win 137, length 0
>
> 21:11:26.402356 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 2921, win 160, length 0
>
> 21:11:26.402361 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 3515, win 183, length 0
>
> 21:11:26.484345 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], seq
> 227:3147, ack 3515, win 183, length 2920
>
> 21:11:26.484365 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq
> 3147:3690, ack 3515, win 183, length 543
>
> 21:11:26.566175 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack
> 3147, win 85, length 0
>
> 21:11:26.569031 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], seq
> 3515:4975, ack 3690, win 96, length 1460
>
> 21:11:26.569046 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq
> 4975:5221, ack 3690, win 96, length 246
>
> 21:11:26.569222 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 4975, win 206, length 0
>
> 21:11:26.569234 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 5221, win 229, length 0
>
> 21:11:28.478081 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq
> 3690:3727, ack 5221, win 229, length 37
>
> 21:11:28.603557 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack
> 3727, win 96, length 0
>
> 21:11:50.707433 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [P.], seq
> 5221:5258, ack 3727, win 96, length 37
>
> 21:11:50.707460 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 5258, win 229, length 0
>
> 21:11:50.707577 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [P.], seq
> 3727:3764, ack 5258, win 229, length 37
>
> 21:11:50.707599 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [F.], seq
> 3764, ack 5258, win 229, length 0
>
> 21:11:50.789084 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [.], ack
> 3764, win 96, length 0
>
> 21:11:50.789847 IP ELB.com.rfe > CLIENT.foo.com.43414: Flags [F.], seq
> 5258, ack 3765, win 96, length 0
>
> 21:11:50.789868 IP CLIENT.foo.com.43414 > ELB.com.rfe: Flags [.], ack
> 5259, win 229, length 0
>
>
> Here's a direct connection:
>
> 21:15:14.495542 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [S], seq
> 379756253, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7],
> length 0
>
> 21:15:14.576380 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [S.], seq
> 521570022, ack 379756254, win 14600, options [mss
> 1460,nop,nop,sackOK,nop,wscale 7], length 0
>
> 21:15:14.576409 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack 1,
> win 115, length 0
>
> 21:15:14.576940 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq
> 1:227, ack 1, win 115, length 226
>
> 21:15:14.657397 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack
> 227, win 123, length 0
>
> 21:15:14.683465 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], seq
> 1:1461, ack 227, win 123, length 1460
>
> 21:15:14.683481 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], seq
> 1461:2921, ack 227, win 123, length 1460
>
> 21:15:14.683485 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [P.], seq
> 2921:3515, ack 227, win 123, length 594
>
> 21:15:14.683683 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 1461, win 137, length 0
>
> 21:15:14.683696 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 2921, win 160, length 0
>
> 21:15:14.683702 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 3515, win 183, length 0
>
> 21:15:14.766227 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], seq
> 227:3147, ack 3515, win 183, length 2920
>
> 21:15:14.766243 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq
> 3147:3690, ack 3515, win 183, length 543
>
> 21:15:14.846942 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack
> 3147, win 169, length 0
>
> 21:15:14.849068 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], seq
> 3515:4975, ack 3690, win 191, length 1460
>
> 21:15:14.849082 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [P.], seq
> 4975:5221, ack 3690, win 191, length 246
>
> 21:15:14.849251 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 4975, win 206, length 0
>
> 21:15:14.849262 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 5221, win 229, length 0
>
> 21:15:18.394716 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq
> 3690:3727, ack 5221, win 229, length 37
>
> 21:15:18.511442 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack
> 3727, win 191, length 0
>
> 21:15:52.957532 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [P.], seq
> 5221:5258, ack 3727, win 191, length 37
>
> 21:15:52.957587 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 5258, win 229, length 0
>
> 21:15:52.957716 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [P.], seq
> 3727:3764, ack 5258, win 229, length 37
>
> 21:15:52.957742 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [F.], seq
> 3764, ack 5258, win 229, length 0
>
> 21:15:53.039203 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack
> 3764, win 191, length 0
>
> 21:15:53.039468 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [F.], seq
> 5258, ack 3764, win 191, length 0
>
> 21:15:53.039484 IP CLIENT.foo.com.18497 > ELB.com.rfe: Flags [.], ack
> 5259, win 229, length 0
>
> 21:15:53.039492 IP ELB.com.rfe > CLIENT.foo.com.18497: Flags [.], ack
> 3765, win 191, length 0
>
>
> Any thoughts? By the way, I'm trying out 3.3.9, but I'm running into
> other issues.
>
> On May 7, 2013, at 1:55 PM, Balazs Scheidler <bazsi77 at gmail.com> wrote:
>
>
> On May 7, 2013 10:51 PM, "Matt Wise" <matt at nextdoor.com> wrote:
> >
> > I've done some more testing and now have narrowed the problem down to
> > our Amazon ELB. Because the OSS version of Syslog-ng does not support
> > failing over destinations from hostA to hostB when one fails, we are using
> > an ELB in front of our syslog servers.
> >
> > When we have no ELB in place, our syslog-ng client detects the network
> > drop immediately and begins to try to reconnect. When the ELB is in the
> > way, it never detects the network connection drop. I don't understand why.
> > I've tested a bit manually using openssl to connect to our remote endpoint
> > through the ELB and directly and I don't see any difference in the way
> > network connections are killed off. Any thoughts here?
> >
>
> Hmm interesting. The difference might be how connections are terminated.
> Can you check that using tcpdump?
>
> > --matt
> >
> > On May 6, 2013, at 9:53 AM, Matt Wise <matt at nextdoor.com> wrote:
> >
> > > We're running Syslog-NG 3.3.4 in our mixed Ubuntu 10/12 environment.
> > > We use SSL for all of our syslog-to-syslog connections, and have logging
> > > going to two different data pipelines.
> > >
> > >  Data Dest #1: SyslogNG Client ----(SSL)----> SyslogNG Server ------> Logstash File-read-in-service
> > >  Data Dest #2: SyslogNG Client ----(SSL)----> Stunnel Service ------> Flume Syslog Service
> > >
> > > The data streams work fine most of the time, but if we restart either
> > > the remote syslog-ng server or the stunnel service, it seems that the
> > > syslog-ng clients don't try to reconnect to the endpoints for a LONG
> > > time (or ever). I end up seeing the connection on the client go into a
> > > CLOSE_WAIT state, and syslog-ng keeps thinking that it's sending log
> > > events through the connection, so it seems to never try to reconnect.
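
As background on the CLOSE_WAIT symptom: a TCP socket sits in CLOSE_WAIT
when the peer's FIN has arrived but the local application has not yet
closed its own end. A sender that only writes can miss this, because the
FIN shows up as an end-of-file on the read side. The following is a
minimal sketch of that check, not syslog-ng's actual implementation; the
function name is made up for illustration, and a local socket pair stands
in for the real client/server TCP connection.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has shut down its side (a read would hit EOF)."""
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending: no data and no close seen yet
    # MSG_PEEK looks at pending data without consuming it; b"" means EOF.
    return sock.recv(1, socket.MSG_PEEK) == b""


# A local socket pair stands in for the client/server connection.
client, server = socket.socketpair()
print(peer_has_closed(client))  # peer still open
server.close()                  # simulates the remote side going away
print(peer_has_closed(client))  # peer closed: time to close and reconnect
client.close()
```

A client that polls like this before writing would notice the remote
restart right away instead of leaving the socket half-closed.
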
> > >
> > > I've tried setting time_reopen() to 0, 1 and 5... no luck or change in
> > > behavior.
> > >
> > > Any thoughts?
> > >
> > > --Matt
> > >
> >
> >
> ______________________________________________________________________________
> > Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> > Documentation:
> http://www.balabit.com/support/documentation/?product=syslog-ng
> > FAQ: http://www.balabit.com/wiki/syslog-ng-faq
> >