Reliable tcp logging


Peter Daum

11 May 2005

12:32 p.m.

I am trying to send all important messages from a bunch of other machines to a central syslog-ng server via tcp. I chose tcp partly because the same log server gets all kinds of less important stuff via udp from other machines, which can easily be distinguished that way, but partly also because I expected tcp to be more reliable.

Unfortunately, this does not seem to be the case: when the connection has died for any reason, the client only discovers this when it tries to send the next message to the server. Only then does it start to wait until "time_reopen" is over and establish a new connection. The message that originally triggered this, and whatever comes in between, is lost.

Is there any way to get syslog-ng (v 1.6.5) to check more often whether a tcp connection to a log host still exists, and to re-establish it otherwise? I did not see any reference to this in the documentation, but this seems to happen every 2 hours.

Setting "tcp-keep-alive(yes)" does not seem to make it any better. I also discovered that version 1.6.7 has a new option "log_fifo_size", which sounded promising, but setting this to a higher value does not seem to have any influence on this issue either.

Regards, Peter Daum
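The behaviour described above is a property of TCP sockets rather than something specific to syslog-ng, and it can be reproduced on the loopback interface. The following is a minimal Python sketch, not syslog-ng code; the function name `demo_lost_write` and the message contents are made up for illustration. A listening socket stands in for the log server, it is closed after accepting the connection, and the client then writes two lines.

```python
import socket
import time


def demo_lost_write():
    """Return (first_ok, second_ok) for two writes to a peer that has gone away."""
    # Stand-in for the central log server, bound to an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # Stand-in for the sending client.
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The "server" goes away (restart, reload, ...).
    server_side.close()
    listener.close()
    time.sleep(0.2)  # give the close notification time to reach the client

    results = []
    for line in (b"<11>first message after the drop\n", b"<11>second message\n"):
        try:
            client.sendall(line)
            results.append(True)   # the local kernel accepted the data
        except OSError:
            results.append(False)  # only now does the sender see the failure
        time.sleep(0.2)  # give the peer's reset time to arrive
    client.close()
    return tuple(results)


if __name__ == "__main__":
    print(demo_lost_write())
```

On a typical Linux system the first write is expected to succeed and only the second one to fail, which matches the single lost message discussed in this thread. The exact behaviour can differ between operating systems.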


Roberto Nibali

11 May

4:10 p.m.

...

I am trying to send all important messages from a bunch of other machines to a central syslog-ng server via tcp. I chose tcp partly, because the same log server gets all kinds of less important stuff via udp from other machines, which can easily be distinguished that way, but partially also because I expected tcp to be more reliable. Unfortunately, this does not seem to be the case: When the connection has died for any reason, the client will only discover this when it is trying to send the next message to the server. Only then it starts to wait until "time_reopen" is over and establishes a new connection - the message that originally triggered this and whatever comes in between is lost.

Related, if not an exact match (IMHO):

https://lists.balabit.hu/pipermail/syslog-ng/2005-February/006974.html

Only the first message is lost, however.

...

Is there any way to get syslog-ng (v 1.6.5) to check more often whether a tcp connection to a log host still exists and re-establish it otherwise? I did not see any reference to this in the documentation, but this seems to happen every 2 hours.

The problem is rather that the packet is not available anymore.

...

Setting "tcp-keep-alive(yes)" does not seem to make it any better. I also discovered that version 1.6.7 has a new option "log_fifo_size" which sounded promising but setting this to a higher value also does not seem to have any influence on this issue.

Correct. If your problem matches the email in the archive, you could start from Bazsi's last reply and find a solution to that ;).

Regards, Roberto Nibali, ratz

--
addr://Rathausgasse 31, CH-5001 Aarau
tel://++41 62 823 9355
fax://++41 62 823 9356
http://www.terreactive.com
terreActive AG, Wir sichern Ihren Erfolg

Peter Daum

8:39 p.m.

Roberto Nibali wrote:

...

Related, if not an exact match (IMHO):

https://lists.balabit.hu/pipermail/syslog-ng/2005-February/006974.html

Only the first message is lost, however.

Well, yes, it is exactly the same issue, and it is indeed only one line that gets lost (which in my case, where typically every host sends about 1 line/hour, does not really make a difference).

Unfortunately, the previous discussion does not sound very promising. Obviously there is no hope of getting this fixed in 1.6.x...

How far from being ready for production use is 1.9.x?

Maybe I should go back to using udp instead, which is by definition unreliable, but in this case would probably still yield a higher success rate?

Regards, Peter Daum

Dave Johnson

12 May

12:06 a.m.

Assuming you already tried to find out what was causing the drop on the remote side (firewall/remote server/unknown?), and this can't be tuned, some other random ideas:

1) Send udp and tcp to the central server, and compare the files at the end of the day (assuming you rotate them every day).

2) Run a keepalive message sender [make sure your sync is (0) for the connection(s)]:
   a) A cron job every couple of minutes that uses logger to send a "keepalive". Filter the message out at the central server.
   b) Have syslog-ng generate stats every couple of minutes and send them to the central server.

On 5/11/05, Peter Daum <gator_ml@yahoo.de> wrote:

...

Roberto Nibali wrote:

...
Related, if not an exact match (IMHO):

https://lists.balabit.hu/pipermail/syslog-ng/2005-February/006974.html

Only the first message is lost, however.

Well, yes, it is exactly the same issue and it is indeed only one line that gets lost (which in my case, where typically every host sends about 1 line/hour does not really make a difference).

Unfortunately, the previous discussion does not sound very promising. Obviously there is no hope to get this fixed in 1.6.x...

How far from being ready for production use is 1.9.x?

Maybe I should go back to using udp instead, which is by definition unreliable, but in this case would probably still yield a higher success rate?

Regards, Peter Daum

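Idea 2a above can be sketched as a small script run from cron. This is only an illustration in Python, using the standard `syslog` module in place of logger. The marker string, the function names, and the choice of priority are assumptions; the priority has to be high enough for the local syslog-ng to forward the message over the tcp connection.

```python
import socket
import syslog

# Hypothetical marker; anything that the central server can filter on will do.
KEEPALIVE_MARKER = "syslog-keepalive"


def keepalive_text(hostname):
    """Build the text of one keepalive line."""
    return f"{KEEPALIVE_MARKER} host={hostname}"


def is_keepalive(message):
    """The test a filter on the central server has to make to drop these lines."""
    return KEEPALIVE_MARKER in message


def send_keepalive():
    # Hand the message to the local syslog daemon, much as logger would.
    # LOG_ERR is an assumption: use a priority that your setup forwards.
    syslog.openlog(ident="keepalive", facility=syslog.LOG_USER)
    syslog.syslog(syslog.LOG_ERR, keepalive_text(socket.gethostname()))
    syslog.closelog()


if __name__ == "__main__":
    send_keepalive()
```

Run every few minutes from cron, this keeps the connection exercised, so that a dead connection costs a keepalive line rather than a real message.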

Roberto Nibali

7:35 a.m.

...

Well, yes, it is exactly the same issue and it is indeed only one line that gets lost (which in my case, where typically every host sends about 1 line/hour does not really make a difference).

You mean 1 line/hour that is lost, right?

...

Unfortunately, the previous discussion does not sound very promising. Obviously there is no hope to get this fixed in 1.6.x...

There's always hope :). But someone who knows how sockets work in the various Unices, and who has a lot of time, needs to address this.

...

How far from being ready for production use is 1.9.x?

I couldn't tell; a broad tester base is still lacking.

...

Maybe I should go back to using udp instead, which is by definition unreliable, but in this case would probably still yield a higher success rate?

What is your failure rate exactly? What is your rate of log messages per second? What's the average message size per log packet? Do you have macro expansion configured? How many regexps are in your config? ...

With TCP-based syslogging you can reliably (at least in my tests) send and receive about 15,000 messages per second with an average size of 128 bytes. This is already quite a lot for a production environment. I don't recall the number for UDP, but if memory serves me well, it was something around 3000 messages per second.

HTH and best regards, Roberto Nibali, ratz

--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

Peter Daum

9:48 a.m.

Roberto Nibali wrote:

...

...
Well, yes, it is exactly the same issue and it is indeed only one line that gets lost (which in my case, where typically every host sends about 1 line/hour does not really make a difference).

You mean 1 line/hour that is lost, right?

I guess my description was ambiguous. My problem is _not_ excessive packet loss because syslog-ng couldn't handle the volume, but really just the contrary: per host there is typically at most about one line/hour, and if that line gets lost, this is a significant percentage.

I have a "classical" loghost to which all kinds of machinery send their log messages via udp. That loghost runs syslog-ng and sorts all the messages neatly into different files. I didn't investigate systematically, but I don't have any reason to believe that much gets lost.

Because everything works so nicely (I switched to syslog-ng fairly recently and am very thrilled; my thanks to everybody who contributed to it :-), I decided to extend the central logging. There is a bunch of server machines which maintain their own local logfiles, and in general this is fine. What I am trying to do now is collect (in addition to the "normal" logging) everything that is important enough to require immediate attention in one location on the loghost. For this, I switched completely to syslog-ng and configured all boxes to forward everything beyond a certain priority via tcp to the loghost.

Because I am still fine-tuning the setup (weeding out messages that are sent with a far too high priority), I occasionally have to reload the configuration (which also results in all network connections being dropped). This is where I discovered that if the loghost is restarted for any reason, it takes up to 2 hours for the clients to notice, and anything they try to send during this time is lost. In my case this is fatal, because the whole idea is to normally watch only one log file and rely on everything important showing up there.

I guess for me the best option currently would be to switch to udp instead (maybe on a different port, to keep the important stuff separate from printers telling about being out of paper), or get really daring and try 1.9.x ...

Regards and Thanks, Peter Daum

Balazs Scheidler

10:36 a.m.

On Thu, 2005-05-12 at 10:48 +0200, Peter Daum wrote:

...

Roberto Nibali wrote:

...
...
Well, yes, it is exactly the same issue and it is indeed only one line that gets lost (which in my case, where typically every host sends about 1 line/hour does not really make a difference).

You mean 1 line/hour that is lost, right?

I guess my description was ambiguous. My problem is _not_ excessive packet loss because syslog-ng couldn't handle the volume, but really just the contrary: per host there is typically at most about one line/hour, and if that line gets lost, this is a significant percentage.

Please note that a single message is dropped whenever the TCP connection is closed. syslog-ng never closes the connection by default; it does so only when it is restarted, or when it is reloaded while keep-alive() is set to no (which is the default for TCP sockets).

You can work around this by enabling keep-alive (then HUP is a non-issue) and maybe sending periodic keep-alive messages, which will trigger a reconnection when the central server is indeed restarted.
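A keep-alive message works because writing is what makes the sender look at the socket. As an illustration only (this is not what syslog-ng 1.6.x does), a sender can also ask the kernel whether the peer has already closed the connection, without writing anything, by peeking at the socket. A minimal Python sketch, with `peer_has_closed` as a made-up helper name:

```python
import select
import socket


def peer_has_closed(sock, timeout=0.0):
    """Return True if the kernel already knows the peer closed or reset sock."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # nothing pending, so no sign of a close
    try:
        # Look at pending input without consuming it.
        # An empty result means the peer closed the connection in an orderly way.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True  # the connection was reset
```

This only reports a close that the sending host has actually been told about, such as a restart of the central server. A connection that was silently dropped on the way, for example by a firewall, still looks healthy to this check.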

...

I guess, for me currently the best option would be to switch to udp instead (maybe on a different port to keep the important stuff separate from printers telling about being out of paper), or get really daring and try 1.9.x ...

Although some lab testing would be very welcome, I would not suggest using it in a production environment. By the way, is there anyone on this list using syslog-ng 1.9.x in either production or non-production environments? Feedback has been very limited so far.

--
Bazsi
