lost logs when using syslog-ng on Linux due to /dev/log reopen

newer
Here is a patch to allow template...

older
configure 1.4.15 fails to detect...

Borsenkow Andrej

28 May 2002 28 May '02

10:50 a.m.

I am using syslog-ng on Mandrake (8.2 and post-8.2), glibc-2.2.5. Unfortunately, it tends to lose logs. As far as I can tell it is related to the fact that syslog-ng recreates /dev/log on HUP. It means that every program that has opened syslog connection won't be able to write to /dev/log anymore. The sad thing is, this is _not_ limited to klogd (that has the same problem) but to every other program (I had very bad case of socks server losing some info used for accounting :( It happens to every program (daemon) that does openlog() only once on startup. Which makes drop in replacement for syslog simply impossible :( -andrej

Show replies by date

Balazs Scheidler

28 May 28 May

12:24 p.m.

On Tue, May 28, 2002 at 12:50:04PM +0400, Borsenkow Andrej wrote:

...

I am using syslog-ng on Mandrake (8.2 and post-8.2), glibc-2.2.5.

Unfortunately, it tends to lose logs. As far as I can tell it is related to the fact that syslog-ng recreates /dev/log on HUP. It means that every program that has opened syslog connection won't be able to write to /dev/log anymore.

The sad thing is, this is _not_ limited to klogd (that has the same problem) but to every other program (I had very bad case of socks server losing some info used for accounting :( It happens to every program (daemon) that does openlog() only once on startup.

Which makes drop in replacement for syslog simply impossible :(

Use unix-stream() source instead of unix-dgram(), and don't use klogd. libc correctly uses both /dev/log devices, and programs which use openlog() work. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1

Borsenkow Andrej

3:08 p.m.

...

Use unix-stream() source instead of unix-dgram(), and don't use klogd.

libc correctly uses both /dev/log devices, and programs which use

openlog()

...

work.

thank you. I could reliably reproduce this problem actually so I am not sure if it is really caused by syslog-ng (and not by servers in question). Because I do not want to lose klogd functionality I currently use source sys { unix-stream ("/dev/log"); file ("/var/lib/syslog-ng/klogdpipe" log_prefix ("kernel: ")); internal(); }; where /var/lib/syslog-ng/klogdpipe is FIFO. That was actually a reason for a fifo vs. pipe question :-) regards -andrej

Balazs Scheidler

5:55 p.m.

On Tue, May 28, 2002 at 05:08:03PM +0400, Borsenkow Andrej wrote:

...

...
Use unix-stream() source instead of unix-dgram(), and don't use klogd.

libc correctly uses both /dev/log devices, and programs which use

openlog()

...
work.

thank you. I could reliably reproduce this problem actually so I am not sure if it is really caused by syslog-ng (and not by servers in question).

Because I do not want to lose klogd functionality I currently use

source sys { unix-stream ("/dev/log"); file ("/var/lib/syslog-ng/klogdpipe" log_prefix ("kernel: ")); internal(); };

where /var/lib/syslog-ng/klogdpipe is FIFO. That was actually a reason for a fifo vs. pipe question :-)

you don't really need klogd anymore. ksymoops is a lot better than anything klogd ever performed. so let the oops go to your syslog and analyize it with ksymoops, instead of relying on klogd on this functionality. -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1

Borsenkow Andrej

30 May 30 May

7:26 a.m.

...

so let the oops go to your syslog and analyize it with ksymoops,

instead of

...

relying on klogd on this functionality.

This works only when /proc is mounted. klogd is smart enough to get kernel logs both with and without /proc. While it is a solution for a single system you completely control it is unacceptable for a general purpose RPM (that is my concern). I guess I'll just patch grammar file :-) -andrej

Peter J. Holzer

31 May 31 May

10:14 p.m.

On 2002-05-28 12:50:04 +0400, Borsenkow Andrej wrote:

...

I am using syslog-ng on Mandrake (8.2 and post-8.2), glibc-2.2.5.

Unfortunately, it tends to lose logs. As far as I can tell it is related to the fact that syslog-ng recreates /dev/log on HUP. It means that every program that has opened syslog connection won't be able to write to /dev/log anymore.

Not quite. I had the same problem with syslog-ng 1.4.something (I think I recompiled a Mandrake source rpm on Redhat, but I'm not 100% sure) and 1.5.13. When I investigated the problem I found two bugs: One in syslog-ng and one in glibc 2.1: The bug in syslog-ng was related to recreating /dev/log on HUP. It was fixed in 1.5.15, I think (mentioned in the changelog), so I didn't bother to report it. The bug in libc is more subtle: Apparently[1] the syslog library function works like this: check if there is an open file descriptor to syslogd. If there is none, try to connect in stream and dgram mode (one of them will usually work). Write message to the socket. If this fails, close socket. So, if syslog-ng (or syslogd, for that matter[2]) recreates the socket, the next write will fail, syslog will silently drop the message, but at the next syslog call it will reopen the socket and continue to log happily. Doesn't matter much if you lose one message every 24 hours (or how often you switch logs), don't you think? Unless you have a forking server like sendmail, which calls openlog in the parent process. Then the socket will always be open but invalid after the fork, so you will lose the first syslog message from each connection - in the case of sendmail, you will generally only see "to=" messages, but no "from=" messages. hp [1] I didn't have the source and was only looking at strace-Output, so I may be wrong. Because of this, and because I only tested it with glibc 2.1 and wanted to look at 2.2, I haven't reported this to the glibc maintainer yet. If anyone wants to, feel free. [2] Syslogd doesn't recreate the socket on SIGHUP. But you get the same behaviour if you kill and restart syslogd. -- _ | Peter J. Holzer | > Benötigt man für Linux einen Virenscanner? |_|_) | Schriftführer LUGA | Nein. | | | hjp@luga.at | Linux bootet und läuft auch ohne Virenscanner __/ | http://www.luga.at/ | einwandfrei. -- Heimo Schoen in al

Borsenkow Andrej

1 Jun 1 Jun

8:40 p.m.

New subject: lost logs when using syslog-ng on Linux due to /dev/log reopen

В Сбт, 01.06.2002, в 00:14, Peter J. Holzer написал:

...

On 2002-05-28 12:50:04 +0400, Borsenkow Andrej wrote:

...
I am using syslog-ng on Mandrake (8.2 and post-8.2), glibc-2.2.5.

Unfortunately, it tends to lose logs. As far as I can tell it is related to the fact that syslog-ng recreates /dev/log on HUP. It means that every program that has opened syslog connection won't be able to write to /dev/log anymore.

Not quite. I had the same problem with syslog-ng 1.4.something (I think I recompiled a Mandrake source rpm on Redhat, but I'm not 100% sure) and 1.5.13. When I investigated the problem I found two bugs: One in syslog-ng and one in glibc 2.1:

The bug in syslog-ng was related to recreating /dev/log on HUP. It was fixed in 1.5.15, I think (mentioned in the changelog), so I didn't bother to report it.

I am not as sure: {pts/1}% rpm -q syslog-ng syslog-ng-1.5.18-0.3mdk {pts/1}% LC_TIME=C ll /dev/log srw-rw-rw- 1 root root 0 Jun 1 22:32 /dev/log= {pts/1}% sudo kill -1 1301 {pts/1}% LC_TIME=C ll /dev/log srw-rw-rw- 1 root root 0 Jun 1 22:33 /dev/log= actually it quite happily does recreate /dev/log every time.

...

The bug in libc is more subtle:

Apparently[1] the syslog library function works like this:

check if there is an open file descriptor to syslogd. If there is none, try to connect in stream and dgram mode (one of them will usually work).

Write message to the socket.

If this fails, close socket.

So, if syslog-ng (or syslogd, for that matter[2]) recreates the socket, the next write will fail, syslog will silently drop the message, but at the next syslog call it will reopen the socket and continue to log happily. Doesn't matter much if you lose one message every 24 hours (or how often you switch logs), don't you think?

You are not serious, are you? So you mean that it does not matter if you lose - early warning about possibly intrusion attempt - or - - hardware problem report (like ECC warning) - or - - amount of download used for accounting

...

Unless you have a forking server like sendmail, which calls openlog in the parent process. Then the socket will always be open but invalid after the fork, so you will lose the first syslog message from each connection - in the case of sendmail, you will generally only see "to=" messages, but no "from=" messages.

hp

[1] I didn't have the source and was only looking at strace-Output, so I may be wrong. Because of this, and because I only tested it with glibc 2.1 and wanted to look at 2.2, I haven't reported this to the glibc maintainer yet. If anyone wants to, feel free.

[2] Syslogd doesn't recreate the socket on SIGHUP. But you get the same behaviour if you kill and restart syslogd.

Yes I know. Actually you just gave perfect explanation why syslog is not suitable for mission critical application :( -andrej

Peter J. Holzer

10:42 p.m.

On 2002-06-01 22:40:19 +0400, Borsenkow Andrej wrote:

...

? ???, 01.06.2002, ? 00:14, Peter J. Holzer ???????:

...
On 2002-05-28 12:50:04 +0400, Borsenkow Andrej wrote:

...
I am using syslog-ng on Mandrake (8.2 and post-8.2), glibc-2.2.5.

Unfortunately, it tends to lose logs. As far as I can tell it is related to the fact that syslog-ng recreates /dev/log on HUP. It means that every program that has opened syslog connection won't be able to write to /dev/log anymore.

Not quite. I had the same problem with syslog-ng 1.4.something (I think I recompiled a Mandrake source rpm on Redhat, but I'm not 100% sure) and 1.5.13. When I investigated the problem I found two bugs: One in syslog-ng and one in glibc 2.1:

The bug in syslog-ng was related to recreating /dev/log on HUP. It was fixed in 1.5.15, I think (mentioned in the changelog), so I didn't bother to report it.

I am not as sure:

{pts/1}% rpm -q syslog-ng syslog-ng-1.5.18-0.3mdk {pts/1}% LC_TIME=C ll /dev/log srw-rw-rw- 1 root root 0 Jun 1 22:32 /dev/log= {pts/1}% sudo kill -1 1301 {pts/1}% LC_TIME=C ll /dev/log srw-rw-rw- 1 root root 0 Jun 1 22:33 /dev/log=

actually it quite happily does recreate /dev/log every time.

Yes, that's ok. Existing connections shouldn't be affected by that. The problem was (IIRC) that existing connections were thrown away even though keep-alive(yes) was set. Maybe Bazsi can provide the details, I gave up trying to understand libol at 4:00 am and the next day I saw that 1.5.15 was out and it seemed to fix the problem. Before that I already had found a workaround, which I kept using, so I wouldn't encounter the problem anymore, even if it still existed (I had tested it, though and couldn't reproduce it anymore): Don't use logrotate to switch logs - let syslog-ng write to a new file every day by using a file directive like this: file("/var/log/$HOST/$YEAR/$MONTH/$DAY/$FACILITY.$PRIORITY" And then gzip or remove the files after n days.

...

...
Apparently[1] the syslog library function works like this: [... why the syslog function in glibc reproducably loses messages ...] Doesn't matter much if you lose one message every 24 hours (or how often you switch logs), don't you think?

You are not serious, are you?

No I'm not. But the guy who wrote the syslog function was obviously thinking along those lines. Given that syslog has traditionally been implemented over an unreliable transport protocol (UDP) the assumption that nobody would rely on getting all messages probably wasn't that unplausible.

...

So you mean that it does not matter if you lose

- early warning about possibly intrusion attempt - or - - hardware problem report (like ECC warning) - or - - amount of download used for accounting

If you cannot afford to lose messages ocassionally, don't use syslog. It was never meant for that, as can be seen from the choice of UDP for transport and the missing return values from openlog(3) and syslog(3). Some implementations at least use a transport protocol which doesn't lose packets, but if the server isn't running for some reason, the client still has no chance of noticing that.

...

...
[2] Syslogd doesn't recreate the socket on SIGHUP. But you get the same behaviour if you kill and restart syslogd.

Yes I know. Actually you just gave perfect explanation why syslog is not suitable for mission critical application :(

Balazs Scheidler

2 Jun 2 Jun

12:18 p.m.

On Sat, Jun 01, 2002 at 10:42:40PM +0200, Peter J. Holzer wrote:

...

On 2002-06-01 22:40:19 +0400, Borsenkow Andrej wrote:

...
? ???, 01.06.2002, ? 00:14, Peter J. Holzer ???????:

...
On 2002-05-28 12:50:04 +0400, Borsenkow Andrej wrote:

...
I am using syslog-ng on Mandrake (8.2 and post-8.2), glibc-2.2.5.

Unfortunately, it tends to lose logs. As far as I can tell it is related to the fact that syslog-ng recreates /dev/log on HUP. It means that every program that has opened syslog connection won't be able to write to /dev/log anymore.

Not quite. I had the same problem with syslog-ng 1.4.something (I think I recompiled a Mandrake source rpm on Redhat, but I'm not 100% sure) and 1.5.13. When I investigated the problem I found two bugs: One in syslog-ng and one in glibc 2.1:

The bug in syslog-ng was related to recreating /dev/log on HUP. It was fixed in 1.5.15, I think (mentioned in the changelog), so I didn't bother to report it.

I am not as sure:

{pts/1}% rpm -q syslog-ng syslog-ng-1.5.18-0.3mdk {pts/1}% LC_TIME=C ll /dev/log srw-rw-rw- 1 root root 0 Jun 1 22:32 /dev/log= {pts/1}% sudo kill -1 1301 {pts/1}% LC_TIME=C ll /dev/log srw-rw-rw- 1 root root 0 Jun 1 22:33 /dev/log=

actually it quite happily does recreate /dev/log every time.

Yes, that's ok. Existing connections shouldn't be affected by that. The problem was (IIRC) that existing connections were thrown away even though keep-alive(yes) was set. Maybe Bazsi can provide the details, I gave up trying to understand libol at 4:00 am and the next day I saw that 1.5.15 was out and it seemed to fix the problem.

It was the keep-alive(yes) state when syslog-ng was losing messages even if should not have. /dev/log is recreated every time, but given it is a SOCK_STREAM socket, the sender program can notice the broken pipe, and those will work. If it is SOCK_DGRAM (to which several distributions switched to) the recreation of /dev/log is bad.

...

...
...
Apparently[1] the syslog library function works like this: [... why the syslog function in glibc reproducably loses messages ...] Doesn't matter much if you lose one message every 24 hours (or how often you switch logs), don't you think?

You are not serious, are you?

No I'm not. But the guy who wrote the syslog function was obviously thinking along those lines. Given that syslog has traditionally been implemented over an unreliable transport protocol (UDP) the assumption that nobody would rely on getting all messages probably wasn't that unplausible.

I'm going to address this issue by adding several features to syslog-ng 2: * flow controlled message paths * disk buffering of messages when a destination is absent * maybe a preloadable shared library, which changes the syslog () implementation in libc -- Bazsi PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1

Borsenkow Andrej

2:32 p.m.

New subject: lost logs when using syslog-ng on Linux due to /dev/log reopen

В Вск, 02.06.2002, в 14:18, Balazs Scheidler написал:

...

I'm going to address this issue by adding several features to syslog-ng 2:

* flow controlled message paths * disk buffering of messages when a destination is absent * maybe a preloadable shared library, which changes the syslog () implementation in libc

The last is what I was about to suggest. Thank you. The second one looks really tricky to me (in client that is); may be creating a hidden file (creat and then unlink) is a way to go. anyway, thank you for a good program -andrej

Borsenkow Andrej

2:36 p.m.

New subject: lost logs when using syslog-ng on Linux due to /dev/log reopen

В Вск, 02.06.2002, в 14:18, Balazs Scheidler написал:

...

/dev/log is recreated every time, but given it is a SOCK_STREAM socket, the sender program can notice the broken pipe, and those will work.

If it is SOCK_DGRAM (to which several distributions switched to) the recreation of /dev/log is bad.

Yes, Mandrake 8.2 and above that I have defaults to DGRAM. It is probably glibc issue finally; it is using glibc-2.2.[45]. Is there a pressing need to recreate /dev/log on SIGHUP? -andrej

Ed Ravin

3 Jun 3 Jun

1:26 a.m.

New subject: libc lost logs bug

Peter J. Holzer writes:

...

The bug in libc is more subtle: Apparently[1] the syslog library function works like this:

check if there is an open file descriptor to syslogd. If there is none, try to connect in stream and dgram mode (one of them will usually work).

Write message to the socket.

If this fails, close socket.

So, if syslog-ng (or syslogd, for that matter[2]) recreates the socket, the next write will fail, syslog will silently drop the message, but at the next syslog call it will reopen the socket and continue to log happily.

There is a similiar, but more insidious bug in all versions of NetBSD up to and including 1.5.2: if a program has an open (but invalid) file descriptor for syslog, all subsequent log messages are lost. NetBSD 1.5.3 and 1.6 will properly re-open the socket if the write fails (and not lose the message if the open succeeds). While researching this problem I noticed that some programs, like sendmail, open the socket once and reuse it, while other programs, like sshd, seem to close and re-open the syslog socket. The result was that I was losing sendmail messages from my logs, but not sshd messages. So with this bug, if I restarted syslogd (or syslog-ng), I would lose the logs of sendmail and any other daemons that kept their syslog sockets open.

8528

Age (days ago)

8533

Last active (days ago)

List overview

Download

11 comments

4 participants

participants (4)

Balazs Scheidler
Borsenkow Andrej
Ed Ravin
Peter J. Holzer