Flow-control behaviour and Cached lines timestamping, plus a suspend error in 2.1beta1
Dear Bazsi, I have found two "features" and one bug I would like to discuss a little: 1) We use the syslog-ng 2.1alpha1 as a root system syslog. The problem is, we have had two destinations setup. First destination was the traditional write of system messages to the file like "/var/log/messages" etc. The second destination path was to the TCP stream to the central syslog server. And the problem is, we have used the flow-control flag on the central TCP path. But, when the central TCP syslog-ng becomes unavailable, after all the buffers filled up, the local system syslog-ng stopped processing any system messages. It did not write anything through the local file destinations until the TCP communication was re-established. Is this a correct behaviour? We have found, that to allow smooth processing of local system sources, we can not use flow-control flag on any destination paths it used.. 2) When messages are in syslog-ng buffer, they do not contain any timestamp. We have realized, that the timestamp is added at the time the syslog-ng flushes all the buffers to the disk. This is not very good in case you need to debug the messages ages and all of them have same and non-related timestamp. Is it possible to modify the logic of syslog-ng to timestamp the messages as soon as possible - even before the buffer, at the collection time? We have found it important for log integrity reasons. 3) Directly in the 2.1beta1 there has been introduced some error with the "suspend writing a destination file when an I/O error ..." patch. During our tests we have found it continously thinks the destination is unavailable and writes strange messages into to the internal log: May 29 15:00:02 europa syslog-ng[16798]: I/O error occurred while writing; fd='7', error='Connection timed out (110)' May 29 15:00:02 europa syslog-ng[16798]: Connection broken; time_reopen='60' May 29 15:00:02 europa syslog-ng[16798]: Suspending write operation because of an I/O error; fd='7', time_reopen='60' the lsof -p 16798: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME syslog-ng 16798 root cwd DIR 253,0 4096 2 / syslog-ng 16798 root rtd DIR 253,0 4096 2 / syslog-ng 16798 root txt REG 253,0 114584 671841 /sbin/syslog-ng syslog-ng 16798 root mem REG 253,0 106397 820372 /lib/ld-2.3.4.so syslog-ng 16798 root mem REG 253,0 1454546 820373 /lib/tls/libc-2.3.4.so syslog-ng 16798 root mem REG 253,0 28504 2919694 /usr/lib/libwrap.so.0.7.6 syslog-ng 16798 root mem REG 253,0 93985 820376 /lib/tls/libpthread-2.3.4.so syslog-ng 16798 root mem REG 253,0 47671 820386 /lib/tls/librt-2.3.4.so syslog-ng 16798 root mem REG 253,0 505200 2930064 /usr/lib/libglib-2.0.so.0.400.7 syslog-ng 16798 root mem REG 253,0 95148 820384 /lib/libnsl-2.3.4.so syslog-ng 16798 root mem REG 253,0 36639 2920956 /usr/local/lib/libevtlog.so.0.0.0 syslog-ng 16798 root 0r CHR 1,3 1619 /dev/null syslog-ng 16798 root 1w CHR 1,3 1619 /dev/null syslog-ng 16798 root 2w CHR 1,3 1619 /dev/null syslog-ng 16798 root 3r REG 0,3 0 4026531850 /proc/kmsg syslog-ng 16798 root 4u unix 0xc1be7ba0 351802 /dev/log syslog-ng 16798 root 5u IPv4 351804 TCP europa.training.hp.com:37205->europa.training.hp.com:5140 (ESTABLISHED) syslog-ng 16798 root 6w REG 253,0 4802 1655155 /var/log/messages syslog-ng 16798 root 8w REG 253,0 525 1655159 /var/log/boot.log This happens during standard system syslog processing on RH ELinux 4 and Solaris 9. You have the syslog-ng confs attached. Could you pleae check it? Best Regards, Pavel
Behal, Pavel wrote:
Dear Bazsi,
I have found two "features" and one bug I would like to discuss a little:
1) We use the syslog-ng 2.1alpha1 as a root system syslog. The problem is, we have had two destinations setup. First destination was the traditional write of system messages to the file like "/var/log/messages" etc. The second destination path was to the TCP stream to the central syslog server. And the problem is, we have used the flow-control flag on the central TCP path. But, when the central TCP syslog-ng becomes unavailable, after all the buffers filled up, the local system syslog-ng stopped processing any system messages. It did not write anything through the local file destinations until the TCP communication was re-established. Is this a correct behaviour? We have found, that to allow smooth processing of local system sources, we can not use flow-control flag on any destination paths it used..
Basically, you can not use flow control for any destination that has a source of /proc/kmsg or /dev/log To "get around" this, we use multiple instances of syslog-ng instance one has; - sources of /proc/kmsg and /dev/log - destination of a pipe (for disk) - destination of a pipe (for network) - very large buffers (2 million records?) instance two has - source of the disk pipe as #1 destination - small buffers ( 2000 ) - destination of files - uses flow control. instance three has - source of the network pip as #1 destination - small buffers (2000) - destination of network - uses flow control This keeps the OS safe in the event that the disk or network can not keep up. This allows us to log when messages are dropped to disk destinations because they actually get dropped to the pipe in instance #1. We can "restart" either the disk or the network instances (#2 #3) without having to drop the connections from applications. We can "restart" either the disk or the network instances without loosing the large buffer of messages stored in instance #1. If instances #2 and #3 used even smaller buffers, then even fewer messages would be lost. A side effect of this is that we can use multiple CPUs to filter, process and log messages even though syslog-ng is single threaded. We typically deploy this structure on our syslog servers, and not on all of our individual servers and do NOT use flow control on the individual servers that send messages to the syslog servers. We have about 15 different pipe destinations for a variety of programs and storage destinations used for real time log analysis, storage and alerting. Evan Rempel.
On Thu, 2008-05-29 at 13:03 +0000, Behal, Pavel wrote:
Dear Bazsi,
I have found two "features" and one bug I would like to discuss a little:
1) We use the syslog-ng 2.1alpha1 as a root system syslog. The problem is, we have had two destinations setup. First destination was the traditional write of system messages to the file like "/var/log/messages" etc. The second destination path was to the TCP stream to the central syslog server. And the problem is, we have used the flow-control flag on the central TCP path. But, when the central TCP syslog-ng becomes unavailable, after all the buffers filled up, the local system syslog-ng stopped processing any system messages. It did not write anything through the local file destinations until the TCP communication was re-established. Is this a correct behaviour? We have found, that to allow smooth processing of local system sources, we can not use flow-control flag on any destination paths it used..
Yes, this is correct behaviour. If syslog-ng continued reading from the local source (to feed the file), it'd have to put the read messages somewhere. Normally the memory based FIFO is used for this purpose, but once it fills up there's no other way, but to suspend reading. The workaround is to increase the buffer size of the destination, and window size of the source to match the network outage you want to plan for. The Premium Edition features a disk buffer, in which case you can use disk space to buffer data towards the TCP destination and until that fills up, the local messages file will correctly be written.
2) When messages are in syslog-ng buffer, they do not contain any timestamp. We have realized, that the timestamp is added at the time the syslog-ng flushes all the buffers to the disk. This is not very good in case you need to debug the messages ages and all of them have same and non-related timestamp.
Hmm. this does not sound true, every message is timestamped as it enters syslog-ng and this timestamp is kept throughout its life in syslog-ng. (with disk buffers it is saved to disk and restored when it is possible to send the message out).
Is it possible to modify the logic of syslog-ng to timestamp the messages as soon as possible - even before the buffer, at the collection time? We have found it important for log integrity reasons.
It works like this.
3) Directly in the 2.1beta1 there has been introduced some error with the "suspend writing a destination file when an I/O error ..." patch. During our tests we have found it continously thinks the destination is unavailable and writes strange messages into to the internal log:
May 29 15:00:02 europa syslog-ng[16798]: I/O error occurred while writing; fd='7', error='Connection timed out (110)' May 29 15:00:02 europa syslog-ng[16798]: Connection broken; time_reopen='60' May 29 15:00:02 europa syslog-ng[16798]: Suspending write operation because of an I/O error; fd='7', time_reopen='60'
Right, I've found this problem in the meanwhile. This code was added to inject artifical errors to check that syslog-ng correctly retries I/O operations. This code was integrated to 2.1beta1. I'm pushing out a git update today, and probably will release 2.1beta2 soon. -- Bazsi
Dear Bazsi, Thank you for the clarificatrion. Regarding the point 2) - sorry, you are right. I have re-tested syslog-ng and it timestamps the events in buffer correctly. I think, my problem was related to the point 1) with enabled flow-control on system sources. I suspect, on Solaris 9, the generated messages got buffered in the OS, and were timestamped just after the reading process was waken-up. Best Regards, Pavel -- Pavel Běhal Hewlett-Packard -----Original Message----- From: syslog-ng-bounces@lists.balabit.hu [mailto:syslog-ng-bounces@lists.balabit.hu] On Behalf Of Balazs Scheidler Sent: Thursday, May 29, 2008 7:38 PM To: Syslog-ng users' and developers' mailing list Subject: Re: [syslog-ng] Flow-control behaviour and Cached lines timestamping, plus a suspend error in 2.1beta1 On Thu, 2008-05-29 at 13:03 +0000, Behal, Pavel wrote:
Dear Bazsi,
I have found two "features" and one bug I would like to discuss a little:
1) We use the syslog-ng 2.1alpha1 as a root system syslog. The problem is, we have had two destinations setup. First destination was the traditional write of system messages to the file like "/var/log/messages" etc. The second destination path was to the TCP stream to the central syslog server. And the problem is, we have used the flow-control flag on the central TCP path. But, when the central TCP syslog-ng becomes unavailable, after all the buffers filled up, the local system syslog-ng stopped processing any system messages. It did not write anything through the local file destinations until the TCP communication was re-established. Is this a correct behaviour? We have found, that to allow smooth processing of local system sources, we can not use flow-control flag on any destination paths it used..
Yes, this is correct behaviour. If syslog-ng continued reading from the local source (to feed the file), it'd have to put the read messages somewhere. Normally the memory based FIFO is used for this purpose, but once it fills up there's no other way, but to suspend reading. The workaround is to increase the buffer size of the destination, and window size of the source to match the network outage you want to plan for. The Premium Edition features a disk buffer, in which case you can use disk space to buffer data towards the TCP destination and until that fills up, the local messages file will correctly be written.
2) When messages are in syslog-ng buffer, they do not contain any timestamp. We have realized, that the timestamp is added at the time the syslog-ng flushes all the buffers to the disk. This is not very good in case you need to debug the messages ages and all of them have same and non-related timestamp.
Hmm. this does not sound true, every message is timestamped as it enters syslog-ng and this timestamp is kept throughout its life in syslog-ng. (with disk buffers it is saved to disk and restored when it is possible to send the message out).
Is it possible to modify the logic of syslog-ng to timestamp the messages as soon as possible - even before the buffer, at the collection time? We have found it important for log integrity reasons.
It works like this.
3) Directly in the 2.1beta1 there has been introduced some error with the "suspend writing a destination file when an I/O error ..." patch. During our tests we have found it continously thinks the destination is unavailable and writes strange messages into to the internal log:
May 29 15:00:02 europa syslog-ng[16798]: I/O error occurred while writing; fd='7', error='Connection timed out (110)' May 29 15:00:02 europa syslog-ng[16798]: Connection broken; time_reopen='60' May 29 15:00:02 europa syslog-ng[16798]: Suspending write operation because of an I/O error; fd='7', time_reopen='60'
Right, I've found this problem in the meanwhile. This code was added to inject artifical errors to check that syslog-ng correctly retries I/O operations. This code was integrated to 2.1beta1. I'm pushing out a git update today, and probably will release 2.1beta2 soon. -- Bazsi ______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
participants (3)
-
Balazs Scheidler
-
Behal, Pavel
-
Evan Rempel