Re: [syslog-ng] incorrect firewall logs with syslogng-2.0.0 [security error?]

1 Dec 2006

      On Thu, 2006-11-30 at 19:23 +0100, Jacek Kalinski wrote:
...
Dnia 2006-11-30 10:53, Użytkownik Balazs Scheidler napisał:
...
...
After upgradeing from syslogng-1.6.11 to 2.0.0 I've got a strange
messages in /var/log/messages file:
Nov 29 08:14:05 denise WINDOW=16985 RES=0x00 ACK URGP=0
Nov 29 11:10:19 denise PT=34536 WINDOW=16985 RES=0x00 ACK URGP=0
Nov 29 16:42:16 denise W=0 RES=0x00 RST URGP=0
Nov 29 21:37:15 denise 116 ID=34901 DF PROTO=TCP SPT=3584 DPT=80
WINDOW=65535 RES=0x00 ACK FIN URGP=0
It is not necessarily syslog-ng that is at fault here, if the kernel
ring buffer is overflown (because of higher traffic for example), the
kernel might give partial lines while reading /proc/kmsg.
You can increase the kernel ring buffer size by increasing the config
option CONFIG_LOG_BUF_SHIFT
I don't think so.
Previously syslog-ng 1.6.11 was installed for 3 months (exactly since 1
August). Kernel also wasn't changed at least within last month or two.
There wasn't before this kind of problems (even in a lot of higher
network traffic). And I'm sure that there was no "truncated" entries,
because every strange line in system logs is reported to me immediately.
After upgrade (27th November) truncated lines are reported in logs every
3-10 hours. So it is very often.
Everything I can do in this situation is to downgrade to syslog-ng 1.6
(yes I have previous binary in backup) and check if this errors will be
repeated.
PS: I'm not sure it is a syslog-ng error, but circumstances are very,
very strange and connected with upgrade.
Syslog-ng is software, and as such it can certainly have bugs of its
own. I just said that symptoms are very similar to kernel ring buffer
overflows. This can happen if kernel log messages are generated faster
than syslog-ng is able to process them

The possible reasons for syslog-ng not being able to process messages:

1) the rate of kernel messages got higher
2) syslog-ng got slower in your specific scenario
3) there's some kind of external cause that makes syslog-ng to block
(DNS for instance), which increases latency

A possible reason for #3 might be that 2.0.0 lacked DNS cache, which I
have added to my TLA tree.

What could help to track down the issue is to have syslog-ng strace-ed
with timestamps (strace -ttT -s 4096 -p <pid>), wait for a message to be
garbled and analyze the results.

I'd look at:
- identify the garbled message in the strace output (it should appear in
a read() call)
- check the file descriptor of the /proc/kmsg file
- identify the read() call that is before the previously identified read
call and refers to the same file descriptor
- check the time stamps, how much time is spent between the two read()
calls, and what happened between the two
- clearly syslog-ng is working on something instead of reading the
kernel message source.

If you already know where time is being spent (possible causes: DNS
lookups, time_sleep, etc.), you can either work in decreasing these
latencies, or if nothing jumps out:

1) then either syslog-ng 2.0.0 got slower in your specific setup
(although I had reports that syslog-ng 2.0.0 is significantly faster
than 1.6.x) To improve this situation I'd need a profile generated by
oprofile

2) increase the kernel ring buffer size

-- 
Bazsi