Bazsi, you are awesome. Dropped in a new syslog-ng binary with that patch applied and I can no longer repro the lockup! Szalay, I had some straces and ldd's for ya but then I saw Tim Rupp had pasted in better ones in his "pipe causing lockup?" thread, and the patch for that worked. Thanks for the quick response, sorry for my slow one, I had gone away for the weekend. Regards, -Lance On Mon, Dec 7, 2009 at 4:58 AM, Balazs Scheidler <bazsi@balabit.hu> wrote:
On Thu, 2009-12-03 at 20:05 -0800, Lance Laursen wrote:
Hello,
I'm having problems with machines eventually hanging on all processes that write to /dev/log when using unix-dgram("/dev/log") with Syslog-NG 3.0.4. The servers run fine for a while and hum along as expected. Unfortunately the success does not last, with various programs completely hanging after an undetermined time. Having an existing root shell as this happens allows me to kill syslog-ng, freeing up all locks.
Repro'ing this is...well, annoying. I have 300+ servers running this build of syslog-ng fine, all using unix-stream(). The 4 servers that are locking up are the only ones I have running unix-dgram(). Completely fresh ubuntu 8.04 installs with syslog-ng 3.0.4, identical to all other boxes aside from the one syslog-ng option. I've got strace output that is hanging after programs try to write to /dev/log as well.
I'm currently doing a repro by running "while true ; do logger -p local0.info ...longest_message_possible... ; sleep 1s ; done" in non-exact science and have managed to pile things up after just over 120 messages, or two minutes. I can still hop around as root, but all programs that try to write to /dev/log pile up. The pile up seems to be log-size/throughput based, not time-based after some rudimentary tests - though it could be something random that is triggering it while my crappy tests are running. My next test plans to have small log messages in very rapid succession.
I'm running: # uname -a Linux tny0032 2.6.24-24-generic #1 SMP Tue Jul 7 19:10:36 UTC 2009 x86_64 GNU/Linux # cat /etc/debian_version lenny/sid (ubuntu 8.04)
Here's my source definition: # all known message sources source s_all { internal(); unix-dgram("/dev/log"); file("/proc/kmsg" program_override("kernel: ")); };
Here's some strace output that locks after trying to write to /dev/log: # strace su - lance ... ... stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 socket(PF_FILE, SOCK_DGRAM, 0) = 3 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 connect(3, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0 sendto(3, "<85>Dec 4 02:12:43 su[7086]: pa"..., 148, MSG_NOSIGNAL, NULL, 0
# strace logger -p local0.info lalala produces the same lock-point as above.
I thought dgram should be connectionless? I'm not sure how syslog-ng could be locking up resources. Has anyone seen this before? I will continue looking for a better repro case, if anyone has any ideas though shout.
I am using unix-dgram solely because it does not break to a new log entry on NewLines. I was encountering a problem where, using unix-stream, lighttpd's multi-line log output was getting broken up into multiple syslog lines. This would have been fine, except when the new line is broken out and made into new log entries, the $hostname and $program fields get stripped out, leaving me with just $date $msg. This basically negated the ability to filter and relay logs effectively. I can elaborate further here if requested, but making unix-stream behave the same as unix-dgram with regards to multi-line log messages would solve all my problems.
This should probably be caused by the same problem I answered in the "pipe causing lockup" thread yesterday.
This patch should fix it (you can find it in the git repo):
commit 495bdc3690fe1c01ed95b29f16e97829444973ee Author: Balazs Scheidler <bazsi@balabit.hu> Date: Mon Dec 7 13:36:30 2009 +0100
The flow-control flag was sometimes enabled even if not requested by the user
In case a final or fallback flag was enabled on a log statement, it could enable the flow-control on the same level.
-- Bazsi
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________
-- Lance Laursen Demonware Systems Engineer 1-604-689-4594 x3702