Bazsi, you are awesome. I dropped in a new syslog-ng binary with that patch applied and can no longer repro the lockup! Szalay, I had some straces and ldd output for ya, but then I saw that Tim Rupp had pasted better ones into his "pipe causing lockup?" thread, and the patch for that worked. Thanks for the quick response, and sorry for my slow one; I was away for the weekend.<div>
<br></div><div>Regards,</div><div><br></div><div>-Lance<br><br><div class="gmail_quote">On Mon, Dec 7, 2009 at 4:58 AM, Balazs Scheidler <span dir="ltr"><<a href="mailto:bazsi@balabit.hu">bazsi@balabit.hu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On Thu, 2009-12-03 at 20:05 -0800, Lance Laursen wrote:<br>
</div><div><div></div><div class="h5">> Hello,<br>
><br>
><br>
> I'm having problems with machines where every process that writes to<br>
> /dev/log eventually hangs when using unix-dgram("/dev/log") with<br>
> syslog-ng 3.0.4. The servers run fine for a while and hum along as<br>
> expected. Unfortunately this does not last: after an unpredictable<br>
> amount of time, various programs hang completely. If I already have a<br>
> root shell open when this happens, I can kill syslog-ng, which frees<br>
> up all the hung processes.<br>
><br>
><br>
> Repro'ing this is... well, annoying. I have 300+ servers running this<br>
> build of syslog-ng without problems, all using unix-stream(). The 4<br>
> servers that lock up are the only ones running unix-dgram(). They are<br>
> completely fresh Ubuntu 8.04 installs with syslog-ng 3.0.4, identical<br>
> to all the other boxes apart from that one syslog-ng option. I also<br>
> have strace output showing programs hanging after they try to write<br>
> to /dev/log.<br>
><br>
><br>
> I'm currently reproducing it by running "while true ; do logger -p<br>
> local0.info ...longest_message_possible... ; sleep 1s ; done". It's<br>
> not an exact science, but I've managed to pile things up after just<br>
> over 120 messages, or about two minutes. I can still hop around as<br>
> root, but every program that tries to write to /dev/log piles up.<br>
> After some rudimentary tests, the pile-up seems to depend on log<br>
> size/throughput rather than time, though something random could be<br>
> triggering it while my crappy tests are running. For my next test I<br>
> plan to send small log messages in very rapid succession.<br>
><br>
><br>
> I'm running:<br>
> # uname -a<br>
> Linux tny0032 2.6.24-24-generic #1 SMP Tue Jul 7 19:10:36 UTC 2009<br>
> x86_64 GNU/Linux<br>
> # cat /etc/debian_version<br>
> lenny/sid<br>
> (ubuntu 8.04)<br>
><br>
><br>
> Here's my source definition:<br>
> # all known message sources<br>
> source s_all {<br>
> internal();<br>
> unix-dgram("/dev/log");<br>
> file("/proc/kmsg" program_override("kernel: "));<br>
> };<br>
><br>
><br>
><br>
><br>
> Here's some strace output from a process that hangs after trying to<br>
> write to /dev/log:<br>
> # strace su - lance<br>
> ...<br>
> ...<br>
> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0<br>
> socket(PF_FILE, SOCK_DGRAM, 0) = 3<br>
> fcntl(3, F_SETFD, FD_CLOEXEC) = 0<br>
> connect(3, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0<br>
> sendto(3, "<85>Dec 4 02:12:43 su[7086]: pa"..., 148, MSG_NOSIGNAL,<br>
> NULL, 0<br>
><br>
><br>
> # strace logger -p local0.info lalala<br>
> hangs at the same point as above.<br>
><br>
><br>
><br>
><br>
> I thought dgram was supposed to be connectionless? I'm not sure how<br>
> syslog-ng could be locking up resources. Has anyone seen this before?<br>
> I will keep looking for a better repro case, but if anyone has any<br>
> ideas, shout.<br>
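<br>
The thread doesn't spell out the mechanism, but one plausible answer to the "connectionless" question is that a Unix datagram socket on Linux still has a bounded queue: if the reader stops calling recv(), blocking senders wait inside sendto(), which is exactly where the strace above stops. The sketch below assumes that reading; it uses Python with a socketpair() standing in for /dev/log, and the helper datagrams_until_full is hypothetical. The writer is made non-blocking so the example reports a full queue instead of hanging the way logger does.<br>
<pre>
```python
import errno
import socket

# errno values a non-blocking send() reports when the peer cannot take more:
# EAGAIN/EWOULDBLOCK on Linux; some BSD-derived systems use ENOBUFS instead.
QUEUE_FULL = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS}


def datagrams_until_full(payload=b"local0.info test message", limit=100000):
    """Queue datagrams to a peer that never reads until the kernel refuses more.

    Hypothetical helper. Returns (queued, resumed): how many datagrams fit
    before send() would have blocked, and whether sending works again once
    the reader drains its queue.
    """
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    writer.setblocking(False)  # report "full" instead of hanging like logger
    reader.setblocking(False)
    queued = 0
    try:
        while queued != limit:  # safety bound in case nothing ever fills up
            try:
                writer.send(payload)
            except OSError as exc:
                if exc.errno not in QUEUE_FULL:
                    raise
                break
            queued += 1
        # The reader wakes up and empties its queue, as syslog-ng would if it
        # resumed reading /dev/log.
        while True:
            try:
                reader.recv(65536)
            except BlockingIOError:
                break
        try:
            writer.send(payload)
            resumed = True
        except OSError as exc:
            if exc.errno not in QUEUE_FULL:
                raise
            resumed = False
        return queued, resumed
    finally:
        reader.close()
        writer.close()


print(datagrams_until_full())  # the count depends on kernel buffer settings
```
</pre>
With a blocking socket (what logger and su use), the send that hits the full queue would simply not return until the reader drains it, which matches the pile-up described above and is consistent with killing syslog-ng freeing the hung processes.<br>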
><br>
><br>
> I am using unix-dgram solely because it does not start a new log<br>
> entry at each newline. I was encountering a problem where, with<br>
> unix-stream, lighttpd's multi-line log output was getting broken up<br>
> into multiple syslog lines. That would have been fine, except that when<br>
> each new line is split off into a separate log entry, the $hostname<br>
> and $program fields are missing, leaving me with just $date $msg.<br>
> This basically made it impossible to filter and relay the logs<br>
> effectively. I can elaborate further if requested, but making<br>
> unix-stream behave the same as unix-dgram with regard to multi-line<br>
> log messages would solve all my problems.<br>
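<br>
Why unix-stream() splits lighttpd's output while unix-dgram() does not comes down to framing, at least as the thread describes it: a stream socket delivers bytes with no message boundaries, so the reader has to pick a delimiter (a newline, per the behaviour described above), whereas each datagram arrives as one unit. A minimal Python sketch of that difference, assuming a newline-delimited stream reader; the helper read_records and the two-line payload are made up for illustration:<br>
<pre>
```python
import socket


def read_records(kind, payload):
    """Send one payload over an AF_UNIX socketpair of the given type and
    return the records a reader would extract from it.

    Hypothetical helper: the stream branch assumes a reader that treats
    every newline as the end of a log entry.
    """
    reader, writer = socket.socketpair(socket.AF_UNIX, kind)
    try:
        writer.send(payload)
        if kind == socket.SOCK_STREAM:
            writer.shutdown(socket.SHUT_WR)  # signal EOF to the reader
            data = b""
            while chunk := reader.recv(4096):
                data += chunk
            # No boundaries survive on a stream, so split on newlines.
            return [line for line in data.split(b"\n") if line]
        # A datagram keeps its boundary: one recv() returns the whole message.
        return [reader.recv(4096)]
    finally:
        reader.close()
        writer.close()


lines = b"first line\nsecond line\n"
print(read_records(socket.SOCK_STREAM, lines))  # [b'first line', b'second line']
print(read_records(socket.SOCK_DGRAM, lines))   # [b'first line\nsecond line\n']
```
</pre>
This only covers the framing half; why the split-off lines also lose $hostname and $program comes from how syslog-ng parses each resulting entry, which the sketch does not model.<br>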
<br>
<br>
</div></div>This is probably caused by the same problem I answered in the<br>
"pipe causing lockup" thread yesterday.<br>
<br>
This patch should fix it (you can find it in the git repo):<br>
<div class="im"><br>
commit 495bdc3690fe1c01ed95b29f16e97829444973ee<br>
Author: Balazs Scheidler <<a href="mailto:bazsi@balabit.hu">bazsi@balabit.hu</a>><br>
Date: Mon Dec 7 13:36:30 2009 +0100<br>
<br>
The flow-control flag was sometimes enabled even if not requested by the user<br>
<br>
In case a final or fallback flag was enabled on a log statement, it could enable<br>
the flow-control on the same level.<br>
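<br>
To see why an unrequested flow-control flag can hang writers, here is a toy model (hypothetical, not syslog-ng code) of the documented behaviour: when a destination's output buffer is full, a log path with flow control stops reading its source, while one without flow control drops the overflow and keeps reading. This thread doesn't say which destination stalled, so the model simply assumes one that never drains; run_pipeline and fifo_size are illustrative names.<br>
<pre>
```python
from collections import deque


def run_pipeline(n_messages, fifo_size, flow_control):
    """Feed n_messages from a source into a destination FIFO that never drains.

    Hypothetical model: with flow control the source stops being read once
    the FIFO is full (the rest stay with the writers); without it, overflow
    messages are dropped and reading continues.
    Returns (buffered, dropped, unread).
    """
    fifo = deque()
    dropped = 0
    for i in range(n_messages):
        if len(fifo) == fifo_size:
            if flow_control:
                # Stop reading: everything from here on is left in the source.
                return len(fifo), dropped, n_messages - i
            dropped += 1
            continue
        fifo.append(i)
    return len(fifo), dropped, 0


print(run_pipeline(10, 3, flow_control=True))   # (3, 0, 7)
print(run_pipeline(10, 3, flow_control=False))  # (3, 7, 0)
```
</pre>
In the flow-control case the 7 unread messages stay on the source side; for a unix-dgram("/dev/log") source that would be the kernel socket queue, which is how a stalled destination could surface as hung logger and su processes.<br>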
<br>
<br>
<br>
</div><font color="#888888">--<br>
Bazsi<br>
</font><div><div></div><div class="h5"><br>
<br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Lance Laursen<br>Demonware Systems Engineer<br>1-604-689-4594 x3702<br>
</div>