[syslog-ng] Lock up problems when using unix-dgram("/dev/log") type on

Lance Laursen lance at demonware.net
Wed Dec 9 01:27:46 CET 2009


Bazsi, you are awesome. Dropped in a new syslog-ng binary with that patch
applied and I can no longer repro the lockup! Szalay, I had some straces and
ldd's for ya but then I saw Tim Rupp had pasted in better ones in his "pipe
causing lockup?" thread, and the patch for that worked. Thanks for the quick
response, sorry for my slow one, I had gone away for the weekend.

Regards,

-Lance

On Mon, Dec 7, 2009 at 4:58 AM, Balazs Scheidler <bazsi at balabit.hu> wrote:

> On Thu, 2009-12-03 at 20:05 -0800, Lance Laursen wrote:
> > Hello,
> >
> >
> > I'm having problems with machines eventually hanging on all processes
> > that write to /dev/log when using unix-dgram("/dev/log") with
> > Syslog-NG 3.0.4. The servers run fine for a while and hum along as
> > expected. Unfortunately the success does not last, with various
> > programs completely hanging after an undetermined time. Having an
> > existing root shell as this happens allows me to kill syslog-ng,
> > freeing up all locks.
> >
> >
> > Repro'ing this is...well, annoying. I have 300+ servers running this
> > build of syslog-ng fine, all using unix-stream(). The 4 servers that
> > are locking up are the only ones I have running unix-dgram().
> > Completely fresh ubuntu 8.04 installs with syslog-ng 3.0.4, identical
> > to all other boxes aside from the one syslog-ng option. I've got
> > strace output that is hanging after programs try to write to /dev/log
> > as well.
> >
> >
> > I'm currently doing a repro by running "while true ; do logger -p
> > local0.info ...longest_message_possible... ; sleep 1s ; done" in
> > non-exact science and have managed to pile things up after just over
> > 120 messages, or two minutes. I can still hop around as root, but all
> > programs that try to write to /dev/log pile up. The pile up seems to
> > be log-size/throughput based, not time-based after some rudimentary
> > tests - though it could be something random that is triggering it
> > while my crappy tests are running. My next test plans to have small
> > log messages in very rapid succession.
> >
> >
> > I'm running:
> > # uname -a
> > Linux tny0032 2.6.24-24-generic #1 SMP Tue Jul 7 19:10:36 UTC 2009
> > x86_64 GNU/Linux
> > # cat /etc/debian_version
> > lenny/sid
> > (ubuntu 8.04)
> >
> >
> > Here's my source definition:
> > # all known message sources
> > source s_all {
> >         internal();
> >         unix-dgram("/dev/log");
> >         file("/proc/kmsg" program_override("kernel: "));
> > };
> >
> >
> >
> >
> > Here's some strace output that locks after trying to write
> > to /dev/log:
> > # strace su - lance
> > ...
> > ...
> > stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
> > socket(PF_FILE, SOCK_DGRAM, 0) = 3
> > fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> > connect(3, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0
> > sendto(3, "<85>Dec 4 02:12:43 su[7086]: pa"..., 148, MSG_NOSIGNAL,
> > NULL, 0
> >
> >
> > # strace logger -p local0.info lalala
> > produces the same lock-point as above.
> >
> >
> >
> >
> > I thought dgram should be connectionless? I'm not sure how syslog-ng
> > could be locking up resources. Has anyone seen this before? I will
> > continue looking for a better repro case, if anyone has any ideas
> > though shout.
> >
> >
> > I am using unix-dgram solely because it does not break to a new log
> > entry on NewLines. I was encountering a problem where, using
> > unix-stream, lighttpd's multi-line log output was getting broken up
> > into multiple syslog lines. This would have been fine, except when the
> > new line is broken out and made into new log entries, the $hostname
> > and $program fields get stripped out, leaving me with just $date $msg.
> > This basically negated the ability to filter and relay logs
> > effectively. I can elaborate further here if requested, but making
> > unix-stream behave the same as unix-dgram with regards to multi-line
> > log messages would solve all my problems.
>
>
> This should probably be caused by the same problem I answered in the
> "pipe causing lockup" thread yesterday.
>
> This patch should fix it (you can find it in the git repo):
>
> commit 495bdc3690fe1c01ed95b29f16e97829444973ee
> Author: Balazs Scheidler <bazsi at balabit.hu>
> Date:   Mon Dec 7 13:36:30 2009 +0100
>
>    The flow-control flag was sometimes enabled even if not requested by the
> user
>
>    In case a final or fallback flag was enabled on a log statement, it
> could enable
>    the flow-control on the same level.
>
>
>
> --
> Bazsi
>
>
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation:
> http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.campin.net/syslog-ng/faq.html
>
>
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>



-- 
Lance Laursen
Demonware Systems Engineer
1-604-689-4594 x3702
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.balabit.hu/pipermail/syslog-ng/attachments/20091208/64228807/attachment.htm 


More information about the syslog-ng mailing list