[syslog-ng] DNS caching

Ted_Rule@flextech.co.uk Ted_Rule@flextech.co.uk
Thu, 21 Mar 2002 09:39:12 +0000


A couple of things...

> You also mention "syslog-ng blocks on DNS queries, so enabling DNS may lead
> to a Denial of Service attack." in your documentation. Does this mean that
> syslog messages which are received by the NIC, while syslog-ng performs a
> synchronous DNS lookup, are stored in the kernels receive buffer or are
> dropped?

My belief is that your presumption is indeed correct, and that the kernel will
store such inbound UDP messages in the local socket buffer ( usually of the
order of tens of kbytes ) whilst the daemon is performing gethostbyaddr() or
equivalent.
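
For anyone wanting to check how much headroom that buffer gives on their own
box, here is a minimal sketch ( Python purely for brevity; the function name is
my own ). It reads the kernel's default receive buffer size for a fresh UDP
socket. Once that buffer fills whilst the daemon is stuck in a lookup, further
datagrams are dropped.

```python
import socket


def udp_receive_buffer_bytes():
    """Return the kernel's default receive buffer size for a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    print("default UDP receive buffer: %d bytes" % udp_receive_buffer_bytes())
```

The figure reported varies by kernel and tuning, so treat it as a rough guide
to how long a blocked lookup can be tolerated at a given message rate.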

Reworking any such daemon to be multi-threaded enough to use the re-entrant
gethostbyaddr_r() instead, so that only the source socket on which the
original syslog message arrived is blocked pending the DNS response, is
something I would imagine syslog-ng has not yet been coded to do.

However, I have come across an even worse problem with syslog ( non-NG
syslogd, that is ) and DNS, namely a terrifying full-blown hard deadlock.

This was on Linux kernel 2.2.19 + latest RedHat syslogd ( with Unix DGRAM
sockets preferred ) + a local bind 8.2.3 server, whilst acting as a remote
syslog receiver.

The problem only appeared to arise under extreme conditions, once in a blue
moon.

After months of tortuous attempts at debugging the problem, the issue appears
to be that syslogd ( as I said, this is not syslog-ng, but it might yet apply,
which is why I raise the issue here ) deadlocks when it attempts to perform
gethostbyaddr() ( and hence res_query() ) against localhost:53 whilst, at
exactly the same instant, the DNS server sitting on localhost:53 performs a
syslog operation in respect of something, anything, perhaps completely
unrelated ( such as logging a Zone Transfer complete ). At this point
syslogd/gethostbyaddr() is blocked waiting on the DNS server, and the DNS
server is blocked waiting on syslogd.
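
To make the circular wait concrete, here is a toy model of it. This is only a
sketch of the situation as I understand it, not an analysis of the real code
paths: two threads stand in for syslogd and named, a bounded queue stands in
for the log socket buffer, and all the names are invented. The timeouts exist
purely so the demonstration terminates and can report who was stuck on whom.

```python
import queue
import threading


def simulate(timeout=0.2):
    """Model syslogd and named each waiting on the other; report who got stuck."""
    # Bounded queue standing in for syslogd's local log socket buffer.
    log_socket = queue.Queue(maxsize=1)
    log_socket.put("earlier message")  # already full: syslogd is not reading
    dns_queries = queue.Queue()        # queries sent to localhost:53
    dns_replies = queue.Queue()
    outcome = {}

    def syslogd():
        # A remote message arrived; look up the sender before reading more logs.
        dns_queries.put("PTR lookup")
        try:
            dns_replies.get(timeout=timeout)
            outcome["syslogd"] = "resolved"
        except queue.Empty:
            outcome["syslogd"] = "waiting on named"

    def named():
        # Wants to log something before it goes back to serving queries.
        try:
            log_socket.put("zone transfer complete", timeout=timeout)
        except queue.Full:
            outcome["named"] = "waiting on syslogd"
            return
        dns_queries.get()
        dns_replies.put("answer")
        outcome["named"] = "answered"

    threads = [threading.Thread(target=syslogd), threading.Thread(target=named)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcome


if __name__ == "__main__":
    print(simulate())
```

Each side ends up reporting that it was waiting on the other. Without the
artificial timeouts, neither would ever return.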

Note that this flies in the face of what one might expect to happen, namely
that gethostbyaddr() times out waiting for localhost:53 to reply ( because the
DNS server is temporarily blocked waiting for syslogd itself ), and works
round the deadlock by performing the DNS query against the secondary DNS
server in /etc/resolv.conf.

Once this deadlock has occurred, any given process which potentially uses syslog
deadlocks on the next occasion it attempts to use syslog().

Over a matter of minutes or less, the whole system deadlocks on syslogd, BUT
snmpd ( which very rarely syslogs ) and ping ( which is unlikely to syslog
unless something in ipchains or similar performs a printk() ) continue to
function OK.

Thus simplistic monitoring of the host via SNMP / ping indicates nothing
untoward, but end-user experience is deadly.
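
A monitoring check that actually exercises syslog would stand a better chance
of catching this. A minimal sketch, assuming the hang shows up as a syslog()
call that never returns: run a tiny logging child process under a timeout and
alarm if it fails to finish. The function name, the probe command and the
five-second limit are all my own choices for illustration.

```python
import subprocess
import sys


def responds_within(command, timeout_seconds):
    """Run command; return True only if it exits with status 0 before the timeout."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False  # the child was killed after hanging
    return result.returncode == 0


# Hypothetical probe: a child process that logs a single line via syslog().
SYSLOG_PROBE = [sys.executable, "-c", "import syslog; syslog.syslog('monitor probe')"]

if __name__ == "__main__":
    print("syslog OK" if responds_within(SYSLOG_PROBE, 5) else "syslog appears hung")
```

Running the probe in a child process matters: if the call hangs, the
monitoring script itself stays alive to report the failure.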

The best solution I have so far is to ensure syslogd never uses a local DNS
server, but instead always uses a remote DNS server to perform the lookups.
Under these conditions, any such deadlock appears to be recoverable, as
gethostbyaddr() never blocks waiting on the local DNS server.

Whilst using nscd improves the situation ( makes it less likely to occur ),
and including a small DNS cache within the syslogd process itself makes it
less likely still, neither workaround completely fixes the potential deadlock.
And of course even trying to debug such a deadlock is fraught with
difficulties.
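
By "small DNS cache" I mean something along the following lines. This is a
sketch in Python rather than the C one would need inside syslogd, and the
class name, the 300-second lifetime and the fall-back-to-bare-address
behaviour are all assumptions of mine. Note that a cache miss still goes
through the blocking lookup, which is exactly why it only narrows the window.

```python
import socket
import time


class ReverseDnsCache:
    """Tiny TTL cache for reverse lookups; resolver and clock are injectable."""

    def __init__(self, ttl_seconds=300.0, resolver=None, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolver = resolver or self._system_lookup
        self._clock = clock
        self._entries = {}  # ip -> (hostname, expiry time)

    @staticmethod
    def _system_lookup(ip):
        try:
            return socket.gethostbyaddr(ip)[0]
        except OSError:
            return ip  # fall back to the bare address if the lookup fails

    def lookup(self, ip):
        now = self._clock()
        cached = self._entries.get(ip)
        if cached is not None and cached[1] > now:
            return cached[0]  # hit: no resolver call, so no chance to block
        name = self._resolver(ip)  # miss: this call may still block
        self._entries[ip] = (name, now + self._ttl)
        return name
```

A receive loop would call cache.lookup(source_ip) in place of a direct
gethostbyaddr(), so that a busy sender costs one lookup per lifetime rather
than one per message.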

I feel the really proper solution where one strongly desires to run a local
DNS server, as intimated above, is to multi-thread the syslog daemon, use a
fully threaded resolver library, and only block the socket the message arrived
on whilst performing the gethostbyaddr_r() - and I haven't had enough strong
coffee all year to face attempting that nightmare rewrite!
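
For what it is worth, the shape of that design can be sketched in a few lines
of Python, which is a long way from doing it in C inside a real daemon. The
class and parameter names below are invented, and it uses a simple worker
pool rather than true per-socket blocking, so messages from one source may be
written out of order. The receive loop hands each message to the pool and
returns at once; only the worker doing the lookup waits on DNS.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def system_lookup(ip):
    """Blocking reverse lookup that falls back to the bare address."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


class AsyncResolvingLogger:
    """Resolve sender names on worker threads so the receive loop never blocks."""

    def __init__(self, sink, resolver=system_lookup, workers=4):
        self._sink = sink  # callable that takes one formatted line
        self._resolver = resolver
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._sink_lock = threading.Lock()  # serialise writes from the workers

    def submit(self, source_ip, message):
        """Called from the receive loop; returns at once without resolving."""
        return self._pool.submit(self._resolve_and_write, source_ip, message)

    def _resolve_and_write(self, source_ip, message):
        host = self._resolver(source_ip)  # only this worker waits on DNS
        with self._sink_lock:
            self._sink("%s %s" % (host, message))

    def close(self):
        self._pool.shutdown(wait=True)
```

With this arrangement a slow lookup for one sender delays only that sender's
line, provided the pool has a free worker. If every worker is stuck on DNS
the backlog simply queues up, so it mitigates the problem rather than curing
it.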

As to why the deadlock really occurs - i.e. why gethostbyaddr() doesn't simply
time out, fall through to the backup DNS server, and thereby release the
deadlock - I have no idea. Maybe it somehow got fixed in Kernel 2.4? Again,
testing the theory is tricky.

If anyone has further ideas on releasing the deadlock, I'd be only too happy
to hear from you.

By the by, I've been looking into migrating to syslog-ng for other reasons
anyway, mainly due to its more fine-grained filtering structure, and despite
some minor reservations I'm happily impressed with what I've seen.

Many thanks to all involved in its development so far.



Ted Rule,
Flextech Television.






