[syslog-ng] Asynchronous address resolution using getaddrinfo_a()

Szemere, László laszlo.szemere at balabit.com
Tue Jul 24 06:56:22 UTC 2018


Hello Naveen,
 thank you very much for investigating the issue and proposing a fix. I
successfully reproduced the blocking behavior with a faulty DNS server in
resolv.conf.
 Before answering, we discussed the topic internally, and I would like to
add a few notes to your email (and maybe start a shared discussion about
the topic).

 1)
 The connection handling in afsocket runs in the main thread (this is where
the blocking behavior comes from), and this is mainly why it has so many
callback functions. Unfortunately, this makes the code more complex and
harder to maintain and debug.
 You correctly identified a good place in the code (afsocket_dd_try_connect)
to introduce an asynchronous solution for DNS resolution. I think there is
no need to handle the "first attempt" any differently from the others.
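
 A minimal sketch of treating every attempt the same way, assuming glibc:
the reconnect timer submits the lookup with getaddrinfo_a(GAI_NOWAIT, ...)
and, on later ticks, checks it with gai_error() instead of blocking. Passing
NULL as the notification argument means neither a signal nor a thread is
involved. AsyncLookup, async_lookup_start() and async_lookup_poll() are
hypothetical names, not syslog-ng code, and the wiring into
afsocket_dd_try_connect() is left out. With glibc older than 2.34 this needs
-lanl at link time.

```c
#define _GNU_SOURCE           /* getaddrinfo_a() and EAI_INPROGRESS are GNU extensions */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-destination lookup state (not a syslog-ng type).
 * It must stay alive, at a fixed address, until the lookup finishes. */
typedef struct {
  struct addrinfo hints;
  struct gaicb req;
  struct gaicb *list[1];
} AsyncLookup;

/* Submit the lookup and return immediately.
 * Returns 0 if the request was queued, or an EAI_* error code.
 * 'host' and 'port' must outlive the request. */
static int
async_lookup_start(AsyncLookup *l, const char *host, const char *port)
{
  memset(l, 0, sizeof(*l));
  l->hints.ai_family = AF_UNSPEC;
  l->hints.ai_socktype = SOCK_STREAM;
  l->req.ar_name = host;
  l->req.ar_service = port;
  l->req.ar_request = &l->hints;
  l->list[0] = &l->req;
  /* GAI_NOWAIT: return at once; NULL sevp: no signal or thread
   * notification, the caller polls instead. */
  return getaddrinfo_a(GAI_NOWAIT, l->list, 1, NULL);
}

/* Meant to be called on every tick of the reconnect timer; never blocks.
 * Returns EAI_INPROGRESS while the lookup is running, 0 once
 * l->req.ar_result holds the addresses (free them with freeaddrinfo()),
 * or another EAI_* error code on failure. */
static int
async_lookup_poll(AsyncLookup *l)
{
  return gai_error(&l->req);
}
```

 Polling from the existing timer keeps all the state on the main thread, so
neither of the two notification methods discussed below is strictly needed.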

 2) Signal vs thread
 This is entirely up to you; there are examples of both in our code.
 IMHO, with signals there is a risk of conflicting with other components.
(We recently had an issue with Java, but nothing that could not be sorted
out.) This is clearly just a personal opinion.

 note: See the examples of using "main_loop_call". If you only use those
callbacks to add tasks to the main loop, then you don't have to deal with
parallelism, and the rest of the code will be independent of the chosen
callback method.
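
 A sketch of the SIGEV_THREAD variant combined with the note above, assuming
glibc: the notification thread does nothing except hand the finished request
over to the main thread. A plain pipe stands in for main_loop_call() here,
whose real signature is not reproduced; wakeup_pipe, lookup_done_notify(),
start_lookup_with_thread_notify() and wait_for_finished_lookup() are all
hypothetical.

```c
#define _GNU_SOURCE
#include <netdb.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wake-up pipe; in syslog-ng, main_loop_call() would play
 * this role. The caller creates it with pipe(wakeup_pipe). */
static int wakeup_pipe[2] = { -1, -1 };

/* Runs in a thread created by glibc (SIGEV_THREAD). It only forwards the
 * finished request to the main thread and touches no shared state. */
static void
lookup_done_notify(union sigval sv)
{
  struct gaicb *req = sv.sival_ptr;
  /* A pointer-sized write to a pipe is atomic (far below PIPE_BUF). */
  ssize_t n = write(wakeup_pipe[1], &req, sizeof(req));
  (void) n;
}

/* Queue one lookup; 'req' and 'list' must stay valid until completion.
 * Returns 0 if queued, otherwise an EAI_* error code. */
static int
start_lookup_with_thread_notify(struct gaicb *req, struct gaicb **list)
{
  struct sigevent sev;
  memset(&sev, 0, sizeof(sev));
  sev.sigev_notify = SIGEV_THREAD;
  sev.sigev_notify_function = lookup_done_notify;
  sev.sigev_value.sival_ptr = req;
  list[0] = req;
  return getaddrinfo_a(GAI_NOWAIT, list, 1, &sev);
}

/* Main-thread side: in a real event loop this would be a read-readiness
 * callback on wakeup_pipe[0]; here it simply blocks until a result arrives. */
static struct gaicb *
wait_for_finished_lookup(void)
{
  struct gaicb *req = NULL;
  if (read(wakeup_pipe[0], &req, sizeof(req)) != (ssize_t) sizeof(req))
    return NULL;
  return req;
}
```

 Since the only thing crossing threads is a single pointer written to the
pipe, the code that inspects the result (gai_error(), ar_result) stays on
the main thread, which is the independence from the callback method
described in the note.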

 3)
 A couple of months ago there was a discussion about adding async DNS
features to the already used ivykis library:
https://sourceforge.net/p/libivykis/mailman/message/36311243/
 Unfortunately there has been no progress on the topic since then, so I do
not recommend waiting for the final implementation. You can easily start
your own, or contribute to ivykis.
 However, I think the idea is good: introduce DNS resolution as an internal
module or service. At the very least, keep in mind during the refactoring
of afsocket that the DNS "service" should be interchangeable. (If we could
gather TTL information alongside the resolution, it could be completely
independent from the main thread.)
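
 One way to keep the DNS "service" interchangeable is a small interface that
afsocket calls without knowing which backend sits behind it. The sketch
below is hypothetical throughout (Resolver, StaticResolver,
ResolveDoneFunc and store_outcome() are not syslog-ng types); the static
backend only exists to show how a getaddrinfo_a(), ivykis or thread-pool
backend could be swapped in behind the same function pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical completion callback, expected to run on the main thread.
 * 'addr' is textual for simplicity; error != 0 means the lookup failed. */
typedef void (*ResolveDoneFunc)(const char *addr, int error, void *user_data);

/* Hypothetical pluggable resolver: callers only see this interface. */
typedef struct Resolver {
  int (*resolve)(struct Resolver *self, const char *host,
                 ResolveDoneFunc done, void *user_data);
} Resolver;

/* Trivial backend with a single fixed entry, e.g. for unit tests. */
typedef struct {
  Resolver super;            /* must be first so the cast below is valid */
  const char *host;
  const char *addr;
} StaticResolver;

static int
static_resolver_resolve(Resolver *s, const char *host,
                        ResolveDoneFunc done, void *user_data)
{
  StaticResolver *self = (StaticResolver *) s;
  /* This backend answers immediately; an asynchronous backend would
   * return at once and invoke 'done' later from the main loop. */
  if (strcmp(host, self->host) == 0)
    done(self->addr, 0, user_data);
  else
    done(NULL, -1, user_data);
  return 0;
}

/* Example consumer: records the outcome in a caller-provided struct. */
typedef struct {
  const char *addr;
  int error;
} LookupOutcome;

static void
store_outcome(const char *addr, int error, void *user_data)
{
  LookupOutcome *out = user_data;
  out->addr = addr;
  out->error = error;
}
```

 getaddrinfo() itself does not report DNS TTLs, so a backend able to gather
them would need another resolver library; it could then pass the TTL to the
callback and refresh its cache on its own schedule.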

 4)
 getaddrinfo_a is a GNU extension, so it might not be available on all
platforms supported by syslog-ng. (There is already such a branching in
https://github.com/balabit/syslog-ng/blob/e0ecad3dfafe5f34f7a5d2893b6a518e85ce3753/lib/host-resolve.c#L205
so this is just a note, so that we don't forget it.)
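
 A minimal sketch of keeping that branching in one place, assuming a
build-system check that defines a hypothetical HAVE_GETADDRINFO_A macro.
ResolveRequest, resolve_request_start() and resolve_request_done() are
illustrative names, not the API of host-resolve.c. Without the extension
the code still compiles, but degrades to a blocking getaddrinfo().

```c
#define _GNU_SOURCE
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* HAVE_GETADDRINFO_A is a hypothetical macro that a configure/CMake check
 * would define when the C library provides getaddrinfo_a() (glibc does,
 * musl does not). */
typedef struct {
  const char *host;
  const char *service;
  struct addrinfo hints;
  struct addrinfo *result;   /* valid when status == 0; freeaddrinfo() it */
  int status;                /* 0 or an EAI_* code once finished */
#ifdef HAVE_GETADDRINFO_A
  struct gaicb cb;
  struct gaicb *list[1];
#endif
} ResolveRequest;

/* Returns 0 if the request was queued (async) or resolved (sync fallback),
 * otherwise an EAI_* code, also stored in r->status. Do not poll a request
 * whose start failed. */
static int
resolve_request_start(ResolveRequest *r)
{
#ifdef HAVE_GETADDRINFO_A
  memset(&r->cb, 0, sizeof(r->cb));
  r->cb.ar_name = r->host;
  r->cb.ar_service = r->service;
  r->cb.ar_request = &r->hints;
  r->list[0] = &r->cb;
  r->status = getaddrinfo_a(GAI_NOWAIT, r->list, 1, NULL);
  return r->status;
#else
  /* Fallback: this blocks the calling thread, exactly like today. */
  r->status = getaddrinfo(r->host, r->service, &r->hints, &r->result);
  return r->status;
#endif
}

/* Returns nonzero once r->status (and r->result on success) are final. */
static int
resolve_request_done(ResolveRequest *r)
{
#ifdef HAVE_GETADDRINFO_A
  int rc = gai_error(&r->cb);
  if (rc == EAI_INPROGRESS)
    return 0;
  r->status = rc;
  r->result = r->cb.ar_result;
  return 1;
#else
  (void) r;
  return 1;   /* the synchronous call has already finished */
#endif
}
```

 Callers use the same start/poll sequence on every platform; only the
blocking behavior differs.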


Best regards,
Laci


On Sat, Jul 21, 2018 at 3:39 AM, Naveen Revanna <raveenr at gmail.com> wrote:

> Hi Developers,
>
> When the DNS server is unreachable, getaddrinfo() blocks (until it times
> out after a few tens of seconds). If syslog-ng is configured with a remote
> syslog server given by its hostname, afsocket_dd_try_connect() will try to
> resolve this address in a loop (using a timer). Since getaddrinfo() is
> blocking, each execution of this task takes a few seconds, delaying the
> execution of other tasks. Eventually this leads to a large backlog of
> tasks, and syslog() appears to hang, delaying the execution of all shell
> commands (by up to a minute).
>
> Resolving the remote syslog server's address through an entry in
> /etc/hosts could be a possible workaround (as indicated in a previous
> thread). However, this is not a desirable solution for our use case.
>
> Here are my thoughts on a possible fix that I am planning to work on, and
> on which I would like feedback.
>
>    1. I am thinking of using getaddrinfo_a()
>    (http://man7.org/linux/man-pages/man3/getaddrinfo_a.3.html) in the
>    afsocket_dd_try_connect() loop function. We can use this async call
>    only once it starts looping, not for the first attempt. The first
>    attempt can continue to resolve synchronously, as it does now.
>    2. If I were to take the above approach, there are two ways in which
>    we can learn the status of the address resolution. Which one do you
>    folks think would be better:
>       1. SIGEV_SIGNAL: We receive a signal when a lookup completes and
>       can take further action in the handler function.
>       2. SIGEV_THREAD: A notification function is called. This results
>       in the creation of a new thread (pthread?). What I am not sure of
>       is the impact of this thread creation on the existing thread
>       infrastructure in syslog-ng through ivykis.
>
>
> Repro:
> It is fairly easy to reproduce this issue:
>
>    1. Configure a remote syslog server using its hostname.
>    2. Make DNS unreachable (edit resolv.conf and put wrong IP(s) in the
>    'nameserver' entries).
>    3. $ syslog-ng-ctl reload
>    4. Observe that any command executed in the shell takes an unusually
>    long time.
>
> I can elaborate on any of the items here if something is not clear.
> I'd appreciate any pointers.
>
> Thanks,
> Naveen
>