<div dir="ltr"><div class="gmail_default" style="font-size:large">Hi Laci and Scheidler,</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">Thanks a lot for discussing it internally and giving me your input.<br></div><div class="gmail_default" style="font-size:large">I gave further thought to all the input provided and have come up with a patch, which I have attached to this email. I would appreciate any feedback you folks can give on the patch. Some points about the fix:</div><div class="gmail_default" style="font-size:large"><ol><li>I didn't have to take either of the thread/signal callback approaches; there was another way given in the man page example. The approach mentioned in the example can easily take advantage of the existing timer infrastructure. This would mean fewer architectural changes.</li><li>Our application is restricted to Linux, hence I didn't have to think about cross-platform scenarios.</li><li>After browsing through the code, I felt the need to retain the synchronous way for places where there is no timer mechanism to retry, which means the async approach would need more code changes. Hence I have an if condition that checks which method of resolution the caller expects.</li><li>The ivykis DNS resolution might be a much cleaner approach. As you rightly mentioned, we are not sure of the timeline and effort needed to take in that change.</li><li>A question about a DNS service that has bothered me (I had also evaluated using a local DNS cache like bind) is what happens when the TTL expires. The service would go and look up the name from the DNS server on the next access, right? What if the DNS link goes down during such a lookup after TTL expiry? 
Won't we end up with the same problem that we have now?</li></ol><div>Thanks,<br></div><div>Naveen<br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 24, 2018 at 2:30 AM Scheidler, Balázs &lt;<a href="mailto:balazs.scheidler@oneidentity.com">balazs.scheidler@oneidentity.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><div>Another issue is what we would do with the incoming message flow while we are waiting for the result of the resolution.<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 24, 2018 at 8:56 AM, Szemere, László <span dir="ltr">&lt;<a href="mailto:laszlo.szemere@balabit.com" target="_blank">laszlo.szemere@balabit.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello Naveen,<div> thank you very much for investigating the issue and proposing a fix. I successfully reproduced the blocking behavior with a faulty DNS server in resolv.conf.<br class="m_98519056458189516m_2646111136253732836m_-9180944878290202692gmail-Apple-interchange-newline"> Before answering you, we discussed the topic internally, and I would like to add a few notes to your email. (And maybe start a common discussion about the topic.)</div><div><br></div><div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> 1)</div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> The connection handling in afsocket is running in the main thread (this is where the blocking behavior comes from), and mainly this is the reason why it has many callback functions. 
Unfortunately, this makes the code more complex and harder to maintain/debug.</div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> You clearly recognized a good point in the code (afsocket_dd_try_connect) to introduce some asynchronous solution for DNS resolving. I think there is no need to handle the "first attempt" any differently from the others.</div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"><div style="text-decoration-style:initial;text-decoration-color:initial"> 2) Signal vs thread</div><div style="text-decoration-style:initial;text-decoration-color:initial"> You have absolute control over this; there are examples in our code for both of them.</div><div style="text-decoration-style:initial;text-decoration-color:initial"> IMHO, with signals there might be a chance of conflicts with other components. (We recently had an issue with Java, but nothing that cannot be sorted out.) This is clearly just a personal opinion.</div><br></div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> Note: see examples of using "main_loop_call". 
If you only use those callbacks to add some tasks to the main loop, then you don't have to deal with parallelism, and it will make the rest of the code independent from the chosen callback method.</div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> 3)</div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> A couple of months ago there was a discussion about adding async DNS features to the already used ivykis library: <a href="https://sourceforge.net/p/libivykis/mailman/message/36311243/" target="_blank">https://sourceforge.net/p/libivykis/mailman/message/36311243/</a></div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> Unfortunately, there has been no activity on the topic since then, so I do not recommend that you wait for the final implementation. You can easily start your own, or contribute to ivykis.</div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> However, I think the idea is good: introduce DNS resolving as an internal module or service. 
At the very least, one should keep in mind during the refactoring of afsocket to make the DNS "service" interchangeable.<span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span> </span>(If we could gather TTL information alongside the resolution, it could be completely independent from the main thread.)</span></div></div><br class="m_98519056458189516m_2646111136253732836m_-9180944878290202692gmail-Apple-interchange-newline"><div> 4)<br></div></div><div><div style="font-size:small;text-decoration-style:initial;text-decoration-color:initial"> getaddrinfo_a is a GNU extension, so it might not be available on all platforms supported by syslog-ng. (There is already some branching in <a href="https://github.com/balabit/syslog-ng/blob/e0ecad3dfafe5f34f7a5d2893b6a518e85ce3753/lib/host-resolve.c#L205" target="_blank">https://github.com/balabit/syslog-ng/blob/e0ecad3dfafe5f34f7a5d2893b6a518e85ce3753/lib/host-resolve.c#L205</a> , so this is just a reminder not to forget it.)</div></div><div><br></div><div> <br></div><div>Best regards,</div><div>Laci</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_98519056458189516h5">On Sat, Jul 21, 2018 at 3:39 AM, Naveen Revanna <span dir="ltr">&lt;<a href="mailto:raveenr@gmail.com" target="_blank">raveenr@gmail.com</a>&gt;</span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="m_98519056458189516h5"><div dir="ltr"><div>Hi Developers,</div><div><br></div><div>When the DNS server is unreachable, the getaddrinfo() function will block (until it times out after a few tens of seconds). If the
syslog-ng application is configured with a remote syslog server using its hostname,
afsocket_dd_try_connect() will try to resolve this address in a loop (using a timer). Since getaddrinfo() is blocking, execution of this task will take a few seconds, thereby delaying the execution of other tasks. Eventually this reaches a state in which there is a large backlog of tasks and syslog() appears to hang, delaying the execution of all shell commands (by up to a minute). <br></div><div><br></div><div>Resolving the remote syslog server's address by having an entry in /etc/hosts could be a possible workaround (as indicated in a previous thread). However, this is not a desirable solution for our use case.</div><div><br></div><div>Here are my thoughts on a possible fix that I am planning to work on, for which I am looking for feedback. <br></div><div><ol><li>I am thinking of using getaddrinfo_a() (<a href="http://man7.org/linux/man-pages/man3/getaddrinfo_a.3.html" target="_blank">http://man7.org/linux/man-pages/man3/getaddrinfo_a.3.html</a>) in the afsocket_dd_try_connect() loop function. We can use this async call only once it starts looping, not for the first attempt. The first attempt can continue to use the current synchronous way. <br></li><li>If I were to take the above approach, there are two ways in which we can know the status of address resolution. Which one do you folks think would be better?</li><ol><li>SIGEV_SIGNAL: We can receive a signal when a look-up completes. We can take further action in the handler function.<br></li><li>SIGEV_THREAD: A notification function will be called. This results in the creation of a new thread (pthread?). What I am not sure of is the impact of this thread creation on the existing thread infrastructure in syslog-ng through ivykis. 
<br></li></ol></ol><div><br></div></div><div>Repro:</div><div>It is fairly easy to reproduce this issue:</div><div><ol><li>Configure a remote syslog server using its hostname.<br></li><li>Make the DNS server unreachable (edit resolv.conf and put wrong IP(s) in the 'nameserver' entries).<br></li><li>$ syslog-ng-ctl reload</li><li>Observe that any command executed on the shell takes an unusually long time.</li></ol><div>I can elaborate on any of the items here if something is not clear. I would appreciate any pointers.</div><div><br></div><div>Thanks,<br></div><div>Naveen<br></div></div></div>
<br></div></div>______________________________________________________________________________<br>
Member info: <a href="https://lists.balabit.hu/mailman/listinfo/syslog-ng" rel="noreferrer" target="_blank">https://lists.balabit.hu/mailman/listinfo/syslog-ng</a><br>
Documentation: <a href="http://www.balabit.com/support/documentation/?product=syslog-ng" rel="noreferrer" target="_blank">http://www.balabit.com/support/documentation/?product=syslog-ng</a><br>
FAQ: <a href="http://www.balabit.com/wiki/syslog-ng-faq" rel="noreferrer" target="_blank">http://www.balabit.com/wiki/syslog-ng-faq</a><br>
<br>
<br></blockquote></div><br></div></div>
<br></blockquote></div><br></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">--Naveen R</div>