syslog-ng 3.3.7 DNS resolving Problem
Hello there,

I've got a little trouble with DNS resolving in syslog-ng. Last week I patched my syslog-ng installation with the threaded DNS bugfix (https://bugzilla.balabit.com/show_bug.cgi?id=212), and most of my problems seem to be gone, but one remains. Many times a day, messages are sorted into a folder named after the DNS name of my syslog-ng server instead of the real host the log is coming from. The log line itself still has the right host in its text. Most of the time the sorting works, and I have not yet found a way to reproduce the problem on demand. For debugging I've disabled all logging for the server itself, but it still happens.

My destinations are configured like this:

    destination d_syslog { file("/log/syslog/${R_YEAR}/${R_MONTH}/${R_DAY}/$FULLHOST_FROM/$PROGRAM" template(t_plain)); };

And my DNS options:

    use_fqdn(yes);
    dns_cache(yes);
    dns_cache_size(16384);
    dns_cache_expire(300);
    dns_cache_expire_failed(10);

I've tried disabling the syslog-ng cache, installing a local caching bind, and after that nscd, but with no success. With 750 servers sending 30k-40k logs per second, the DNS queries are too expensive and I need syslog-ng's internal caching. With local bind caching the rate drops to 2500 logs per second.

Does anybody have an idea how to fix this?

--
Daniel Neubacher, Network Administrator
daniel.neubacher@xing.com
XING AG, Gaensemarkt 43, 20354 Hamburg, Germany
Daniel Neubacher <daniel.neubacher@xing.com> writes:
Many times a day messages are sorted into a folder with the DNS name of my syslog-ng server instead of the real host where the log is coming from. The log line still has the right host in the text and most of the time it is working but I could not find any way to reproduce the problem on demand yet. For debugging I've disabled any logging for the server itself but it still happens.
This is not the first time I have heard about this problem, but so far I have not been able to reproduce it locally :( Is it always the server address that gets used instead of the originating host's name? -- |8]
Yes, but in my case it is the server's FQDN that is used. What I know is that syslog-ng is ignoring the cache when it happens: in the same second in which I find a wrongly sorted log, the server sorted another line from the same client into the right folder. One of my first guesses was failed DNS requests, but my caching time of 10 seconds for negative answers doesn't match the timing of the log messages. I guess I will debug some more if there are others who have this problem too. I thought I was alone with this :)

(In reply to Gergely Nagy's message of Wednesday, 2 January 2013, 14:01.)
To reproduce the problem I tried generating a massive amount of logs from one client to a server running my live configuration, but it didn't trigger the issue. I guess the problem lies not in the log volume but in the hosts, and that's hard to test.

After that I did some more live testing. My first test was whether this actually happens without DNS resolving, and it didn't. Then I disabled threading and it seemed to work. My problem is that I need threading, because syslog is now running at 100% :P It was a quick test, but after enabling threading again the problem appeared instantly. Now I've disabled it again and will test for at least a day. But it seems like threading has one more problem :(

(In reply to Daniel Neubacher's message of Wednesday, 2 January 2013, 14:26.)
Daniel Neubacher <daniel.neubacher@xing.com> writes:
After that I did some more live testing. My first test was whether this actually happens without DNS resolving, and it didn't. After that I disabled threading and it seemed to work. My problem is that I need threading, because syslog is now running at 100% :P
This narrows it down a little, thanks! -- |8]
I've had no wrongly sorted messages since disabling threading. Do you have any idea what else I could try? The syslog service is at 100% all the time, and tweaking options like flush_lines and flush_timeout only made my server slower.

(In reply to Gergely Nagy's message of Wednesday, 2 January 2013, 15:58.)
Daniel Neubacher <daniel.neubacher@xing.com> writes:
I've had no wrongly sorted messages since disabling threading. Do you have any idea what else I could try? The syslog service is at 100% all the time, and tweaking options like flush_lines and flush_timeout only made my server slower.
I have no further ideas yet; I've been busy with other things in the last couple of days. This issue is now at the top of my TODO list, though. Just by looking at the code I couldn't find the error, so I'm working on reproducing it locally. -- |8]
I've got none of these errors with syslog-ng 3.4 beta1 - does this make sense?

(In reply to Gergely Nagy's message of Monday, 7 January 2013, 17:06.)
Daniel Neubacher <daniel.neubacher@xing.com> writes:
I've got none of these errors with syslog-ng 3.4 beta1 - does this make sense?
Interesting... all DNS-related code should be the same between 3.4beta1 and the latest 3.3 git master (compared to 3.3.7, both have a fix that uses thread-safe lookups). Compared to 3.3.7, 3.4 beta1 has only one patch that is relevant:

    commit 11b20b28f7586b2bf10c281328f28d93f39e279c
    Author: Balazs Scheidler <bazsi@balabit.hu>
    Date:   Fri Dec 14 17:54:39 2012 +0100

        resolve_sockaddr: fixed unsafe use of non-reentrant APIs to resolve IP addresses to names

        As it seems the use of the DNS cache hid the fact that we're not
        thread safe when resolving IPs to DNS names. This patch attempts
        to use getnameinfo() API if available that is thread safe and
        protects all other paths with a mutex.

        Reported-By: Brian Kroth <bpkroth@gmail.com>
        Tested-By: Gergely Nagy <algernon@balabit.hu>
        Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

https://github.com/balabit/syslog-ng-3.3/commit/11b20b28f7586b2bf10c281328f2...

Come to think of it, the lack of this patch might very well be the cause of your issue. Can you check if the latest 3.3 git master works for you? There's an easily buildable tarball available at:

    http://packages.madhouse-project.org/syslog-ng/3.3/3.3.7/syslog-ng-3.3.7-201...

If this does fix the problem, then my apologies, I should've thought of it way sooner. :|

-- |8]
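For readers who want to see what a thread-safe reverse lookup along the lines of the commit message can look like, here is a minimal C sketch. It is not the code from the patch: the helper name reverse_lookup_ipv4, the IPv4-only handling, and the fallback to the numeric address are assumptions made for illustration. The point it shows is that getnameinfo() writes into a buffer supplied by the caller, so concurrent threads do not share static storage the way they can with older non-reentrant lookup calls. The mutex-protected fallback paths mentioned in the commit message are not covered here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (NOT syslog-ng's resolve_sockaddr()).
 * Resolves a dotted-quad IPv4 address to a host name with getnameinfo().
 * The result is written into the caller's buffer, so each thread can use
 * its own storage. If no name is found, the numeric address is stored
 * instead. Returns 0 on success or a non-zero EAI_* error code. */
int reverse_lookup_ipv4(const char *ip, char *out, size_t outlen)
{
    struct sockaddr_in sa;
    int rc;

    assert(ip != NULL && out != NULL && outlen > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EAI_FAIL;            /* not a valid IPv4 address string */

    /* NI_NAMEREQD: report an error instead of silently returning the
     * numeric form when the address has no name. */
    rc = getnameinfo((const struct sockaddr *)&sa, sizeof(sa),
                     out, (socklen_t)outlen, NULL, 0, NI_NAMEREQD);
    if (rc != 0) {
        /* Assumed fallback policy: use the numeric address as the name. */
        rc = getnameinfo((const struct sockaddr *)&sa, sizeof(sa),
                         out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
    }
    return rc;
}
```

A caller would pass its own buffer, for example `char name[256]; reverse_lookup_ipv4("127.0.0.1", name, sizeof(name));`, and use `name` only if the return value is 0.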
I've got no errors so far. Sometimes syslog has fooled me by waiting a few days before misbehaving again, but I hope that's not the case this time. Thanks for your help.

(In reply to Gergely Nagy's message of Tuesday, 8 January 2013, 09:59.)
Participants (2): Daniel Neubacher, Gergely Nagy