Hi,

yesterday I experienced a very strange problem with syslog-ng which I'd like to report. Maybe someone here has a clue what might have caused it. I am not able to reproduce the problem on a different machine, and I don't want to try it on the machine where I experienced it, since this is the main server of my company.

Well, what happened? I was playing around with syslog-ng. I had added a program() destination, but for some reason it did not work as I wanted (due to a name resolution problem, the log files always contained the IP instead of the hostname, so my script did not parse the input correctly, but that is not relevant). In order to see what was going on I added a "destination home { udp(1.2.3.4); };" and a "log { source(net); destination(home); };". At home, from where I was working, I had a simple VB program running on my Windows workstation which was listening on UDP port 514 and putting everything it received into a log window. However, no messages appeared. So I decided to write the messages to a log file, which should be more reliable than sending them over a dial-up connection, so I modified the log statement to "log { source(net); destination(home); destination(all); };" where all was "destination all { file("/var/log/allmessages"); };".

From that moment on (i.e. after the HUP) the whole system went to sleep. Every process trying to use syslog blocked. Within a few seconds I had some hundred pop3d and sendmail tasks running, and my own ssh session was blocked since I had tried to issue a logger command. I was not able to telnet or ssh to this host since both daemons tried to log when I connected. Luckily someone else at my company still had an open telnet session. I called him and advised him to remove the offending lines from the config and send syslog-ng a SIGHUP. No effect. Only a SIGKILL was able to get us out of this strange situation. Within seconds all the daemons went back to work again.

I really have no idea what might have caused this and I am not able to give you more details. As I said, I was not able to reproduce this situation on a test server. I am a developer myself and I know perfectly well that this description is nearly useless because it lacks facts, but I am not able to deliver any, and I thought it might be better to state that there might be a problem, so that if someone else reports something similar it will be easier to piece the whole thing together.

Please note that the system is very old (kernel 2.0.35, still libc) and has an uptime of more than 300 days now without having ECC RAM. This system became "too important to be upgraded". It will be replaced in the near future with a new machine, so that we do not have to take it down but can switch over smoothly. Maybe I can reproduce the situation then and send you some straces etc.

Stefan
---
Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning.
An strace dump or something could really help here. As it seems syslog-ng blocked on something (a DNS request maybe?), thus couldn't accept connections on /dev/log.

Newer libc's allow using unix-dgram /dev/log, try using that, client programs will never block then.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
url: http://www.balabit.hu/pgpkey.txt
[...]
them over a dial-up connection, so I modified the log statement to "log { source(net); destination(home); destination(all); };" where all was "destination all { file("/var/log/allmessages"); };". From that moment on (i.e. after the HUP) the whole system went to sleep. Every process trying to use syslog blocked. Within a few seconds I had some hundred pop3d and sendmail tasks running, and my own ssh session was blocked since I had tried to issue a logger command. I was not able to telnet or ssh to this host since both daemons tried to log when I connected. Luckily someone else at my company still had an open telnet session. I called him and advised him to remove the offending lines from the config and send syslog-ng a SIGHUP. No effect. Only a SIGKILL was able to get us out of this strange situation. Within seconds all the daemons went back to work again.
[...]
An strace dump or something could really help here. As it seems syslog-ng blocked on something (a DNS request maybe?), thus couldn't accept connections on /dev/log.
Ok, yesterday the same thing happened again while one of my colleagues restarted the nameserver on the same host. On the one hand this seems to prove your explanation that syslog-ng blocks while resolving hostnames, but on the other hand it puts me in a very nasty situation, because I cannot ignore the problem any longer. I need name resolution and I need a stable system, of course. So I see three possibilities:

1.) As you suggested:
Newer libc's allow using unix-dgram /dev/log, try using that, client programs will never block then.
Only problem: what is a "newer libc"? Are you talking about glibc?

2.) Running two syslog-ng processes, one with name resolution on (receiving all the network data) and one with name resolution off (reading /dev/log), which should solve my problem, too.

3.) Firewalling the syslog port at host level and putting all hosts allowed to get through into the hosts file. Will syslog-ng use the hosts file (by using the standard resolver library), or will it bypass it and only do NS lookups?

I'd really like to hear your opinion about these possibilities. Of course I'd prefer 1.), since I like things which work by design and not because of some "dirty tricks".

Thank you in advance,
Stefan
An strace dump or something could really help here. As it seems syslog-ng blocked on something (a DNS request maybe?), thus couldn't accept connections on /dev/log.
Ok, yesterday the same thing happened again while one of my colleagues restarted the nameserver on the same host. On the one hand this seems to prove your explanation that syslog-ng blocks while resolving hostnames, but on the other hand it puts me in a very nasty situation, because I cannot ignore the problem any longer. I need name resolution and I need a stable system, of course. So I see three possibilities:
1.) As you suggested:
Newer libc's allow using unix-dgram /dev/log, try using that, client programs will never block then.
RedHat patched their libc to send messages via dgram /dev/log. The patch IIRC was transparent, so one could use both unix-dgram and unix-stream as they choose to. Note that if you choose to use unix-dgram, the services will continue to run even if syslog-ng blocks, but logging will be shut down.
Only problem: what is a "newer libc"? Are you talking about glibc?
IIRC the one included in RedHat 6.1 was patched, so 6.2 should be ok. I don't know whether this patch was accepted upstream though.
2.) Running two syslog-ng processes, one with name resolution on (receiving all the network data) and one with name resolution off (reading /dev/log), which should solve my problem, too.
That should work.
3.) Firewalling the syslog port at host level and putting all hosts allowed to get through into the hosts file. Will syslog-ng use the hosts file (by using the standard resolver library), or will it bypass it and only do NS lookups?
syslog-ng uses gethostbyaddr(), so a private nsswitch.conf file should be ok.
I'd really like to hear your opinion about these possibilities. Of course I'd prefer 1.), since I like things which work by design and not because of some "dirty tricks".
I don't like 1), because it may lead to lost messages without notice. I like #2 or #3, but I don't know how to use a private nsswitch.conf file, however I know that this is possible, since sendmail uses one.
-- 
Bazsi
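The objection to option 1 (clients keep running, but messages can vanish without notice) can be illustrated with a small sketch. It assumes a sender that refuses to block by passing MSG_DONTWAIT on an AF_UNIX datagram socket, and it uses a socketpair whose receiving end is never read to stand in for a stalled log daemon. This is only an illustration of the general trade-off, not a description of how any particular libc's syslog() is implemented, and count_dropped() is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `attempts` small datagrams to a peer that never
 * reads (standing in for a blocked log daemon). Each send is non-blocking,
 * so the caller never hangs; once the queue is full, messages are dropped.
 * Returns the number of dropped messages, or -1 on an unexpected error. */
int count_dropped(int attempts)
{
    static const char msg[] = "<13>demo: test message";
    int sv[2];
    int dropped = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    for (int i = 0; i < attempts; i++) {
        if (send(sv[0], msg, sizeof(msg) - 1, MSG_DONTWAIT) >= 0)
            continue;                /* queued for the (stalled) reader */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS) {
            dropped++;               /* would block: the message is lost */
        } else {
            dropped = -1;            /* unexpected error */
            break;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return dropped;
}
```

Calling count_dropped() with a large number of attempts returns promptly instead of hanging, but on a typical Linux system a large share of the messages is reported as dropped, which is exactly the silent loss raised as a concern above.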
1.) As you suggested:
Newer libc's allow using unix-dgram /dev/log, try using that, client programs will never block then.
RedHat patched their libc to send messages via dgram /dev/log. The patch IIRC was transparent, so one could use both unix-dgram and unix-stream as they choose to. Note that if you choose to use unix-dgram, the services will continue to run even if syslog-ng blocks, but logging will be shut down.
Only problem: what is a "newer libc"? Are you talking about glibc?
IIRC the one included in RedHat 6.1 was patched, so 6.2 should be ok. I don't know whether this patch was accepted upstream though.
Ok, then my system is too old.
2.) Running two syslog-ng processes, one with name resolution on (receiving all the network data) and one with name resolution off (reading /dev/log), which should solve my problem, too.
That should work.
Nice.
3.) Firewalling the syslog port at host level and putting all hosts allowed to get through into the hosts file. Will syslog-ng use the hosts file (by using the standard resolver library), or will it bypass it and only do NS lookups?
syslog-ng uses gethostbyaddr(), so a private nsswitch.conf file should be ok.
Since I have "hosts: files dns" in my nsswitch.conf, it should always use the hosts file first. The firewall rules (ipchains/ipfw) guarantee that no host not explicitly named in my /etc/hosts can get a datagram through, so syslog-ng will always get a hit from my hosts file. No need for a private nsswitch.conf.
I'd really like to hear your opinion about these possibilities. Of course I'd prefer 1.), since I like things which work by design and not because of some "dirty tricks".
I don't like 1), because it may lead to lost messages without notice.
Hm, haven't thought about this aspect. You are right.
I like #2 or #3, but I don't know how to use a private nsswitch.conf file, however I know that this is possible, since sendmail uses one.
I think I'll use both #2 and #3 in parallel, which should give me maximum reliability. This solution should work fine for me because I do not need to mix local and remote entries within one log file.

But what about the following idea:

Some sort of "private" hosts file for syslog-ng? Let's say /etc/syslog-ng/syslog-ng.hosts with an "ip\thost" format (even simpler than /etc/hosts), e.g.:

127.0.0.1	localhost
192.168.1.1	host1
192.168.1.2	host2
192.168.1.3	host3
192.168.1.4	host4

It shouldn't be very hard to implement a new option which allows you to use this file (and only this file) as the source of name resolution. If an IP is found, great; if not, we'll fall back to the IP. This makes us independent of any name service problems (not only outages: syslog-ng will use the correct hostname even if someone spoofs your nameserver).

In my eyes, something worth thinking about. Maybe I'll have some free minutes tomorrow to give this a try.

Stefan
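The private hosts file idea above could look roughly like the following. This is only a minimal sketch under stated assumptions, not syslog-ng code: the function name private_hosts_lookup() is hypothetical, the file is assumed to hold one whitespace-separated "ip host" pair per line as in the example, and lines starting with '#' are treated as comments (an assumption that the proposal does not mention).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up `ip` in a private hosts file that holds one
 * "ip<TAB>host" entry per line. On a match the hostname is copied to `out`;
 * otherwise `out` receives the IP itself, so the caller always has something
 * to log. Returns 1 if a name was found, 0 if we fell back to the IP. */
int private_hosts_lookup(const char *path, const char *ip,
                         char *out, size_t outlen)
{
    char line[512];
    int found = 0;
    FILE *fp;

    if (outlen == 0)
        return 0;

    /* Default result: the bare IP address. */
    snprintf(out, outlen, "%s", ip);

    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;                 /* no file: fall back to the IP */

    while (fgets(line, sizeof(line), fp) != NULL) {
        char addr[64];
        char name[256];

        if (line[0] == '#')
            continue;             /* assumed comment syntax */
        if (sscanf(line, "%63s %255s", addr, name) == 2 &&
            strcmp(addr, ip) == 0) {
            snprintf(out, outlen, "%s", name);
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

Because the lookup only reads a local file, it cannot stall on an unreachable nameserver; a real implementation would probably load the file once into memory rather than reopen it for every message.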
I like #2 or #3, but I don't know how to use a private nsswitch.conf file, however I know that this is possible, since sendmail uses one.
I think I'll use both #2 and #3 in parallel, which should give me maximum reliability. This solution should work fine for me because I do not need to mix local and remote entries within one log file.
But what about the following idea:
Some sort of "private" hosts file for syslog-ng? Let's say /etc/syslog-ng/syslog-ng.hosts with an "ip\thost" format (even simpler than /etc/hosts), e.g.:
127.0.0.1	localhost
192.168.1.1	host1
192.168.1.2	host2
192.168.1.3	host3
192.168.1.4	host4
It shouldn't be very hard to implement a new option which allows you to use this file (and only this file) as the source of name resolution. If an IP is found, great; if not, we'll fall back to the IP. This makes us independent of any name service problems (not only outages: syslog-ng will use the correct hostname even if someone spoofs your nameserver).
In my eyes, something worth thinking about. Maybe I'll have some free minutes tomorrow to give this a try.
I've found an internal glibc 2.1 function which would allow exactly this, but using the system's /etc/hosts file. This would add a dependency on glibc though, and could also mean that a future glibc would become incompatible.

This example shows what it looks like:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <nss.h>   /* declares __nss_configure_lookup() on glibc */

    int main()
    {
        struct in_addr a;
        struct hostent *he;

        inet_aton("193.6.40.1", &a);
        /* restrict the "hosts" database to the "files" service (/etc/hosts) */
        __nss_configure_lookup("hosts", "files");
        he = gethostbyaddr((char *) &a, sizeof(a), AF_INET);
        if (he) {
            printf("hostname=%s\n", he->h_name);
        } else {
            printf("not found\n");
        }
        return 0;
    }

The name of 193.6.40.1 is only found if it's listed in /etc/hosts.

-- 
Bazsi
Some sort of "private" hosts file for syslog-ng? Let's say /etc/syslog-ng/syslog-ng.hosts with an "ip\thost" format (even simpler than /etc/hosts), e.g.:
127.0.0.1	localhost
192.168.1.1	host1
192.168.1.2	host2
192.168.1.3	host3
192.168.1.4	host4
It shouldn't be very hard to implement a new option which allows you to use this file (and only this file) as the source of name resolution. If an IP is found, great; if not, we'll fall back to the IP. This makes us independent of any name service problems (not only outages: syslog-ng will use the correct hostname even if someone spoofs your nameserver).
In my eyes, something worth thinking about. Maybe I'll have some free minutes tomorrow to give this a try.
I've found an internal glibc 2.1 function which would allow exactly this, but using the system's /etc/hosts file. This would add a dependency on glibc though, and could also mean that a future glibc would become incompatible.
Too bad my system is still libc :( I'd really like a universal solution that does not depend on a specific version of a specific library.
This example shows what it looks like:
[...]
__nss_configure_lookup("hosts", "files"); [...]
That's really easy. Although this does not solve my problem (as already said: libc), it seems to cry out to be turned into an option for syslog-ng:

options [...] nss_lookup("files"); };

Meanwhile I'll try another solution that bypasses the resolver library.

Stefan
participants (2)
- Balazs Scheidler
- Stefan Seufert