[syslog-ng]TCP/IP connection - server SIGHUP problem
Roman Ernst
roman_ernst@at.ibm.com
Wed, 3 Oct 2001 09:55:45 +0200
First of all: thanks for your help with my little problem of detecting
dropped TCP/IP connections. Of course, using monitoring software would
solve the problem, but I think many people don't have or use such a tool.
So it should be possible to catch such an error by other means.
Somebody wrote that I could figure out via "netstat -an | grep tcp ...."
whether the connection is okay or not. Unfortunately, when I unplug the
network cable, AIX does not notice: netstat -an still shows the connection
as established. Only the client tries to close the connection, when it
tries to send a log message and can't reach the log server (after the TCP
timeout). So the client isn't the problem.
But the server never tries to send a packet to the client, so it will
never find out whether the client is still reachable.
Maybe it would be useful to introduce some sort of TCP heartbeat (the
server sends some packets to the client every xxx minutes). This way the
server would notice unreachable clients and could log such problems.
If I'm totally wrong about this, please let me know.
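
For illustration: TCP's own keepalive option (SO_KEEPALIVE) comes close to
the heartbeat described above, because the kernel then probes an idle
connection and reports an error when the peer stops answering. The sketch
below is in Python and is not taken from syslog-ng; whether syslog-ng sets
this option is not something I know. The function name enable_keepalive and
the timing values are hypothetical, and the three TCP_KEEP* options exist
only on some platforms, so they are applied only where available.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle TCP connection (illustrative sketch).

    idle, interval and count are assumed example values, not recommendations.
    """
    # Portable part: switch keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning; each option is skipped if the platform
    # (or this Python build) does not expose it.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before probing
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

A server would call this on each accepted connection. Once the probes fail,
the next read or write on that socket returns an error, which the server
could then log.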
Okay, another problem:
Sending a SIGHUP to the syslog-ng client restarts syslog-ng on the client,
drops the connection to the server, reconnects to the server and
initializes the new configuration (that's what the server log says).
=> Is it really necessary to drop the connection?
Sending a SIGHUP to the syslog-ng server restarts syslog-ng on the server
and initializes the new configuration (that's what the server log says).
=> So far so good. Now the problem: when the client then tries to send
messages to the server, the following happens (client and server logs):
AF_INET client connected from 10.x.x.1, port xxxxx (SERVER LOG)
io.c: do_write: write () failed (errno 32), Broken pipe (CLIENT + SERVER LOG)
pkt_buffer::do_flush(): Error flushing data (CLIENT + SERVER LOG)
Connection broken to AF_INET 10.x.x.2, reopening in 10 seconds (CLIENT + SERVER LOG)
In addition, one log message is missing afterwards.
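
One possible explanation for the missing message (an assumption on my part,
not something confirmed from the syslog-ng source): a write on a TCP socket
only hands the data to the local kernel. If the peer has already closed its
end, the first write can still be accepted locally, and only a later write
fails, typically with errno 32. The message from the accepted write never
reaches anyone. The Python sketch below reproduces this on the loopback
interface; demo_lost_write and the message texts are made up for the
example, and the sleeps are only there to let the kernel exchange packets.

```python
import socket
import threading
import time

def demo_lost_write():
    """Show that a write to an already-closed peer can be accepted locally."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def accept_and_close():
        # Stand-in for a server that drops its connections.
        conn, _ = listener.accept()
        conn.close()

    server = threading.Thread(target=accept_and_close)
    server.start()
    client = socket.create_connection(("127.0.0.1", port))
    server.join()
    time.sleep(0.2)  # let the peer's close reach the client side

    results = []
    for message in (b"message 1\n", b"message 2\n"):
        try:
            client.send(message)
            results.append("accepted")
        except OSError as exc:
            results.append("failed: errno %d" % exc.errno)
        time.sleep(0.2)  # give the peer's reset time to arrive

    client.close()
    listener.close()
    return results
```

In this sketch the first message is reported as accepted although the peer
is already gone. A sender that wants to avoid such a loss would have to keep
the last message until it knows the connection is still alive, or rely on an
acknowledgement from the receiver.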
Okay... that's it. Maybe someone can help me. Thanks in advance!
Best regards
Roman Ernst