Re: [syslog-ng] syslog-ng 3.2.4 MySQL connection loss Server has gone away

28 Sep 2011

      Hi,

Sorry for the late answer, but better late than never :)

On Sun, 2011-09-04 at 23:15 -0700, Erik Maciejewski wrote:
...
Hello,
I have been researching an issue with syslog-ng 3.2.4 (both the Linux glibc2.3.6 and platform 
independent compiled from source versions) and MySQL on CentOS 5.6 in which syslog-ng tries to 
insert a log messages using a dead TCP or unix socket connection to a MySQL database. The issue 
results in the max attempt to insert a log message and then the message subsequently dropped. 
This seems to occur regularly in predominately low message volume environments, but has the 
potential to affect all environments using a MySQL (or other db) destination. The root cause 
of the issue seems to be directly related to the health of the connection between syslog-ng 
and MySQL and can be affected by the "wait_timeout" value used by MySQL to kill off inactive 
connections. I want to provide justification for implementing a fix to syslog-ng as I believe 
manipulating a, possibly tuned, environment variable for MySQL would seem like the wrong 
approach to take for remedial action.
I feel that many MySQL instances are implemented in shared environments and are themselves 
shared by many applications. New connections to a MySQL instance are generally regarded as 
low cost and, it would seem, more often than not never used in a persisted fashion when 
supporting distributed applications. That being said, many times the "wait_timeout" 
value is set very low so that the MySQL instance can effectively control the number of
simultaneous connections.
By taking somewhat of a black-box approach to the issue (I'm just starting to explore 
syslog-ng), I noticed that there doesn't seem to be any health checking of the TCP or 
unix socket connection in the SQL destination implementation after the initial 
connection is made (most of the db actions being abstracted away by libdbi). Even 
if a syslog-ng database thread is suspended and reactivated due to error, there 
does not seem to be any attempt to check the health of the database connection.
I have implemented a fix in my compiled version that will check a database 
connection's health and attempt to reconnect upon discovering a dead 
connection. I would like to know the policies/procedures/best practices 
for submitting such issues and/or fixes for discussion, review, and 
implementation. If this is a known issue or I am just wrong, all the better!
...
Thank you guys for keeping this project going and I'm looking forward to 
implementing syslog-ng in a production environment in the very near future!
syslog-ng doesn't explicitly check the aliveness of an SQL connection,
however it does drop and reestablish connection if an error occurs.

If you look at afsql_dd_database_thread() function in afsql.c, you'll
see this code:

      if (!afsql_dd_insert_db(self))
        {
          afsql_dd_disconnect(self);
          afsql_dd_suspend(self);
        }

afsql_dd_insert_db() should return FALSE for any failures, and syslog-ng
basically assumes that if the database side initiates the closure of a
connection, that'll trickle up to syslog-ng as an error return to
dbi_conn_query(), which will in turn cause the SQL destination to
suspend its operations for time_reopen() amount of time and then
reconnect.

What kind of fix did you implement yourself? Can you post the patch?
Thanks.

-- 
Bazsi

Re: [syslog-ng] syslog-ng 3.2.4 MySQL connection loss Server has gone away

Balazs Scheidler