[syslog-ng] syslog-ng 3.2.4 MySQL connection loss Server has gone away

Wed Sep 28 09:06:24 CEST 2011

Hi,

Sorry for the late answer, but better late than never :)

On Sun, 2011-09-04 at 23:15 -0700, Erik Maciejewski wrote:
> Hello,
> 
> I have been researching an issue with syslog-ng 3.2.4 (both the Linux glibc2.3.6 and platform 
> independent compiled from source versions) and MySQL on CentOS 5.6 in which syslog-ng tries to 
> insert a log message using a dead TCP or unix socket connection to a MySQL database. The insert
> is retried up to the maximum number of attempts, and the message is then dropped.
> This seems to occur regularly in predominantly low message volume environments, but has the
> potential to affect all environments using a MySQL (or other db) destination. The root cause 
> of the issue seems to be directly related to the health of the connection between syslog-ng 
> and MySQL and can be affected by the "wait_timeout" value used by MySQL to kill off inactive 
> connections. I want to provide justification for implementing a fix to syslog-ng as I believe 
> manipulating a, possibly tuned, environment variable for MySQL would seem like the wrong 
> approach to take for remedial action.
> 
> I feel that many MySQL instances are implemented in shared environments and are themselves 
> shared by many applications. New connections to a MySQL instance are generally regarded as 
> low cost and, it would seem, more often than not never used in a persisted fashion when 
> supporting distributed applications. That being said, many times the "wait_timeout" 
> value is set very low so that the MySQL instance can effectively control the number of
> simultaneous connections.
> 
> 
> By taking somewhat of a black-box approach to the issue (I'm just starting to explore 
> syslog-ng), I noticed that there doesn't seem to be any health checking of the TCP or 
> unix socket connection in the SQL destination implementation after the initial 
> connection is made (most of the db actions being abstracted away by libdbi). Even 
> if a syslog-ng database thread is suspended and reactivated due to error, there 
> does not seem to be any attempt to check the health of the database connection.
> 
> I have implemented a fix in my compiled version that will check a database 
> connection's health and attempt to reconnect upon discovering a dead 
> connection. I would like to know the policies/procedures/best practices 
> for submitting such issues and/or fixes for discussion, review, and 
> implementation. If this is a known issue or I am just wrong, all the better!
> 
> Thank you guys for keeping this project going and I'm looking forward to 
> implementing syslog-ng in a production environment in the very near future!

syslog-ng doesn't explicitly check the liveness of an SQL connection;
however, it does drop and re-establish the connection if an error occurs.

If you look at the afsql_dd_database_thread() function in afsql.c, you'll
see this code:

      if (!afsql_dd_insert_db(self))
        {
          afsql_dd_disconnect(self);
          afsql_dd_suspend(self);
        }

afsql_dd_insert_db() should return FALSE for any failure. syslog-ng
basically assumes that if the database side initiates the closure of a
connection, that will trickle up to syslog-ng as an error return from
dbi_conn_query(), which in turn causes the SQL destination to suspend
its operations for the time_reopen() interval and then reconnect.
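
To make the intended flow concrete, here is a minimal, self-contained
sketch of that disconnect / suspend / reconnect cycle. It is only a
simulation and not the code in afsql.c: mock_dest and all of the mock_*
functions are hypothetical names invented for illustration, the
"database" is just a flag, and the suspend step only adds up the
seconds instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SQL destination state (not syslog-ng's struct). */
typedef struct {
    bool connected;      /* do we believe we hold an open connection? */
    bool server_alive;   /* simulated: has the server kept our session open? */
    int  connects;       /* number of (re)connections made so far */
    int  suspended_for;  /* total simulated seconds spent suspended */
    int  time_reopen;    /* seconds to wait before trying to reconnect */
} mock_dest;

/* Simulated connect: always succeeds and gives us a fresh, live session. */
static bool mock_connect(mock_dest *d)
{
    d->connected = true;
    d->server_alive = true;
    d->connects++;
    return true;
}

/* Simulated disconnect: forget the (possibly dead) connection. */
static void mock_disconnect(mock_dest *d)
{
    d->connected = false;
}

/* Simulated suspend: record the wait instead of actually sleeping. */
static void mock_suspend(mock_dest *d)
{
    d->suspended_for += d->time_reopen;
}

/* Simulated insert: fails once the server side has closed the session. */
static bool mock_insert(mock_dest *d, const char *msg)
{
    (void) msg;
    return d->connected && d->server_alive;
}

/*
 * One pass of the loop: connect if needed, try the insert, and on any
 * failure drop the connection and suspend. Returns true if the message
 * was written; on false the caller decides whether to retry or drop it.
 */
static bool mock_process_message(mock_dest *d, const char *msg)
{
    assert(d != NULL);

    if (!d->connected && !mock_connect(d)) {
        mock_suspend(d);
        return false;
    }
    if (!mock_insert(d, msg)) {
        mock_disconnect(d);
        mock_suspend(d);
        return false;
    }
    return true;
}
```

In this model, an idle session killed by the server (server_alive set
to false, as a wait_timeout expiry would do) makes the next
mock_process_message() call fail once, and the call after that
reconnects and succeeds. Whether the real destination behaves this way
depends on the failed query actually being reported as an error, which
is exactly the assumption described above.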

What kind of fix did you implement yourself? Can you post the patch?
Thanks.

-- 
Bazsi