[syslog-ng] weird disk usage problems -- syslog-ng/ELSA

Sat May 10 23:20:23 CEST 2014

Hi Folks,

It would seem that the issue here is with syslog-ng, but it manifested itself in the context of an ELSA instance, so I am sending this to both lists. Apologies to those who get it twice.

I just discovered that my ELSA machine had run out of disk space some 18 hours ago.

/dev/mapper/datavg-d01   985G  943G     0 100% /d01

[rful011@itslogprd05 ~]$ sudo du -sh /d01/*
671G	/d01/data
16K	/d01/lost+found
43G	/d01/mysql

But these figures don't match: the filesystem reports 943G used, while du accounts for only 714G (671G + 43G), leaving roughly 230GB unaccounted for. I did not pick up this discrepancy at first; instead I went looking in the ELSA node.log to try to figure out why logs were not getting deleted. After failing to find any evidence of problems in the logs, I stopped syslog-ng and set about moving the tmp/buffers to another disk. I copied them, but when I went to delete the originals they had vanished! I was dazed and confused ;) until I realised that the usage of /d01 had dropped to 77%, which allowed cron.pl to process all the waiting buffers and delete them.

It would seem that syslog-ng was sitting on some very large files that had been unlinked from the directory (so du did not see them). Stopping syslog-ng would have closed the files, allowing the disk space to be released. Hmmm… with syslog-ng it might have been fifos (the ELSA syslog-ng config uses fifos to deliver logs to the ELSA Perl scripts), but they should not have had buffers that size.
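For anyone wanting to check for this condition before restarting the daemon, here is a rough sketch (not part of ELSA or syslog-ng) that walks /proc on Linux and lists file descriptors whose target has been unlinked but is still held open. The function name find_deleted_open_files and the "/d01/" prefix are my own choices for illustration. It relies on the kernel appending " (deleted)" to the link target, so treat the result as a hint rather than proof, and run it as root to see other users' processes.

```python
import os


def find_deleted_open_files(mount_prefix="/", proc_root="/proc"):
    """Return (pid, fd, target, size_bytes) tuples for open files that
    have been unlinked but are still held open by a process (Linux only)."""
    results = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                # The kernel marks unlinked targets with this suffix.
                if not target.endswith(" (deleted)"):
                    continue
                if not target.startswith(mount_prefix):
                    continue
                # stat() follows the /proc link to the still-open file.
                size = os.stat(link).st_size
            except OSError:
                continue  # descriptor closed while we were looking
            results.append((int(pid), int(fd), target, size))
    return results


if __name__ == "__main__":
    # "/d01/" is the mount point from this post; adjust as needed.
    hits = find_deleted_open_files("/d01/")
    for pid, fd, target, size in sorted(hits, key=lambda h: -h[3]):
        print(f"pid={pid} fd={fd} size={size} {target}")
```

If lsof is installed, "sudo lsof +L1 /d01" should report much the same information, since +L1 selects open files with a link count below one.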

Has anyone seen anything like this before, or does anyone have any idea what might trigger this issue?

Russell