[syslog-ng] weird disk usage problems -- syslog-ng/ELSA

Jakub Jankowski shasta at toxcorp.com
Sun May 11 17:14:30 CEST 2014


On 10.05.2014 23:20, Russell Fulton wrote:
> It would seem that the issue here is with syslog-ng, but it manifested itself in the context of an ELSA instance, so I am sending this to both lists; apologies to those who get it twice.
>
> I just discovered that my ELSA machine had run out of disk some 18 hours ago.
>
> /dev/mapper/datavg-d01   985G  943G     0 100% /d01
>
> [rful011@itslogprd05 ~]$ sudo du -sh /d01/*
> 671G	/d01/data
> 16K	/d01/lost+found
> 43G	/d01/mysql
>
> But these figures don't match: there is about 250 GB of disk unaccounted for (although I did not pick up this discrepancy at first; instead I went looking in ELSA node.log to try and figure out why logs were not getting deleted). After failing to find any evidence of problems in the logs, I stopped syslog-ng and set about moving the tmp/buffers to another disk. I copied them, but when I went to delete the originals they had vanished! I was dazed and confused ;) until I realised that the usage of /d01 had dropped to 77%, which allowed cron.pl to process all the waiting buffers and delete them.
>
> It would seem that syslog-ng was sitting on some very large files that had been unlinked from the directory (so du did not see them); stopping syslog-ng would have closed the files, allowing the disk space to be released. Hmmm... with syslog-ng it might have been FIFOs (the ELSA syslog-ng config uses FIFOs to deliver logs to the ELSA Perl scripts), but they should not have had buffers that size.
>
> Has anyone seen anything like this before or have any idea what might trigger this issue?
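
The symptom itself is the classic deleted-but-still-open file case, and it
is easy to reproduce by hand if you want to watch du and df disagree (just
a rough sketch; the path and size are made up, try it on a scratch
filesystem):

  dd if=/dev/zero of=/tmp/bigfile bs=1M count=1024  # make a ~1G file
  tail -f /tmp/bigfile &    # some process keeps a descriptor open on it
  rm /tmp/bigfile           # du stops counting it here...
  df -h /tmp                # ...but df still shows the blocks as used
  kill %1                   # close the descriptor and the space returns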

No idea what caused it, but next time it happens, use lsof to find out
which file descriptors are still open. Something like
  lsof -n | grep deleted
will give you a hint.
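
A slightly more targeted variant, assuming a Linux box with /proc (the
<PID>, <FD> and destination path below are placeholders you would fill in
from the lsof output):

  lsof -nP +L1                 # open files with link count 0, i.e.
                               # unlinked but still held open
  ls -l /proc/<PID>/fd/<FD>    # the fd symlink still points at the data
  cp /proc/<PID>/fd/<FD> /some/other/disk/recovered   # copy it off if needed

Stopping or restarting whatever process holds the descriptors is what lets
the kernel actually free the blocks, which matches the drop to 77% you saw
once syslog-ng was stopped.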


HTH.

-- 
Jakub Jankowski|shasta at toxcorp.com|http://toxcorp.com/
GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D

