[syslog-ng] weird disk usage problems -- syslog-ng/ELSA
r.fulton at auckland.ac.nz
Sat May 10 23:20:23 CEST 2014
It would seem that the issue here is with syslog-ng, but it manifested itself in the context of an ELSA instance, so I am sending this to both lists. Apologies to those who get it twice.
I just discovered that my ELSA machine had run out of disk some 18 hours ago.
/dev/mapper/datavg-d01 985G 943G 0 100% /d01
[rful011 at itslogprd05 ~]$ sudo du -sh /d01/*
But these figures don’t match: there is about 250GB of disk unaccounted for. I did not pick up this discrepancy at first; instead I went looking in the ELSA node.log to try to figure out why logs were not getting deleted. After failing to find any evidence of problems in the logs, I stopped syslog-ng and set about moving the tmp/buffers to another disk. I copied them, but when I went to delete the originals they had vanished! I was dazed and confused ;) until I realised that the usage of /d01 had dropped to 77%, which allowed cron.pl to process all the waiting buffers and delete them.
It would seem that syslog-ng was holding open some very large files that had been unlinked from the directory (so du did not see them). Stopping syslog-ng would have closed the files, allowing the disk space to be released. Hmmm... with syslog-ng it might have been fifos (the ELSA syslog-ng config uses fifos to deliver logs to the ELSA perl scripts), but they should not have had buffers that size.
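For anyone who wants to see the suspected mechanism in isolation, here is a minimal Python sketch. It is not taken from syslog-ng or ELSA; the function name and the file name buffer.tmp are made up for illustration. It creates a file, unlinks it while the descriptor is still open, and shows that the directory entry is gone (so du would not count it) while the data is still held until the descriptor is closed.

```python
import os
import tempfile


def demo_unlinked_open_file(size=1024 * 1024):
    """Create a file, unlink it while still open, and report what remains.

    Hypothetical helper for illustration only; not part of syslog-ng or ELSA.
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "buffer.tmp")  # made-up file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"x" * size)
        os.unlink(path)  # the directory entry is gone, the data is not
        info = os.fstat(fd)  # the open descriptor still reaches the inode
        return {
            "visible_in_directory": os.path.exists(path),
            "link_count": info.st_nlink,
            "bytes_still_held": info.st_size,
        }
    finally:
        os.close(fd)  # only after this can the filesystem free the blocks
        os.rmdir(workdir)


if __name__ == "__main__":
    print(demo_unlinked_open_file())
```

On a live system, lsof +L1 should list open files whose link count has dropped below one, which would be one way to check for this before stopping the daemon.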
Has anyone seen anything like this before or have any idea what might trigger this issue?