Hi All,

I just 'inherited' a large syslog-ng setup that generates on average 1.5 terabytes of data per day. Logs come from many hundreds of hosts to one of 10 or more syslog-ng servers (selected via DNS round-robin). From there, they go straight to an NFS server, without being written to a local disk. There are many destination files on the server, but it is possible that more than one logging server may be attempting to write to the same destination file at the same time.

We're seeing corruption where a log line will have another spliced into the middle, parts here, parts there, etc. Not constantly, but enough to give us a headache.

Stats on the NFS server show that the CPU can get maxed out, but the network pipe is not full. The individual logger servers show that syslog-ng is using about 10% CPU on dual 3 GHz P4s.

I'm thinking that it's the NFS server that cannot cope with multiple sources trying to write to the same file, but would like to throw this out to the masses: do you have any hints as to how I could verify where the source of the problem is?

I do realize that this setup is less than optimal, and will be taking steps to repair this as time goes on. Alas, I must get the corruption down first, then re-architect second.

Any hints or suggestions would be greatly appreciated!

Thanks Very Much,
Erik.
On Mon, 2005-12-05 at 15:07 -0500, Erik Williamson wrote:
> but it is possible that more than one logging server may be attempting to write to the same destination file at the same time.
This part is very bad: syslog-ng does not synchronize destination file writes in any way. You have to create independent destination files for each log server writing to the NFS volume, and if you need them in one file, re-merge them later.

-- Bazsi
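For the "remerge it later" step, something along these lines might do. It is only a rough sketch, not anything syslog-ng ships: it assumes each log server has written its own file, that every file is already in time order, and that lines start with the classic syslog timestamp ("Dec  5 15:07:56"), which carries no year, so the year has to be supplied by the caller. The names syslog_key and merge_logs are made up for illustration.

```python
import heapq
from datetime import datetime


def syslog_key(line, year=2005):
    """Sort key: parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The classic syslog timestamp has no year, so one is assumed here.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def merge_logs(streams, year=2005):
    """Lazily merge several already time-ordered line streams into one.

    Each stream can be an open file or any iterable of lines. heapq.merge
    only interleaves; it does not sort, so unsorted input gives unsorted
    output.
    """
    return heapq.merge(*streams, key=lambda line: syslog_key(line, year))
```

Usage would be roughly: open one file per log server, pass the open files to merge_logs(), and write the result to a single output file with writelines(). Since the merge is lazy, it should not need to hold whole files in memory.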
On Mon, 05 Dec 2005 15:07:56 EST, Erik Williamson said:
> being written to a local disk. There are many destination files on the server, but it is possible that more than one logging server may be attempting to write to the same destination file at the same time.
More than one process writing to a file without benefit of locking is looking for exactly the sort of corruption you're seeing. For local files, opening with O_APPEND may help, but not for NFS. The Linux 'man 2 open' manpage says:

    O_APPEND
        The file is opened in append mode. Before each write(), the file offset is positioned at the end of the file, as if with lseek(). O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.
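To make the local-file case concrete, here is a small sketch of what O_APPEND does when two independent descriptors write to the same local file. It runs in a single process on a temporary local directory, so it illustrates the append positioning only; it says nothing about NFS, where the manpage text above applies. The helper names open_for_append and demo are invented for this example.

```python
import os
import tempfile


def open_for_append(path):
    """Open path write-only in append mode, creating it if needed."""
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)


def demo():
    """Write through two independent descriptors; return the file's lines."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "messages.log")
        fd_a = open_for_append(path)
        fd_b = open_for_append(path)
        try:
            # Each complete line goes out in a single write() call.
            os.write(fd_a, b"Dec  5 15:07:56 host1 first\n")
            # Without O_APPEND, fd_b would still be at offset 0 here and
            # would overwrite the start of the first line.
            os.write(fd_b, b"Dec  5 15:07:57 host2 second\n")
            os.write(fd_a, b"Dec  5 15:07:58 host1 third\n")
        finally:
            os.close(fd_a)
            os.close(fd_b)
        with open(path, "rb") as f:
            return f.read().splitlines()
```

On a local filesystem demo() returns the three lines intact and in the order they were written, because the offset is moved to end-of-file before each write().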
Beautiful - thanks to both of you for getting back to me! Now to re-architect it all...
_______________________________________________
syslog-ng maillist - syslog-ng@lists.balabit.hu
https://lists.balabit.hu/mailman/listinfo/syslog-ng
Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
participants (3): Balazs Scheidler, Erik Williamson, Valdis.Kletnieks@vt.edu