Disk buffer getting wiped out
Hi, We have syslog-ng forwarding logs to a remote system, with reliability enabled and disk buffer size set to 50GB in the configuration. However, the actual size of the partition is 30GB. We had overlooked the partition size that resulted in this error. When there is connection loss to remote device and partition gets filled to 100% due to syslog-ng disk buffer. syslog-ng continues to run, but even after we free some space in partition and destination is available, it doesn't push out logs to the destination. We have to restart syslog-ng service. After restarting, we observe that all buffer files are freed up immediately. These files were consuming about 20-30GB. Syslog-ng version using :3.24 Below is disk buffer configuration: mem-buf-size(10000) disk-buf-size(53687091200) reliable(yes) dir(/syslog/buffer) Questions: 1. Does syslog-ng allocate buffers on disk as much as required and in small chunks? Or will it try to allocate all of the specified buffer at one (in this case 50GB)? 2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available? 3. Why did all buffer files get deleted, after a restart Thanks Raghu
Hello Raghu,
1. Does syslog-ng allocate buffers on disk as much as required and in small chunks? Or will it try to allocate all of the specified buffer at one (in this case 50GB)?
The disk space for disk buffers is allocated gradually first, only the amount needed.
2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available?
That might be a bug. Once a destination becomes available (set time-reopen() to a lower value to check more frequently), syslog-ng should send messages out from the disk buffer. Could you reproduce this issue and share the reproduction steps?
3. Why did all buffer files get deleted, after a restart
What do you mean by getting deleted? syslog-ng does not remove disk buffer files completely, but it may rename corrupt buffers, for example. -- László Várady
Hi, On Mon, May 11, 2020 at 12:23:27PM +0000, László Várady (lvarady) wrote:
2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available?
That might be a bug. Once a destination becomes available (set time-reopen() to a lower value to check more frequently), syslog-ng should send messages out from the disk buffer. Could you reproduce this issue and share the reproduction steps?
I remember having a lot of corrupt disk buffers when disk was full. In my case, they caused a segfault on startup, maybe that got "fixed" by a deletion instead?
Correcting subject We did some more troubleshooting on this and we found that all logs in buffer were indeed sent out and that syslog-ng was facing issues in truncating the file. This issue got fixed with restart. However, we are observing that this issue is happening too often. Would anyone help me understand why this could be happening? Thanks Raghu On Mon, May 11, 2020, 19:34 Fabien Wernli <wernli@in2p3.fr> wrote:
Hi,
On Mon, May 11, 2020 at 12:23:27PM +0000, László Várady (lvarady) wrote:
2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available?
That might be a bug. Once a destination becomes available (set time-reopen() to a lower value to check more frequently), syslog-ng should send messages out from the disk buffer. Could you reproduce this issue and share the reproduction steps?
I remember having a lot of corrupt disk buffers when disk was full. In my case, they caused a segfault on startup, maybe that got "fixed" by a deletion instead?
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Hi, Can you describe the issue in detail, please? Could you share the reproduction steps? You say there is an issue with disk-buffer truncating after it's emptied, and that is resolved when syslog-ng restart? How do you check if the queue is empty? Regards, Gabor Raghunath Adhyapak <funduraghu@gmail.com> ezt írta (időpont: 2020. jún. 9., K, 0:08):
Correcting subject
We did some more troubleshooting on this and we found that all logs in buffer were indeed sent out and that syslog-ng was facing issues in truncating the file. This issue got fixed with restart. However, we are observing that this issue is happening too often. Would anyone help me understand why this could be happening?
Thanks Raghu
On Mon, May 11, 2020, 19:34 Fabien Wernli <wernli@in2p3.fr> wrote:
Hi,
On Mon, May 11, 2020 at 12:23:27PM +0000, László Várady (lvarady) wrote:
2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available?
That might be a bug. Once a destination becomes available (set time-reopen() to a lower value to check more frequently), syslog-ng should send messages out from the disk buffer. Could you reproduce this issue and share the reproduction steps?
I remember having a lot of corrupt disk buffers when disk was full. In my case, they caused a segfault on startup, maybe that got "fixed" by a deletion instead?
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Hi, I don't have an exact way to reproduce it. However, this is what we observe: - The buffer file keeps increasing in size. - We keep getting syslog messages and there doesn't seem to be any issue with the service. - When we check the buffer size it grows into gigabytes. - At this point, in one case we saw the buffer file size was around 4GB. We restarted the service and after the restart, boom, the buffer files got cleared immediately (size of buffer file was 4k) - Next time when this happened, we checked the contents of the buffer file and investigated whether the events were received or not. We found that we had received all the events. Therefore we believe that there was a file truncation issue and restart helped to clear it. Let me know if you need any further details and also on how to prevent this from happening Thanks Raghu On Tue, Jun 9, 2020, 12:58 Nagy Gábor <gabor.hl@gmail.com> wrote:
Hi,
Can you describe the issue in detail, please? Could you share the reproduction steps?
You say there is an issue with disk-buffer truncating after it's emptied, and that is resolved when syslog-ng restart? How do you check if the queue is empty?
Regards, Gabor
Raghunath Adhyapak <funduraghu@gmail.com> ezt írta (időpont: 2020. jún. 9., K, 0:08):
Correcting subject
We did some more troubleshooting on this and we found that all logs in buffer were indeed sent out and that syslog-ng was facing issues in truncating the file. This issue got fixed with restart. However, we are observing that this issue is happening too often. Would anyone help me understand why this could be happening?
Thanks Raghu
On Mon, May 11, 2020, 19:34 Fabien Wernli <wernli@in2p3.fr> wrote:
Hi,
On Mon, May 11, 2020 at 12:23:27PM +0000, László Várady (lvarady) wrote:
2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available?
That might be a bug. Once a destination becomes available (set time-reopen() to a lower value to check more frequently), syslog-ng should send messages out from the disk buffer. Could you reproduce this issue and share the reproduction steps?
I remember having a lot of corrupt disk buffers when disk was full. In my case, they caused a segfault on startup, maybe that got "fixed" by a deletion instead?
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Hi, Here is some more detail on the issue: Buffer file on syslog-ng machine root@mymachine:/etc/syslog-ng/disk-buffer# ls -l total 61580512 -rw------- 1 root root 4096 Jun 14 16:57 syslog-ng-00000.rqf -rw------- 1 root root 4096 Jun 8 13:00 syslog-ng-00001.rqf -rw------- 1 root root 63058421272 Jun 14 16:57 syslog-ng-00002.rqf -rw------- 1 root root 4096 Jun 14 16:56 syslog-ng-00003.rqf When I run syslog-ng in debug mode, I see the following lines related to buffer [2020-06-14T16:43:32.085616] Reliable disk-buffer state loaded; filename='/etc/syslog-ng/disk-buffer/syslog-ng-00000.rqf', queue_length='0', size='0' [2020-06-14T16:43:32.085690] Reliable disk-buffer internal state; filename='/etc/syslog-ng/disk-buffer/syslog-ng-00000.rqf', backlog_head='4096', read_head='4096', write_head='4096', backlog_len='0' [2020-06-14T16:43:32.086555] WARNING: window sizing for tcp sources were changed in syslog-ng 3.3, the configuration value was divided by the value of max-connections(). The result was too small, clamping to value of min_iw_size_per_reader. Ensure you have a proper log_fifo_size setting to avoid message loss.; orig_log_iw_size='2', new_log_iw_size='100', min_iw_size_per_reader='100', min_log_fifo_size='50000' [2020-06-14T16:43:32.086677] Accepting connections; addr='AF_INET( 0.0.0.0:514)' [2020-06-14T16:43:32.086802] Reliable disk-buffer state loaded; filename='/etc/syslog-ng/disk-buffer/syslog-ng-00002.rqf', queue_length='14255828', size='20391794080' *[2020-06-14T16:43:32.086871] Reliable disk-buffer internal state; filename='/etc/syslog-ng/disk-buffer/syslog-ng-00002.rqf', backlog_head='42534306360', read_head='42534307736', write_head='62926101816', backlog_len='1'* [2020-06-14T16:43:32.087359] Seeking the journal to the last cursor position; cursor='s=3f7a52af1f26424db365ccc75b2cf79d;i=192f;b=6921d51da51b41d5a26d8674232c77d9;m=75ef33f5cc;t=5a80dfded040d;x=b88b960795be54d2' [2020-06-14T16:43:32.087572] Reliable disk-buffer state loaded; filename='/etc/syslog-ng/disk-buffer/syslog-ng-00003.rqf', queue_length='0', size='0' [2020-06-14T16:43:32.087637] Reliable disk-buffer internal state; filename='/etc/syslog-ng/disk-buffer/syslog-ng-00003.rqf', backlog_head='4096', read_head='4096', write_head='4096', backlog_len='0' I can see that logs are getting sent out to the destination server. However, the buffer is still piling up. I am not sure how to get the buffer cleared. Please advise. Thanks Raghu On Sat, Jun 13, 2020 at 9:09 PM Raghunath Adhyapak <funduraghu@gmail.com> wrote:
Hi, I don't have an exact way to reproduce it. However, this is what we observe: - The buffer file keeps increasing in size. - We keep getting syslog messages and there doesn't seem to be any issue with the service. - When we check the buffer size it grows into gigabytes. - At this point, in one case we saw the buffer file size was around 4GB. We restarted the service and after the restart, boom, the buffer files got cleared immediately (size of buffer file was 4k) - Next time when this happened, we checked the contents of the buffer file and investigated whether the events were received or not. We found that we had received all the events.
Therefore we believe that there was a file truncation issue and restart helped to clear it.
Let me know if you need any further details and also on how to prevent this from happening
Thanks Raghu
On Tue, Jun 9, 2020, 12:58 Nagy Gábor <gabor.hl@gmail.com> wrote:
Hi,
Can you describe the issue in detail, please? Could you share the reproduction steps?
You say there is an issue with disk-buffer truncating after it's emptied, and that is resolved when syslog-ng restart? How do you check if the queue is empty?
Regards, Gabor
Raghunath Adhyapak <funduraghu@gmail.com> ezt írta (időpont: 2020. jún. 9., K, 0:08):
Correcting subject
We did some more troubleshooting on this and we found that all logs in buffer were indeed sent out and that syslog-ng was facing issues in truncating the file. This issue got fixed with restart. However, we are observing that this issue is happening too often. Would anyone help me understand why this could be happening?
Thanks Raghu
On Mon, May 11, 2020, 19:34 Fabien Wernli <wernli@in2p3.fr> wrote:
Hi,
On Mon, May 11, 2020 at 12:23:27PM +0000, László Várady (lvarady) wrote:
2. Why couldn't syslog-ng resume operations after partition was freed up and destination was available?
That might be a bug. Once a destination becomes available (set time-reopen() to a lower value to check more frequently), syslog-ng should send messages out from the disk buffer. Could you reproduce this issue and share the reproduction steps?
I remember having a lot of corrupt disk buffers when disk was full. In my case, they caused a segfault on startup, maybe that got "fixed" by a deletion instead?
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Hi, On Sat, Jun 13, 2020 at 09:09:25PM +0530, Raghunath Adhyapak wrote:
- The buffer file keeps increasing in size.
I'm pretty sure this isn't normal behaviour. If the queue is short (as reported by `syslog-ng-ctl stats`) the file size should be close to zero.
AFAIK, file size truncation only happens when the "read head" of the disk queue file catches up to the "write head". So it is considered to be normal behavior to have a multi-gigabyte disk queue file (if it is still below the maximum allowed size, set in the configuration) that only has a few kilobytes of meaningful (as in actual data that needs to be buffered at the moment) data in it. If the queue file becomes empty (the read head becomes equal to the write head) then (or at the next restart at the latest) the file will be truncated. Best Regards, János -- Janos SZIGETVARI RHCE, License no. 150-053-692 <https://www.redhat.com/rhtapps/verify/?certId=150-053-692> LinkedIn: linkedin.com/in/janosszigetvari __@__˚V˚ Make the switch to open (source) applications, protocols, formats now: - windows -> Linux, iexplore -> Firefox, msoffice -> LibreOffice - msn -> jabber protocol (Pidgin, Google Talk) - mp3 -> ogg, wmv -> ogg, jpg -> png, doc/xls/ppt -> odt/ods/odp Fabien Wernli <wernli@in2p3.fr> ezt írta (időpont: 2020. jún. 15., H, 10:06):
Hi,
On Sat, Jun 13, 2020 at 09:09:25PM +0530, Raghunath Adhyapak wrote:
- The buffer file keeps increasing in size.
I'm pretty sure this isn't normal behaviour. If the queue is short (as reported by `syslog-ng-ctl stats`) the file size should be close to zero.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
On Mon, Jun 15, 2020 at 10:21:48AM +0200, SZIGETVÁRI János wrote:
AFAIK, file size truncation only happens when the "read head" of the disk queue file catches up to the "write head".
I guess this is much more likely to happen when using reliable(no), as there is a much smaller volume of messages to be written on disk.
The last message posted by me about growing file size was actually caused by heavy spike on incoming end (4x normal traffic). The output bandwidth was limiting the data being sent out and hence the buffer file was growing. This correctly correlates to the same thing as mentioned and also is reflected in read and write head values. Please excuse me for the false alarm here. However as described in previous emails, the issue is when the flow of logs is normal, we have still observed cases where file size is huge. I will open a separate thread when I observe this again and this time I will run syslog-ng in debug mode and get the read, write, backlog head values for getting proper guidance. Thanks Raghu On Mon, Jun 15, 2020, 14:18 Fabien Wernli <wernli@in2p3.fr> wrote:
On Mon, Jun 15, 2020 at 10:21:48AM +0200, SZIGETVÁRI János wrote:
AFAIK, file size truncation only happens when the "read head" of the disk queue file catches up to the "write head".
I guess this is much more likely to happen when using reliable(no), as there is a much smaller volume of messages to be written on disk.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
participants (5)
-
Fabien Wernli
-
László Várady (lvarady)
-
Nagy Gábor
-
Raghunath Adhyapak
-
SZIGETVÁRI János