On FreeBSd: syslog-ng: I/O error occurred while writing; fd='xx', error='No buffer space available (yy)'
Hi List I've been seeing the following in the messages log of my freebsd syslog server for quite some time now: --- Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='12', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='13', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' --- These happen at a frequency of about 7 per minute on average. See attached trend graphs for an idea of the volume of traffic we're doing, as well as the memory and cpu utilisation trends on this server during this period. As can be seen from the graphs, load does not seem to be the issue. Occasionally during the week, the system freezes and requires a reboot, I think it's related to the above message, though I'm not sure. My question is: What does this error mean, and how can I resolve it? I have tried to frame this as an operating system kernel resource issue, and experimented with increasing the freebsd kernel sysctls for UDP performance: --- [root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.nmbclusters=102400 kern.ipc.nmbclusters: 25600 -> 102400 [root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.maxsockbuf=201326592 kern.ipc.maxsockbuf: 100663296 -> 201326592 [root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl net.inet.udp.recvspace=33554432 net.inet.udp.recvspace: 16777216 -> 33554432 --- This has reduced the frequency of the errors a little, but in general the problem still remains. Syslog version: -- [root@syslog2]# syslog-ng -V syslog-ng 2.0.10 -- FreeBSD version: -- FreeBSD 7.2-STABLE #0 -- Any help would be much appreciated! Traiano
Anybody ? On 20/03/2012 12:53, "Traiano Welcome" <Traiano.Welcome@mtnbusiness.co.za> wrote:
Hi List
I've been seeing the following in the messages log of my freebsd syslog server for quite some time now:
--- Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='12', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='13', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' ---
These happen at a frequency of about 7 per minute on average. See attached trend graphs for an idea of the volume of traffic we're doing, as well as the memory and cpu utilisation trends on this server during this period. As can be seen from the graphs, load does not seem to be the issue. Occasionally during the week, the system freezes and requires a reboot, I think it's related to the above message, though I'm not sure.
My question is: What does this error mean, and how can I resolve it?
I have tried to frame this as an operating system kernel resource issue, and experimented with increasing the freebsd kernel sysctls for UDP performance:
--- [root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.nmbclusters=102400 kern.ipc.nmbclusters: 25600 -> 102400
[root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.maxsockbuf=201326592 kern.ipc.maxsockbuf: 100663296 -> 201326592
[root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl net.inet.udp.recvspace=33554432 net.inet.udp.recvspace: 16777216 -> 33554432 ---
This has reduced the frequency of the errors a little, but in general the problem still remains.
Syslog version:
-- [root@syslog2]# syslog-ng -V syslog-ng 2.0.10
--
FreeBSD version:
-- FreeBSD 7.2-STABLE #0 --
Any help would be much appreciated! Traiano
__________________________________________________________________________ ____ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
On Thu, 2012-03-22 at 07:49 +0000, Traiano Welcome wrote:
Anybody ?
On 20/03/2012 12:53, "Traiano Welcome" <Traiano.Welcome@mtnbusiness.co.za> wrote:
Hi List
I've been seeing the following in the messages log of my freebsd syslog server for quite some time now:
--- Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='12', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='13', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' ---
These happen at a frequency of about 7 per minute on average. See attached trend graphs for an idea of the volume of traffic we're doing, as well as the memory and cpu utilisation trends on this server during this period. As can be seen from the graphs, load does not seem to be the issue. Occasionally during the week, the system freezes and requires a reboot, I think it's related to the above message, though I'm not sure.
My question is: What does this error mean, and how can I resolve it?
I have tried to frame this as an operating system kernel resource issue, and experimented with increasing the freebsd kernel sysctls for UDP performance:
--- [root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.nmbclusters=102400 kern.ipc.nmbclusters: 25600 -> 102400
[root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.maxsockbuf=201326592 kern.ipc.maxsockbuf: 100663296 -> 201326592
[root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl net.inet.udp.recvspace=33554432 net.inet.udp.recvspace: 16777216 -> 33554432 ---
This has reduced the frequency of the errors a little, but in general the problem still remains.
Syslog version:
-- [root@syslog2]# syslog-ng -V syslog-ng 2.0.10
I think I know the issue. poll() indicates that it is possible to write messages to the udp destination, and then send() fails with ENOBUFS. Last time I've checked the only possible way to handle ENOBUFS in this case is either to ignore it (after all udp is lossy), or retry sending after sleeping some. A similar fix has gone into the "loggen" tool, which generates loads of messages and triggered ENOBUFS similarly. In loggen, a sleep was added, in syslog-ng, that's less of an option as that might block other things from being done. The easiest solution is to return EAGAIN in this case, which causes syslog-ng to retry the send() operation in the next poll iteration, which causes a busy loop until more buffers become available. The patch below in 3.3 makes syslog-ng ignore ENOBUFS. I don't maintain 2.0 anymore, but hey, it was trivial, so here you go: diff --git a/src/fdwrite.c b/src/fdwrite.c index 3957b7d..f76bea7 100644 --- a/src/fdwrite.c +++ b/src/fdwrite.c @@ -52,6 +52,9 @@ fd_do_write(FDWrite *self, const void *buf, size_t buflen) fsync(self->fd); } while (rc == -1 && errno == EINTR); + + if (rc < 0 && errno == ENOBUFS) + return buflen; return rc; } And the commit in 3.3: commit 4d6656bbbc19877e7dc2175a7fb3b9aeeaced270 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sat Mar 31 21:14:28 2012 +0200 logtransport: handle ENOBUFS on output FreeBSD returns ENOBUFS on send() failure instead of indicating this conditions via poll(). The return of ENOBUFS actually is a send error and is calulated in IP statistics, so the best is to handle it as a success. The only alternative would be to return EAGAIN, which could cause syslog-ng to spin as long as buffer space is unavailable. Since I'm not sure how much time that would take and I think spinning the CPU is not a good idea in general, I just drop the packet in this case. UDP is lossy anyway Reported-By: Traiano Welcome <Traiano.Welcome@mtnbusiness.co.za> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> -- Bazsi
Hi Balasz On 31/03/2012 21:16, "Balazs Scheidler" <bazsi@balabit.hu> wrote:
On Thu, 2012-03-22 at 07:49 +0000, Traiano Welcome wrote:
Anybody ?
On 20/03/2012 12:53, "Traiano Welcome" <Traiano.Welcome@mtnbusiness.co.za> wrote:
Hi List
I've been seeing the following in the messages log of my freebsd syslog server for quite some time now:
--- Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='12', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='13', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' ---
These happen at a frequency of about 7 per minute on average. See attached trend graphs for an idea of the volume of traffic we're doing, as well as the memory and cpu utilisation trends on this server during this period. As can be seen from the graphs, load does not seem to be the issue. Occasionally during the week, the system freezes and requires a reboot, I think it's related to the above message, though I'm not sure.
My question is: What does this error mean, and how can I resolve it?
I have tried to frame this as an operating system kernel resource issue, and experimented with increasing the freebsd kernel sysctls for UDP performance:
--- [root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.nmbclusters=102400 kern.ipc.nmbclusters: 25600 -> 102400
[root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl kern.ipc.maxsockbuf=201326592 kern.ipc.maxsockbuf: 100663296 -> 201326592
[root@syslog2 <mailto:root@syslog2.ops.mtnbusiness.co.za> /var/log]# sysctl net.inet.udp.recvspace=33554432 net.inet.udp.recvspace: 16777216 -> 33554432 ---
This has reduced the frequency of the errors a little, but in general the problem still remains.
Syslog version:
-- [root@syslog2]# syslog-ng -V syslog-ng 2.0.10
I think I know the issue. poll() indicates that it is possible to write messages to the udp destination, and then send() fails with ENOBUFS.
Last time I've checked the only possible way to handle ENOBUFS in this case is either to ignore it (after all udp is lossy), or retry sending after sleeping some. A similar fix has gone into the "loggen" tool, which generates loads of messages and triggered ENOBUFS similarly.
In loggen, a sleep was added, in syslog-ng, that's less of an option as that might block other things from being done.
The easiest solution is to return EAGAIN in this case, which causes syslog-ng to retry the send() operation in the next poll iteration, which causes a busy loop until more buffers become available.
The patch below in 3.3 makes syslog-ng ignore ENOBUFS. I don't maintain 2.0 anymore, but hey, it was trivial, so here you go:
diff --git a/src/fdwrite.c b/src/fdwrite.c index 3957b7d..f76bea7 100644 --- a/src/fdwrite.c +++ b/src/fdwrite.c @@ -52,6 +52,9 @@ fd_do_write(FDWrite *self, const void *buf, size_t buflen) fsync(self->fd); } while (rc == -1 && errno == EINTR); + + if (rc < 0 && errno == ENOBUFS) + return buflen; return rc; }
And the commit in 3.3:
commit 4d6656bbbc19877e7dc2175a7fb3b9aeeaced270 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sat Mar 31 21:14:28 2012 +0200
logtransport: handle ENOBUFS on output
FreeBSD returns ENOBUFS on send() failure instead of indicating this conditions via poll(). The return of ENOBUFS actually is a send error and is calulated in IP statistics, so the best is to handle it as a success. The only alternative would be to return EAGAIN, which could cause syslog-ng to spin as long as buffer space is unavailable. Since I'm not sure how much time that would take and I think spinning the CPU is not a good idea in general, I just drop the packet in this case. UDP is lossy anyway
Reported-By: Traiano Welcome <Traiano.Welcome@mtnbusiness.co.za> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Thanks very much! I'm currently testing and will feedback after watching for some time! Traiano
-- Bazsi
__________________________________________________________________________ ____ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
Hi! Traiano Welcome <Traiano.Welcome@mtnbusiness.co.za> writes:
I've been seeing the following in the messages log of my freebsd syslog server for quite some time now:
--- Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='12', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' Mar 20 12:19:12 syslog2 syslog-ng[35313]: I/O error occurred while writing; fd='13', error='No buffer space available (55)' Mar 20 12:19:12 syslog2 syslog-ng[35313]: Connection broken; time_reopen='60' ---
These happen at a frequency of about 7 per minute on average. See attached trend graphs for an idea of the volume of traffic we're doing, as well as the memory and cpu utilisation trends on this server during this period.
It would be useful to know what fd 12 and 13 are, that might give us a hint or two. Though, I suppose, they're an UDP destination.
My question is: What does this error mean, and how can I resolve it?
With syslog-ng 2.0, I have no idea, I'm afraid. I would strongly suggest upgrading to 3.2 or 3.3. With those two, you can increase various queue and buffer sizes within the config that can perhaps help. The same settings migth be available in 2.0 too, but I haven't looked at that in ages. -- |8]
participants (3)
-
Balazs Scheidler
-
Gergely Nagy
-
Traiano Welcome