Hi,

I've just been through a couple of threads on this, one from a couple of years back and one from last year. I am having the same issue with logs being dropped. Obviously, to maintain the integrity of my audit trail I'd like to limit or stop this. Most of the stats are 0 (they are printed every 12 hours). Here is the entire config, which is exactly the same on all syslog clients:

options { chain_hostnames(off); sync(0); stats(43200); };
source src { unix-stream("/dev/log"); internal(); pipe("/proc/kmsg"); };
destination messages { file("/var/log/messages"); };
destination net { tcp("x.x.x.x" port(xxx)); };
log { source(src); destination(messages); };
log { source(src); destination(net); };

Most of the time the dropped stats are 0, but sometimes they are around 3000 and occasionally 20000-50000! Usually this happens on the mail server or the log server, which are both heavily used machines. One of the destinations is obviously backing up, most likely the tcp destination on the mail server. On the log server I already have log_fifo_size(20000); to try to alleviate this, which I would have thought was reasonable. I don't think it increases memory consumption too much either, at least not by today's standards; the server has a nominal 1 GB of RAM.

Should I increase this on the server and add it on the agents too? I think that would allow a large enough buffer to hopefully stop this altogether (unless, of course, the backlog goes over the 20000 or whatever). Perhaps I should set log_fifo_size to 50000?

Any recommendations or feedback on this?

Thanks

Hari Sekhon
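(For reference, a minimal sketch of what that change would look like on an agent, assuming log_fifo_size() is set in the global options block in the same way it appears to be set on the log server; the 50000 is just the value floated above, not a recommendation:)

# same options line as above, with an explicit fifo size added (value only illustrative)
options { chain_hostnames(off); sync(0); stats(43200); log_fifo_size(50000); };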
You should look at adding log_fifo_size() on the agents; the way I read it above, your log sources don't have an increased log_fifo_size(). Hrm, to make that clearer: you should increase log_fifo_size() on the systems where you are seeing the drops.
From what I remember the memory for this is dynamically allocated and cleaned up afterwards, so I would even set it at 100,000 or 200,000.
Mike
Yes, this is what I was thinking, but the server also dropped some. I will try to increase the fifo by the number of dropped messages reported in the stats on the server, making it 40000 instead of 20000, and then give the agents a fifo of 30000 just to be safe.

From what I've measured it doesn't seem to take much RAM, so this should be OK, although I didn't measure at the times of heavier load when the machine's fifo is filling up. If anybody knows more on this or has some info on the memory consumption related to doing this then please let me know.

Thanks

Hari Sekhon
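(On the memory question, a rough upper-bound sketch, assuming on the order of 1 KB per queued message, which is an assumption rather than a measured figure:)

30000 messages * ~1 KB/message ≈ 30 MB
50000 messages * ~1 KB/message ≈ 50 MB

(And that is only while a destination is actually backed up, so either size should fit comfortably within 1 GB of RAM.)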
We have investigated a similar issue in-house, and it seems the performance improvements in 1.6.10, 1.6.11 and 2.0.0 increase the load on the destination buffers significantly, which in turn can cause message loss in cases where it did not occur before.

For example, previously file destinations could only drop messages if there were at least 100 log connections each producing a log message in a single poll iteration (assuming log_fifo_size(100), the default). Later versions have a much higher probability of internal message drops. This can be solved by increasing log_fifo_size(); the out-of-the-box defaults seem to be inadequate right now.

The problem is:

* message sources read a maximum of 30 messages in a single poll iteration
* each socket connection on /dev/log is a message source in this sense (e.g. if there are 10 programs connecting to /dev/log, then a single input iteration can produce as much as 10*30 = 300 messages)
* these messages are put into the destination's fifo, which has a default size of 100 messages

Previously each source generated at most 1 message per poll cycle; now that is increased to 30. It is trivial to see that the default log_fifo_size() value is not adequate. I'm thinking about increasing the default log_fifo_size to 1000 in 1.6.x.

syslog-ng 2.0 is also affected, albeit a bit differently:

* the number of messages fetched in a single go can be controlled by a configuration option (the per-source fetch_limit() option)
* its default value is 10 instead of 30

I also think the fifo size should be forced to a minimum value of 1000.

--
Bazsi
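(A minimal sketch of how that sizing reasoning might translate into a 2.0-style client config, assuming the global options form for log_fifo_size() and assuming the per-source limit Bazsi mentions is given inside the source driver under the name fetch_limit(); the exact option placement and the values are illustrative assumptions, not taken from this thread:)

# assume ~10 local programs write to /dev/log; each connection can deliver up to
# fetch_limit() messages per poll, so size the fifo to absorb at least one full burst
options { chain_hostnames(off); sync(0); stats(43200); log_fifo_size(1000); };
source src { unix-stream("/dev/log" fetch_limit(10)); internal(); pipe("/proc/kmsg"); };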
But I'm talking much higher numbers: I had a log_fifo_size of 20000 and it still dropped several thousand messages. I have just taken the shotgun approach to the maths and increased the log_fifo_size on the log server by the maximum number of dropped messages seen in the stats log. So now I've got a log_fifo_size of 50000 on my log server and a log_fifo_size of 30000 on each agent syslog-ng. I will check whether I have any dropped stats in a few days to see how it's going...

-h

Hari Sekhon
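(A sketch of the resulting settings as described above, assuming both ends set the value in the global options block and that the rest of the config is unchanged from earlier in the thread:)

# on the central log server
options { chain_hostnames(off); sync(0); stats(43200); log_fifo_size(50000); };

# on each agent (e.g. the mail server)
options { chain_hostnames(off); sync(0); stats(43200); log_fifo_size(30000); };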