On Tue, 2011-10-18 at 23:15 +0200, Jakub Jankowski wrote:
On Fri, 14 Oct 2011 01:39:28 +0200, Jakub Jankowski wrote:
On Thu, 13 Oct 2011 15:32:27 +0200, Balazs Scheidler wrote:
Since upgrading to 3.3.1, there is a strange behaviour of my system: if I reload syslog-ng, it spits out *tons* of similar messages:

Internal error, duplicate configuration elements refer to the same persistent config; name='dd_queue(d_mesg,d_mesg#0)'
As far as I can tell, this happens for every destination that was open:

$ grep '^Oct 14 00:00:03' /var/log/messages | grep -c 'dd_queue(d_mesg,d_mesg#0)'
186
$ grep '^Oct 14 00:00:03' /var/log/messages | grep -c 'dd_queue(d_cron,d_cron#0)'
164
$ grep '^Oct 14 00:00:03' /var/log/messages | grep -c '_errorlog#0'
62435
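To tally every affected persist name in one pass rather than grepping for each one, something like the following may help. It is only a rough sketch: the function name count_persist_dupes is made up here, and it assumes the error lines carry the name='...' field exactly as in the message quoted above.

```shell
# Hypothetical helper: count "duplicate configuration elements" errors
# per persist name. Assumes the name='...' field format shown above.
# Usage: count_persist_dupes LOGFILE [TIMESTAMP_PREFIX]
count_persist_dupes() {
    logfile=$1
    prefix=${2:-}    # optional timestamp prefix; empty matches every line
    grep "^$prefix" "$logfile" \
        | grep 'duplicate configuration elements' \
        | sed -n "s/.*name='\([^']*\)'.*/\1/p" \
        | sort | uniq -c | sort -rn
}
```

Running count_persist_dupes /var/log/messages 'Oct 14 00:00:03' should print one line per persist name, with the most frequent name first.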
I'm pretty sure there are no duplicate configuration elements in my config, especially for those "standard" ones like d_mesg or d_cron:
$ grep d_mesg /etc/syslog-ng/syslog-ng.conf
destination d_mesg { file("/var/log/messages" group("root") flush_lines(1)); };
log { source(s_sys); filter(f_default); destination(d_mesg); };
$ grep messages /etc/syslog-ng/syslog-ng.conf
destination d_mesg { file("/var/log/messages" group("root") flush_lines(1)); };
$
If there were, I'd get some errors at startup, right? Well, I don't.
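For anyone who wants to run the same check over a whole config instead of one name at a time, here is a minimal sketch. The function name find_dup_defs is hypothetical, and it assumes every definition starts on its own line with the keyword followed by whitespace and the object name, as in the d_mesg line above. It does not follow include files.

```shell
# Hypothetical helper: list source/destination/filter names that are
# defined more than once in a syslog-ng config file.
# Assumes one definition per line, in the form "destination NAME { ... };".
find_dup_defs() {
    grep -Eo '^[[:space:]]*(source|destination|filter)[[:space:]]+[A-Za-z0-9_.-]+' "$1" \
        | awk '{ print $1, $2 }' \
        | sort | uniq -d
}
```

With find_dup_defs /etc/syslog-ng/syslog-ng.conf, empty output would mean no name is defined twice, at least under the assumptions above.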
But what's most peculiar is that after such a reload, VmRSS stays roughly the same (at least it's not climbing like crazy).
I've also read the complete log and found two minor leaks, both of which leak memory when processing SIGHUPs. I've fixed these:
I've built a package with these two patches applied, and run it on my test system, to confirm this weird behaviour above. It still manifests. After a clean start, until there are logs to process, I can reload syslog-ng and there are no "Internal error" messages. As soon as I make my log relays send some messages to this test system, and try to reload it: boom, internal errors again.
This is very weird, I don't recall this behaviour before. I know this description isn't the most precise one, but so far I've had no time to investigate further. I promise I'll dig some more over the next few days. But maybe even this much will help someone debug it.
Ok, at least I managed to get valgrind output from SIGHUP processing. It's available at http://toxcorp.com/stuff/syslog-ng-leak/s3.3.1-head-HUP.log

This is for 3.3.1 with the recent SIGHUP-leak-related patches applied.
==13436== LEAK SUMMARY:
==13436==    definitely lost: 83,795 bytes in 1,698 blocks
==13436==    indirectly lost: 1,012 bytes in 21 blocks
==13436==      possibly lost: 13,153 bytes in 108 blocks
==13436==    still reachable: 80,050 bytes in 3,278 blocks
==13436==         suppressed: 0 bytes in 0 blocks
But I still don't get why, all of a sudden, "duplicate configuration elements" are logged on SIGHUP, but:
- not on start
- not if SIGHUP was received prior to any remote message
Hi,

Sorry for not answering your emails before, somehow I've missed them previously.

Using your problem description I think I've found the issues that caused the leaks for you. The issue is a bit more involved than I like it at this stage (having released a stable version already), but anyway better late than never.

The fix is a series of 5 patches (available on "master" branch):

commit 73a8c48983319942f668080a19ec037d1aab22cf
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Mon Oct 31 10:06:01 2011 +0100

    affile: release per-writer LogQueue instances during runtime

    Instead of keeping those queues around if a destination is reaped,
    allow them be freed. Also, make sure that file destinations have
    their queue restored accross SIGHUP in case they contain unflushed
    items.

    Reported-By: Jakub Jankowski <shasta@toxcorp.com>
    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

commit 040587ec5ec662756ac18eb7ff998af64359c3ba
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Mon Oct 31 10:03:18 2011 +0100

    LogWriter: introduce log_writer_get_queue() method

    The new function returns the current LogQueue instance.

    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

commit df5cbb37116476bb0b03036f9837e345c8fdb151
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Mon Oct 31 10:02:52 2011 +0100

    LogQueue: added keep_on_reload() method

    This patch introduces a new member function for LogQueue instances.
    Its role is to indicate whether we prefer keeping that LogQueue
    instance around when reloading syslog-ng. The default FIFO
    implementation allows the core to free the queue if it has zero
    items. If the implementation doesn't supply such method, we default
    to keeping the queue around.

    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

commit 9d9679e9e13cecdba33b61d2d3da71869bb5490f
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Mon Oct 31 09:58:42 2011 +0100

    LogDestDriver: properly maintain self->queues list in acquire/release

    The maintenance of the self->queues list was not complete, although
    new LogQueue instances acquired were added to the queue, their
    removal only worked in the log_dest_driver_deinit() function.

    With this change log_dest_driver_release_queue() can also be called
    explicitly by plugins, and the queues list will not become corrupted.

    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

commit 779e1e3fb0364b50f2125fd94eb927dda307886a
Author: Balazs Scheidler <bazsi@balabit.hu>
Date:   Mon Oct 31 09:58:18 2011 +0100

    driver: don't generate persist IDs for drivers that fail to specify one

    This seems to have been a bad idea. The only location this was used
    is the file() destination driver, but that has multiple queues,
    making the current default ID generation collide.

    Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>

Without these, file destinations would keep their queues around when reaped via time-reap(). Also the duplicate config elements should be fixed.

Any feedback is appreciated. Thanks.

--
Bazsi