syslog-ng 3.2.2 OSE takes 100% CPU after source files get truncated.
Hi, Not sure if syslog-ng has a bug tracker, but I couldn't find one anyway, so posting this here. Here is the problem: I have source file that is a log file that is been truncated periodically (by log rotate). When the truncation happens syslog-ng process starts taking 100% CPU, and not doing any logging. This behavior remains until source file receives a write. To reproduce: 1. configure syslog-ng with following lines: source s_logfiles { file("/var/log/applogs/mysql/mysqld.log" flags(no-parse) program_override("mysqld.log")); file("/var/log/applogs/mysql/mysql-slow.log" flags(no-parse) program_override("mysql-slow.log")); }; destination d_mysql_logall { file("/var/log/mysqlall.log"); }; log { source(s_logfiles); destination(d_mysql_logall);}; 2. start syslog-ng 3. output some text into source files: for i in /var/log/applogs/mysql/mysqld.log /var/log/applogs/mysql/mysql-slow.log ; do echo "some text" >> $i; done 4. truncate source files: for i in /var/log/applogs/mysql/mysqld.log /var/log/applogs/mysql/mysql-slow.log ; do > $i; done 5. observe syslog-ng takes 100% CPU, doing nothing. 6. output some text into source files again: for i in /var/log/applogs/mysql/mysqld.log /var/log/applogs/mysql/mysql-slow.log ; do echo "some text, again" >> $i; done 7. observe syslog-ng returned to normal operation. comment: some times you'll need going through steps 3 - 5 a few times to trigger the bug. My config: syslog-ng 3.2.2 Installer-Version: 3.2.2 Revision: ssh+git://bazsi@git.balabit //var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.2#master#1d3f396485eb47b1ff6aa18ac4f1c4cd51c0ea4c Compile-Date: Jan 20 2011 11:34:35 Enable-Threads: off Enable-Debug: off Enable-GProf: off Enable-Memtrace: off Enable-Sun-STREAMS: off Enable-IPv6: on Enable-Spoof-Source: on Enable-TCP-Wrapper: on Enable-SSL: on Enable-SQL: off Enable-Linux-Caps: off Enable-Pcre: on Enable-Pacct: off Regards.
On Wed, 2011-02-02 at 17:29 +0300, Mailing Lists wrote:
Not sure if syslog-ng has a bug tracker, but I couldn't find one anyway, so posting this here.
We have a bugzilla at http://bugzilla.balabit.com/ (although I agree, it could be more visibly advertised that it exists), for future reference.
Here is the problem: I have source file that is a log file that is been truncated periodically (by log rotate). When the truncation happens syslog-ng process starts taking 100% CPU, and not doing any logging. This behavior remains until source file receives a write.
This sounds very, very familiar to me. May I ask what platform this is on (OS, architecture, etc)? -- |8]
This is RHEL 5.5 x86_64 Linux lio-db8 2.6.18-194.11.4.el5xen #1 SMP Fri Sep 17 05:17:00 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48) In the meantime, do you know a version that doesn't have this bug, so I can work with it until youre working on a fix ? Thanks. On Wed, Feb 2, 2011 at 5:42 PM, Gergely Nagy <algernon@balabit.hu> wrote:
On Wed, 2011-02-02 at 17:29 +0300, Mailing Lists wrote:
Not sure if syslog-ng has a bug tracker, but I couldn't find one anyway, so posting this here.
We have a bugzilla at http://bugzilla.balabit.com/ (although I agree, it could be more visibly advertised that it exists), for future reference.
Here is the problem: I have source file that is a log file that is been truncated periodically (by log rotate). When the truncation happens syslog-ng process starts taking 100% CPU, and not doing any logging. This behavior remains until source file receives a write.
This sounds very, very familiar to me.
May I ask what platform this is on (OS, architecture, etc)?
-- |8]
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Wed, 2011-02-02 at 18:44 +0300, Mailing Lists wrote:
This is RHEL 5.5 x86_64
Linux lio-db8 2.6.18-194.11.4.el5xen #1 SMP Fri Sep 17 05:17:00 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48)
In the meantime, do you know a version that doesn't have this bug, so I can work with it until youre working on a fix ?
Can you try the attached patch, if it fixes the issue? I've seen this problem before, but never on Linux, and it was fairly hard to trigger. If it is what I think it is, then the attached patch should fix it. -- |8]
The patch did not solve the problem. Syslog-ng appears to loop over the empty source file. strace -rp 0.000086 stat("/var/log/applogs/mysql/ mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000130 open("/var/log/applogs/mysql/mysqld.log", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 10 0.000082 fcntl(10, F_GETFD) = 0 0.000065 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 0.000063 capget(0x19980330, 0, {CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000076 capset(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000078 capget(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000076 capset(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_FOWNER|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000077 capset(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000083 fstat(10, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000110 lseek(10, 0, SEEK_SET) = 0 0.000069 lseek(10, 10, SEEK_SET) = 10 0.000063 stat("/var/log/applogs/mysql/mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000122 close(14) = 0 0.000093 lseek(10, 0, SEEK_CUR) = 10 0.000090 fstat(10, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000116 capget(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000079 capget(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000074 capset(0x19980330, 0, {CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000076 stat("/var/log/applogs/mysql/mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000135 open("/var/log/applogs/mysql/mysqld.log", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 14 0.000084 fcntl(14, F_GETFD) = 0 0.000062 fcntl(14, F_SETFD, FD_CLOEXEC) = 0 0.000060 capget(0x19980330, 0, {CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000074 capset(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000075 capget(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000075 capset(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_FOWNER|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000075 capset(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000085 fstat(14, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000109 lseek(14, 0, SEEK_SET) = 0 0.000062 lseek(14, 10, SEEK_SET) = 10 0.000060 stat("/var/log/applogs/mysql/mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000122 close(10) = 0 0.000067 lseek(14, 0, SEEK_CUR) = 10 0.000060 fstat(14, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000118 capget(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000123 capget(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000078 capset(0x19980330, 0, {CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000078 stat("/var/log/applogs/mysql/mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000134 open("/var/log/applogs/mysql/mysqld.log", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 10 0.000091 fcntl(10, F_GETFD) = 0 0.000065 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 0.000069 capget(0x19980330, 0, {CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000076 capset(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000078 capget(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000077 capset(0x19980330, 0, {CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_FOWNER|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000077 capset(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000090 fstat(10, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000111 lseek(10, 0, SEEK_SET) = 0 0.000065 lseek(10, 10, SEEK_SET) = 10 0.000064 stat("/var/log/applogs/mysql/mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000127 close(14) = 0 0.000069 lseek(10, 0, SEEK_CUR) = 10 0.000063 fstat(10, {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000110 capget(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000078 capget(0x19980330, 0, {CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000078 capset(0x19980330, 0, {CAP_DAC_OVERRIDE|CAP_SYS_ADMIN, CAP_CHOWN|CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_FOWNER|CAP_NET_BIND_SERVICE|CAP_NET_BROADCAST|CAP_NET_RAW|CAP_SYS_ADMIN, 0}) = 0 0.000077 stat("/var/log/applogs/mysql/mysqld.log", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0 0.000124 open("/var/log/applogs/mysql/mysqld.log", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 14 0.000082 fcntl(14, F_GETFD) = 0 0.000065 fcntl(14, F_SETFD, FD_CLOEXEC) = 0 On Wed, Feb 2, 2011 at 7:10 PM, Gergely Nagy <algernon@balabit.hu> wrote:
On Wed, 2011-02-02 at 18:44 +0300, Mailing Lists wrote:
This is RHEL 5.5 x86_64
Linux lio-db8 2.6.18-194.11.4.el5xen #1 SMP Fri Sep 17 05:17:00 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48)
In the meantime, do you know a version that doesn't have this bug, so I can work with it until youre working on a fix ?
Can you try the attached patch, if it fixes the issue?
I've seen this problem before, but never on Linux, and it was fairly hard to trigger. If it is what I think it is, then the attached patch should fix it.
-- |8]
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Wed, 2011-02-02 at 19:58 +0300, Mailing Lists wrote:
The patch did not solve the problem. Syslog-ng appears to loop over the empty source file.
Mhm. I'll see what I can do. If I can reproduce this at home, then there's a good chance I will be able to provide a patch soon after. I'll keep the list posted. -- |8]
Thanks! Could you tell me if there is a version free from this bug, so I can use it while youre working on the fix ? On Wed, Feb 2, 2011 at 8:10 PM, Gergely Nagy <algernon@balabit.hu> wrote:
On Wed, 2011-02-02 at 19:58 +0300, Mailing Lists wrote:
The patch did not solve the problem. Syslog-ng appears to loop over the empty source file.
Mhm. I'll see what I can do. If I can reproduce this at home, then there's a good chance I will be able to provide a patch soon after.
I'll keep the list posted.
-- |8]
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Wed, 2011-02-02 at 20:21 +0300, Mailing Lists wrote:
Thanks!
Could you tell me if there is a version free from this bug, so I can use it while youre working on the fix ?
Chances are, there isn't such a version. But I can't tell for sure until I figure out what the cause is (but then, when I get that far, I'll have a fix too :P) But I managed to reproduce it, so the fix should be coming your way soon. -- |8]
On Wed, 2011-02-02 at 18:35 +0100, Gergely Nagy wrote:
On Wed, 2011-02-02 at 20:21 +0300, Mailing Lists wrote:
Thanks!
Could you tell me if there is a version free from this bug, so I can use it while youre working on the fix ?
Chances are, there isn't such a version. But I can't tell for sure until I figure out what the cause is (but then, when I get that far, I'll have a fix too :P)
But I managed to reproduce it, so the fix should be coming your way soon.
Update: I found the problem, it exists in all versions of syslog-ng OSE as far as I checked. The good news is, I already have a workaround (with side effects), and have a plan on how to fix the issue properly. I need to leave work now, but I'll have a go at it from home. With a bit of luck, I can present a solution by the end of the day. If not, I'll publish the workaround for the time being. -- |8]
On Wed, 2011-02-02 at 19:58 +0300, Mailing Lists wrote:
The patch did not solve the problem. Syslog-ng appears to loop over the empty source file.
The attached patch fixes the problem, but it may have unwanted side effects - although off the top of my head, I can't think of any, nor did my tests reveal anything. Works For Me(tm) applies. The proper solution will need a bit more thinking, unfortunately. I know why the problem appears, and I know why the attached fix works, but I do not understand the reason why the line I removed was there in a first place. I suppose there was a reason behind it, but I've yet to figure that out. In the meantime, the attached patch should help. I tried my best to find any ill side-effects, but so far, I couldn't. Therefore I think it's pretty safe, but no guarantees. -- |8]
Great! Thanks very much! Your fix works. I tested it with the following script: #!/bin/bash while `true`; do for i in /var/log/applogs/mysql/mysqld.log /var/log/applogs/mysql/mysql-slow.log; do echo "some text" >> $i; done; sync; sleep 1; for i in /var/log/applogs/mysql/mysqld.log /var/log/applogs/mysql/mysql-slow.log; do > $i; done; sync; sleep 1; echo -n "." done The only side-effect I found so far, is that I no longer see the following messages in system log: Feb 2 16:43:58 lio-db8 syslog-ng[8876]: The current log file has a mismatching size/inode information, restarting from the beginning; state='affile_sd_curpos(/var/log/applogs/mysql/mysqld.log)', stored_inode='1753383', cur_file_inode='1753383', stored_size='10', cur_file_size='0' Feb 2 16:43:58 lio--db8 syslog-ng[8876]: The current log file has a mismatching size/inode information, restarting from the beginning; state='affile_sd_curpos(/var/log/applogs/mysql/mysql-slow.log)', stored_inode='1753478', cur_file_inode='1753478', stored_size='114118', cur_file_size='0 But I can live with that. Again, thanks very much for the fix!!! On Wed, Feb 2, 2011 at 11:53 PM, Gergely Nagy <algernon@balabit.hu> wrote:
On Wed, 2011-02-02 at 19:58 +0300, Mailing Lists wrote:
The patch did not solve the problem. Syslog-ng appears to loop over the empty source file.
The attached patch fixes the problem, but it may have unwanted side effects - although off the top of my head, I can't think of any, nor did my tests reveal anything.
Works For Me(tm) applies.
The proper solution will need a bit more thinking, unfortunately. I know why the problem appears, and I know why the attached fix works, but I do not understand the reason why the line I removed was there in a first place.
I suppose there was a reason behind it, but I've yet to figure that out.
In the meantime, the attached patch should help. I tried my best to find any ill side-effects, but so far, I couldn't. Therefore I think it's pretty safe, but no guarantees.
-- |8]
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
Hi, Hmm... the primary reason is that the state must be initialized in all cases and affile_sd_recover_state() either restores the old state or initializes a new one. In case the file doesn't exist when syslog-ng starts up, the state cannot be initialized. In these cases, LogReader will send an NC_FILE_MOVED notification once the file becomes available. In these cases the state is initialized by that affile_sd_recover_state() call. So, the problem with removing that may cause ill effects in the case the file becomes available while syslog-ng is already running. Why does it cause a spin in this usecase? On Wed, 2011-02-02 at 21:53 +0100, Gergely Nagy wrote:
On Wed, 2011-02-02 at 19:58 +0300, Mailing Lists wrote:
The patch did not solve the problem. Syslog-ng appears to loop over the empty source file.
The attached patch fixes the problem, but it may have unwanted side effects - although off the top of my head, I can't think of any, nor did my tests reveal anything.
Works For Me(tm) applies.
The proper solution will need a bit more thinking, unfortunately. I know why the problem appears, and I know why the attached fix works, but I do not understand the reason why the line I removed was there in a first place.
I suppose there was a reason behind it, but I've yet to figure that out.
In the meantime, the attached patch should help. I tried my best to find any ill side-effects, but so far, I couldn't. Therefore I think it's pretty safe, but no guarantees.
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
-- Bazsi
Why does it cause a spin in this usecase?
It goes into a spin, because even if the file was reopened, _recover_state() will end up calling log_proto_buffered_server_apply_state() which will lseek to an offset past EOF. I'm not entirely sure why it will do that, however - there's two scenarios that can happen, and I'm not sure which one we end up in. I'll test it as soon as possible. Anyway, being past EOF will trigger NC_FILE_MOVED again, and we start the whole process again, and again, and again. Come to think of it, the problem is probably an inconsistency in state, but that's just my gut feeling at the moment. I'll have a closer look at this later today. -- |8]
On Sun, 2011-02-06 at 11:24 +0100, Gergely Nagy wrote:
Why does it cause a spin in this usecase?
It goes into a spin, because even if the file was reopened, _recover_state() will end up calling log_proto_buffered_server_apply_state() which will lseek to an offset past EOF.
I'm not entirely sure why it will do that, however - there's two scenarios that can happen, and I'm not sure which one we end up in. I'll test it as soon as possible.
Anyway, being past EOF will trigger NC_FILE_MOVED again, and we start the whole process again, and again, and again.
Come to think of it, the problem is probably an inconsistency in state, but that's just my gut feeling at the moment.
I'll have a closer look at this later today.
hmm.. I remember fixing something like this. Here it is: Author: Balazs Scheidler <bazsi@balabit.hu> 2010-11-07 15:03:27 Committer: Balazs Scheidler <bazsi@balabit.hu> 2010-12-10 16:14:09 Parent: ee1d0d7f0720c3ded0d2bfaf22ca6e42d669b7a6 (separate directory hierarchy into lib/, modules/ and syslog-ng/) Child: 26b70812cae04179c939dc8b1f323a24692a30f5 (Performance improvement: file write uses writev instead of per-message write() to write larger chunks) Branches: many (24) Follows: v3.2.1 Precedes: LogProto: apply_state shouldn't allow file offsets over the end-of-file Assume that the file was truncated in this case. It is probably present in 3.3 only. Can you check if backporting that patch fixes the spinning too? -- Bazsi
On Sun, 2011-02-06 at 12:09 +0100, Balazs Scheidler wrote:
It is probably present in 3.3 only. Can you check if backporting that patch fixes the spinning too?
That's interesting, because the problem was reported for 3.2.2, and I saw the spinning with syslog-ng-3.2 git head too. -- |8]
On Sun, 2011-02-06 at 12:19 +0100, Gergely Nagy wrote:
On Sun, 2011-02-06 at 12:09 +0100, Balazs Scheidler wrote:
It is probably present in 3.3 only. Can you check if backporting that patch fixes the spinning too?
That's interesting, because the problem was reported for 3.2.2, and I saw the spinning with syslog-ng-3.2 git head too.
I mean the fix was present in 3.3 only. -- Bazsi
On Sun, 2011-02-06 at 12:30 +0100, Balazs Scheidler wrote:
On Sun, 2011-02-06 at 12:19 +0100, Gergely Nagy wrote:
On Sun, 2011-02-06 at 12:09 +0100, Balazs Scheidler wrote:
It is probably present in 3.3 only. Can you check if backporting that patch fixes the spinning too?
That's interesting, because the problem was reported for 3.2.2, and I saw the spinning with syslog-ng-3.2 git head too.
I mean the fix was present in 3.3 only.
Ah, right. My bad, sorry - I'll backport & test in a few hours then. -- |8]
On Sun, 2011-02-06 at 12:09 +0100, Balazs Scheidler wrote:
Author: Balazs Scheidler <bazsi@balabit.hu> 2010-11-07 15:03:27 Committer: Balazs Scheidler <bazsi@balabit.hu> 2010-12-10 16:14:09 Parent: ee1d0d7f0720c3ded0d2bfaf22ca6e42d669b7a6 (separate directory hierarchy into lib/, modules/ and syslog-ng/) Child: 26b70812cae04179c939dc8b1f323a24692a30f5 (Performance improvement: file write uses writev instead of per-message write() to write larger chunks) Branches: many (24) Follows: v3.2.1 Precedes:
LogProto: apply_state shouldn't allow file offsets over the end-of-file
Assume that the file was truncated in this case.
It is probably present in 3.3 only. Can you check if backporting that patch fixes the spinning too?
Checked, 3.3 does not spin, nor does 3.2 with the above patch applied. Backport for 3.2 is available from the integration/apply_state-fix branch of my syslog-ng-3.2 tree at: git://git.madhouse-project.org/syslog-ng/syslog-ng-3.2.git -- |8]
pulled this patch, thanks Gergely. On Sun, 2011-02-06 at 20:32 +0100, Gergely Nagy wrote:
On Sun, 2011-02-06 at 12:09 +0100, Balazs Scheidler wrote:
Author: Balazs Scheidler <bazsi@balabit.hu> 2010-11-07 15:03:27 Committer: Balazs Scheidler <bazsi@balabit.hu> 2010-12-10 16:14:09 Parent: ee1d0d7f0720c3ded0d2bfaf22ca6e42d669b7a6 (separate directory hierarchy into lib/, modules/ and syslog-ng/) Child: 26b70812cae04179c939dc8b1f323a24692a30f5 (Performance improvement: file write uses writev instead of per-message write() to write larger chunks) Branches: many (24) Follows: v3.2.1 Precedes:
LogProto: apply_state shouldn't allow file offsets over the end-of-file
Assume that the file was truncated in this case.
It is probably present in 3.3 only. Can you check if backporting that patch fixes the spinning too?
Checked, 3.3 does not spin, nor does 3.2 with the above patch applied. Backport for 3.2 is available from the integration/apply_state-fix branch of my syslog-ng-3.2 tree at:
git://git.madhouse-project.org/syslog-ng/syslog-ng-3.2.git
-- Bazsi
participants (3)
-
Balazs Scheidler
-
Gergely Nagy
-
Mailing Lists