feature request (parallel processing)
So, after months of work, we finally turned on our production environment for syslog collection. However, we hit one immediate snag. Currently we're writing to the database, and the way the database works is that it collects enough data to fill a single block, and then it flushes that block out. Every time it goes to flush the block out, the insert takes an extra couple of milliseconds. When I'm doing about 220,000 inserts a second, that millisecond delay is significant. So basically syslog has to pause on that log statement while it waits for the database to flush. (1 out of 10 messages was getting dropped.)

Now I tried to solve this by writing multiple destination drivers so that a second database thread could be processing while the first was flushing, but that didn't work, as it appears syslog waits for the destination driver to complete before it hands data off to the second driver.

Instead I managed to solve the problem by creating yet more syslog processes. So basically the master process listens for data from all the hosts. It then runs a match on the $PID and sends all even-numbered PIDs to one syslog process, and all odd-numbered PIDs to a second syslog process. This way both processes can be inserting into the database at the same time. It effectively cuts the amount of work each database thread does in half, so that when it has to pause to flush, it doesn't cause the syslog buffer to fill up.

Ultimately my request is this: allow multiple destination drivers to work at the same time. I realize this is probably not a simple change, but it seems like it would be a significant speed enhancement.
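A rough sketch of the front-end split described above (not the actual configuration from this setup): the master instance listens for everything, filters on the last digit of ${PID}, and forwards each half to a separate worker syslog-ng process that does the database inserts. The ports, addresses and names here are illustrative assumptions.

# Front-end instance: receive from all hosts, split on the last digit of ${PID}.
source s_net { udp(ip(0.0.0.0) port(514)); };

filter f_pid_even { match("[02468]$" value("PID")); };
filter f_pid_odd  { match("[13579]$" value("PID")); };

# Assumed: each worker is a separate syslog-ng process listening on a local
# port (5141/5142 are placeholders) and holding its own database connection.
destination d_worker_even { tcp("127.0.0.1" port(5141)); };
destination d_worker_odd  { tcp("127.0.0.1" port(5142)); };

log { source(s_net); filter(f_pid_even); destination(d_worker_even); };
log { source(s_net); filter(f_pid_odd);  destination(d_worker_odd);  };

# Messages without a PID match neither filter; a real setup would need a
# catch-all log path for those.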
What backend database were you using to get a single box to do 220k inserts/sec sustained? The fastest I've ever seen is a little over 100k/sec with LOAD DATA INFILE in MySQL, though I haven't used particularly beefy boxes. If your tablespace is RAM-based, I guess I could believe that, but that's a lot of RAM to allocate to long-term log storage. In my setups, I write files out to disk (via a Perl program) and then do an import of the data file, which is the fastest method I've seen so far. What method are you using?

On Thu, Sep 2, 2010 at 7:01 PM, <syslogng@feystorm.net> wrote:
> ...
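As a rough illustration of the file-then-bulk-import approach Martin describes (his actual pipeline goes through a Perl program), a syslog-ng file destination can write delimiter-separated lines for a loader to pick up later. The path, field list, table name and column names below are placeholders, and s_net stands for whatever network source is in use:

# Write tab-separated records, rotated hourly, suitable for bulk loading.
destination d_bulkfile {
    file("/var/log/bulk/msgs-${YEAR}${MONTH}${DAY}${HOUR}.tsv"
         template("${ISODATE}\t${HOST}\t${PROGRAM}\t${PID}\t${MSGONLY}\n")
         template_escape(no));
};
log { source(s_net); destination(d_bulkfile); };

# Once a file has rotated out, import it with something along these lines
# (MySQL shown; table and column names are made up):
#   LOAD DATA INFILE '/var/log/bulk/msgs-2010090219.tsv' INTO TABLE logs
#   FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
#   (log_time, host, program, pid, msg);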
Ditto, I'd really like to know how you're getting that rate, please share :-)

______________________________________________________________
Clayton Dukes
______________________________________________________________

On Thu, Sep 2, 2010 at 8:22 PM, Martin Holste <mcholste@gmail.com> wrote:
> ...
Enterprise-level equipment: Oracle 11g on an HP DL360 backed by an EMC SAN array over Fibre Channel.

Sent: Thursday, September 02, 2010 7:00:41 PM
From: Clayton Dukes <cdukes@gmail.com>
> ...
I'm on an IBM BladeCenter (granted, an older 4-core 2.2 GHz) with an EMC SAN, running MySQL 5.1 with a large bulk_insert_buffer, and bulk loading is only around 100k/sec, which is about the maximum you get writing raw text to a filehandle (because that's pretty much what it's doing). You're saying Oracle 11g on one mid-range server will do a sustained 220k/sec for hours at a time?

On Thu, Sep 2, 2010 at 8:04 PM, <syslogng@feystorm.net> wrote:
> ...
No, we don't usually do that for hours at a time. Our normal rate is probably half that, but we'll frequently burst up to that for periods of 5-20 minutes (we're an email provider; these are incoming and outgoing email logs).

Sent: Thursday, September 02, 2010 7:21:07 PM
From: Martin Holste <mcholste@gmail.com>
> ...
On Thu, 2010-09-02 at 18:01 -0600, syslogng@feystorm.net wrote:
...
Ultimately my request is this: allow multiple destination drivers to work at the same time. I realize this is probably not a simple change, but it seems like it would be a significant speed enhancement.
All SQL destinations were running in the same thread, that's right. But this was changed in 3.2, where each SQL destination gets a dedicated thread.

At least this is what you are after, right?

-- Bazsi
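If 3.2 works that way, the even/odd split from earlier in the thread could presumably collapse into a single instance with two sql() destinations, each writing on its own thread. This is only a sketch: the Oracle connection details, table name and column names are placeholders, and f_pid_even/f_pid_odd are the PID filters from the earlier sketch.

destination d_sql_even {
    sql(type(oracle) host("dbhost") port("1521")
        username("syslog") password("secret") database("LOGS")
        table("messages")
        columns("log_time", "host", "program", "pid", "msg")
        values("${ISODATE}", "${HOST}", "${PROGRAM}", "${PID}", "${MSGONLY}"));
};

# A second, identical destination: with a dedicated thread per SQL destination,
# one connection can keep inserting while the other sits in a block flush.
destination d_sql_odd {
    sql(type(oracle) host("dbhost") port("1521")
        username("syslog") password("secret") database("LOGS")
        table("messages")
        columns("log_time", "host", "program", "pid", "msg")
        values("${ISODATE}", "${HOST}", "${PROGRAM}", "${PID}", "${MSGONLY}"));
};

log { source(s_net); filter(f_pid_even); destination(d_sql_even); };
log { source(s_net); filter(f_pid_odd);  destination(d_sql_odd);  };

Whether two connections into the same table actually avoid the flush stall is an Oracle-side question; the sketch only shows the syslog-ng wiring.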
Sent: Friday, September 03, 2010 5:15:45 AM
From: Balazs Scheidler <bazsi@balabit.hu>
On Thu, 2010-09-02 at 18:01 -0600, syslogng@feystorm.net wrote:
...
Ultimately my request is this: allow multiple destination drivers to work at the same time. I realize this is probably not a simple change, but it seems like it would be a significant speed enhancement.
All SQL destinations were running in the same thread, that's right. But this was changed in 3.2, where each SQL destination gets a dedicated thread.
At least this is what you are after, right?
Yup. Sounds like we'll be switching to 3.2 as soon as it's available.
participants (4)
- Balazs Scheidler
- Clayton Dukes
- Martin Holste
- syslogng@feystorm.net