Re: [syslog-ng] Syslog-ng shipping logs through AMQP with huge memory leaks
On Wed, Mar 28, 2018 at 10:29 AM, Nagy, Gábor <gabor.nagy@balabit.com> wrote:
I'm resending this letter to the mail list as we hit the message size limit. :)
Hello Michal!
I have some good news. We have found the reason for the increasing memory usage of syslog-ng in the amqp() driver. It is caused by our internal memory handling method (aka scratch_buffers), which we use in performance-critical paths to avoid the overhead of allocating and deallocating GStrings. It uses memory from a pre-allocated pool instead of allocating and freeing.
As I have described before, the memory usage only increases when the number of queued messages reaches a new maximum. It is not per message and does not happen for every queued message.
We have several options for fixing this issue and will share one soon.
Regards, Gabor
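The behaviour described above (memory grows only when the number of queued messages reaches a new maximum) can be illustrated with a toy model. This is only a sketch under stated assumptions, not syslog-ng's scratch_buffers code: the names ScratchPool, alloc, reclaim and capacity_after_bursts are hypothetical, and the pool is assumed to keep every buffer it has ever created.

```python
class ScratchPool:
    """Toy pool of reusable buffers (hypothetical; not syslog-ng code)."""

    def __init__(self):
        self._buffers = []  # every buffer ever created; never freed
        self._in_use = 0    # how many buffers are currently handed out

    def alloc(self):
        # Grow the pool only when all existing buffers are handed out.
        if self._in_use == len(self._buffers):
            self._buffers.append(bytearray())
        buf = self._buffers[self._in_use]
        buf.clear()
        self._in_use += 1
        return buf

    def reclaim(self):
        # Mark every buffer idle again; the memory stays in the pool.
        self._in_use = 0

    @property
    def capacity(self):
        return len(self._buffers)


def capacity_after_bursts(burst_sizes):
    """Return the pool capacity seen after each burst of queued messages."""
    pool = ScratchPool()
    history = []
    for queued in burst_sizes:
        for _ in range(queued):
            pool.alloc().extend(b"formatted message")
        pool.reclaim()
        history.append(pool.capacity)
    return history


# Capacity only rises when a burst exceeds the previous maximum.
print(capacity_after_bursts([3, 2, 3, 5, 1, 5, 8]))  # [3, 3, 3, 5, 5, 5, 8]
```

In this model the pool size follows the high-water mark of simultaneously used buffers, so it never shrinks and does not grow per message.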
On Wed, Mar 28, 2018 at 9:04 AM, Scheidler, Balázs <balazs.scheidler@balabit.com> wrote:
There's a potential fix here: https://github.com/balabit/syslog-ng/pull/1946
-- Bazsi
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq
On Wed, Mar 28, 2018 at 11:46 AM, Michal Purzynski <michal@mozilla.com> wrote:
Awesome! Do you want me to test it? I guess I can fetch the PR and deploy the RPM in place of the affected package we have now.
On Mar 28, 2018 16:47, "Michal Purzynski" <michal@mozilla.com> wrote:
I went ahead and installed syslog-ng from GitHub's master with the changes from the PR. The results are nothing short of impressive.
18503 root 20 0 2859872 43580 5800 S 39.9 0.1 12:33.79 syslog-ng
:-)
Syslog-ng started out fierce, taking 4.5GB and 460% (!!) of CPU, then it settled down and the memory usage started shrinking quickly.
We are forwarding 3500 eps with 38% or so of a single CPU and 44MB of RAM.
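As a cross-check of the figures above, the process line can be split into columns and the resident size converted. This sketch assumes the line comes from top with its default column layout and that VIRT, RES and SHR are reported in KiB; neither is stated in the message, and parse_top_line is a hypothetical helper.

```python
# Assumed default top(1) column order; the message does not show the header row.
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR",
               "S", "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_line(line):
    """Map one whitespace-separated top process line onto the column names."""
    values = line.split(None, len(TOP_COLUMNS) - 1)
    return dict(zip(TOP_COLUMNS, values))


row = parse_top_line(
    "18503 root 20 0 2859872 43580 5800 S 39.9 0.1 12:33.79 syslog-ng"
)

res_kib = int(row["RES"])            # resident memory, assumed to be KiB
res_mb = res_kib * 1024 / 1_000_000  # decimal megabytes
res_mib = res_kib / 1024             # binary mebibytes

print(f"RES  = {res_mb:.1f} MB ({res_mib:.1f} MiB)")  # RES  = 44.6 MB (42.6 MiB)
print(f"%CPU = {row['%CPU']}")                        # %CPU = 39.9
```

Under those assumptions the RES column is consistent with the "44MB of RAM" and "38% or so of a single CPU" quoted in the message.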
On Wed, Mar 28, 2018 at 2:51 PM, Scheidler, Balázs <balazs.scheidler@balabit.com> wrote:
You should thank Gabor and the Balabit syslog-ng team :)
One additional ask, if I may:
There are a number of allocations in the amqp destinations that I don't like, which could be responsible for that high CPU usage. But to be sure, can you please send a fresh perf record output? It would be great to have it with debugging symbols and a call graph.
Thanks in advance
On Mar 29, 2018 01:11, "Michal Purzynski" <michal@mozilla.com> wrote:
Huge thanks to Gabor and the Balabit syslog-ng team :-)
https://drive.google.com/drive/folders/1iJPN7lAzZtvZ9f5g-n8u7pgQ2Rwor5ze
perf.data.1, 2 and 3 were taken at a 99Hz sampling rate during 30-second windows
perf.data.4 was taken at a 99Hz sampling rate during a 5-minute window
*.svgs are flame graphs
Despite having installed the glib, libc and kernel symbol packages (and seeing no errors about missing symbols), I still see some 'unknowns'. I can dig further, but even if you see an unknown, another function at the same level or a child function should still tell you who called it.
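For anyone wanting to take comparable samples, the capture settings described above (99Hz, call graphs, 30-second and 5-minute windows) map onto standard perf record options. The exact commands used for these files are not given in the message, so the following is only a sketch: perf_record_argv is a hypothetical helper, and the PID 18503 is taken from the earlier top line purely as an illustration.

```python
import shlex


def perf_record_argv(pid, seconds, output, freq_hz=99):
    """Build a perf record command line that samples one process.

    -F sets the sampling frequency, -g records call graphs, -p attaches
    to an existing PID, -o names the output file, and the trailing
    'sleep' command bounds how long the recording runs.
    """
    return ["perf", "record", "-F", str(freq_hz), "-g",
            "-p", str(pid), "-o", output,
            "--", "sleep", str(seconds)]


# Three 30-second windows and one 5-minute window, as in the message.
for index, seconds in enumerate([30, 30, 30, 300], start=1):
    argv = perf_record_argv(18503, seconds, f"perf.data.{index}")
    print(shlex.join(argv))
```

The helper only prints the command lines; running them requires perf to be installed and sufficient privileges to attach to the syslog-ng process.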
Thanks. I will look into it once I get home. Sitting on a plane right now :)
participants (3)
- Michal Purzynski
- Nagy, Gábor
- Scheidler, Balázs