[syslog-ng] Performance issue with python code in template

Balazs Scheidler bazsi77 at gmail.com
Wed Nov 4 06:20:59 UTC 2020


Hi,

I think the biggest bottleneck is that Python (CPython, to be precise) uses
a GIL, a global interpreter lock, which means that only one thread executes
Python code at a time, regardless of how many threads we are using.
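
A minimal sketch of what this means in practice, assuming a CPython build
with the GIL enabled: the same CPU-bound function is run serially and then
in several threads. busy_work() and the iteration counts are made up for
illustration and are not part of syslog-ng. On such a build the threaded run
is usually no faster than the serial one.

```python
import threading
import time


def busy_work(n):
    """Hypothetical CPU-bound stand-in for per-message processing."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_serial(jobs, n):
    """Run busy_work() `jobs` times in the calling thread."""
    return [busy_work(n) for _ in range(jobs)]


def run_threaded(jobs, n):
    """Run busy_work() once in each of `jobs` threads."""
    results = [None] * jobs

    def worker(slot):
        results[slot] = busy_work(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(func, *args):
    """Return (elapsed_seconds, result) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    serial_time, _ = timed(run_serial, 4, 2_000_000)
    threaded_time, _ = timed(run_threaded, 4, 2_000_000)
    print(f"serial:   {serial_time:.2f}s")
    print(f"threaded: {threaded_time:.2f}s")
```

Both variants compute the same results; only the wall-clock time is of
interest here.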

With that said, I think it is better to have that single thread on the
destination side (e.g. in a template function), as at the very least you
wouldn't affect the source threads with this limitation. This still means
that you'd be using 1 thread for output and 1 thread for input.

To increase parallelism you could use the so-reuseport() option on the udp
source and define several such sources (this is quite a recent feature).
That way multiple threads handle reception, and each of them feeds the same
destination queue. This will only help if your single-threaded output can
cope with the traffic. If it cannot, then depending on the flow-control
setting, this might mean that

1) syslog-ng will actively drop messages (flow control disabled,
destination queue full)

2) syslog-ng will suspend the source threads, causing the kernel to drop
messages in the socket receive buffer (flow control enabled)
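
The two cases can be illustrated with a toy model. This is only a sketch
under simplifying assumptions: the output is completely stalled, so nothing
ever leaves the queue, and simulate_stalled_output() with its parameters is
a name invented for this example, not syslog-ng internals.

```python
def simulate_stalled_output(incoming, queue_size, socket_buffer_size,
                            flow_control):
    """Toy model: count where each incoming message ends up.

    The destination is assumed to be stalled, so the queue never drains.
    """
    queued = 0                # messages sitting in the destination queue
    in_socket = 0             # messages waiting in the kernel socket buffer
    dropped_by_syslog_ng = 0
    dropped_by_kernel = 0

    for _ in range(incoming):
        if queued < queue_size:
            queued += 1                  # source reads and enqueues
        elif not flow_control:
            dropped_by_syslog_ng += 1    # case 1: queue full, message dropped
        elif in_socket < socket_buffer_size:
            in_socket += 1               # case 2: source suspended, kernel buffers
        else:
            dropped_by_kernel += 1       # case 2: socket buffer full, kernel drops

    return {
        "queued": queued,
        "in_socket": in_socket,
        "dropped_by_syslog_ng": dropped_by_syslog_ng,
        "dropped_by_kernel": dropped_by_kernel,
    }


if __name__ == "__main__":
    print(simulate_stalled_output(100, 10, 20, flow_control=False))
    print(simulate_stalled_output(100, 10, 20, flow_control=True))
```

Either way messages are lost once the output falls behind; flow control only
changes where the loss happens.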

To improve performance further (or at all), you would need to improve the
Python code or you would have to switch to using syslog-ng primitives to
achieve the same thing.

You can expect 20-25k messages/sec from Python code on decent hardware,
assuming the Python function does little work per message.
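
To see how your own function compares, you could time it outside syslog-ng
first. A rough sketch: t_example() is a hypothetical stand-in for the real
template function, and a plain dict stands in for the message object, so the
number it prints is only an upper bound (syslog-ng's own overhead is not
included).

```python
import time


def t_example(msg):
    """Hypothetical stand-in for the real template function."""
    return "%s %s" % (msg["HOST"], msg["MESSAGE"].upper())


def messages_per_second(func, msg, iterations=100_000):
    """Call func(msg) `iterations` times and return the achieved rate."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(msg)
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # A dict is used here in place of the syslog-ng message object.
    sample = {"HOST": "host1", "MESSAGE": "test message"}
    rate = messages_per_second(t_example, sample)
    print(f"{rate:.0f} messages/sec")
```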

As an alternative, more hacky solution, you could run multiple instances of
your Python code using a program() destination and loop the results back
to syslog-ng somehow (e.g. a file source or a Unix socket). Each instance
is a separate process with its own interpreter, so the Python GIL wouldn't
be a problem.
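
A minimal sketch of such a worker, assuming one message per line on stdin
(which is how a program() destination hands messages to the program) and an
output file that a file source could read back. process_line(), the script
name and the command-line argument are hypothetical, and how messages get
distributed across the instances is left out.

```python
import sys


def process_line(line):
    """Hypothetical stand-in for the real per-message processing."""
    return line.upper()


def run_worker(instream, outstream):
    """Read one message per line and write one processed line per message."""
    count = 0
    for line in instream:
        outstream.write(process_line(line.rstrip("\n")) + "\n")
        outstream.flush()  # hand each result over without waiting for a full buffer
        count += 1
    return count


if __name__ == "__main__":
    # Assumed usage: worker.py /path/to/output-file
    with open(sys.argv[1], "a", encoding="utf-8") as out:
        run_worker(sys.stdin, out)
```

Flushing after every line keeps latency low at some cost in throughput;
batching the writes would be the obvious next step if that becomes the
bottleneck.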

Hope this helps,
Bazsi

On Tue, Nov 3, 2020, 18:29 Gabor Nagy (gnagy) <Gabor.Nagy at oneidentity.com>
wrote:

> Hi Diego,
>
> I have only some experience with Python performance in syslog-ng, but I
> don't think you could significantly improve performance.
> Let me think this through:
> You have a non-scalable udp source (1 thread), and also a non-scalable
> file destination (it writes one file in one thread).
> If syslog-ng is in threaded mode (the default, unless the global option
> threaded(no) is set), sources and destinations run in different threads.
> Parsers run in the same thread as the sources.
> With the python parser, the python code would limit the source thread's
> performance, while a template function would be invoked in the
> destination's thread.
> Based on this, a template function should be better.
>
> Also, please note syslog-ng always processes one message at a time (except
> when you do correlation with db-parser() or grouping-by()).
>
> These are just my early thoughts. I'll think about this and write you an
> update if I find out anything.
>
> Regards,
> Gabor
>
> ------------------------------
> *From:* syslog-ng <syslog-ng-bounces at lists.balabit.hu> on behalf of Diego
> Billi <diego.billi at labs.it>
> *Sent:* Monday, November 2, 2020 19:11
> *To:* syslog-ng at lists.balabit.hu <syslog-ng at lists.balabit.hu>
> *Subject:* [syslog-ng] Performance issue with python code in template
>
>
> Hi,
> I have a performance problem. This is a skeleton of my syslog-ng
> configuration.
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> python {
>
> import ...mylibs...
>
> def t_my_python_function(msg):
>     ...
>     output_data = .... process msg object ...
>     ...
>     return output_data
>
> }
>
> source s_mysource {
>         udp( ....  );
> }
>
> destination d_mydestination {
>         file(
>             "/tmp/mylogs.log"
>
>             template("$(python t_my_python_function)")
>         );
> };
>
>
> log {
>         source(s_mysource);
>
>         destination(d_mydestination);
>
>         flags(flow-control);
> };
>
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
>
> I receive syslog messages via UDP at a very high rate.
>
> Incoming messages are processed with a template function written in Python.
>
> syslog-ng has performance issues with this flow due to the Python code.
>
> I'm wondering how to speed up this solution.
>
> Note that the Python function can be parallelized (no shared state/data
> between messages).
>
>
> I'm trying this solution, but I don't know if it changes much.
>
> ------------------------------------------------------------------------
>
> python {
>
> import ...mylibs...
>
> def p_my_python_function(msg):
>     ...
>     ...
>     output_data = .... process msg object ...
>     ...
>     ...
>     msg['MY_OUTPUT_DATA'] =  output_data
> }
>
> source s_mysource {
>         ...
> }
>
> destination d_mydestination {
>         file(
>             "/tmp/mylogs.log"
>             template("${MY_OUTPUT_DATA}")
>         );
> };
>
>
> log {
>         source(s_mysource);
>
>         parser(p_my_python_function);   <---- moved here (outside destination)
>
>         destination(d_mydestination);
>
>         flags(flow-control);
> };
>
> ------------------------------------------------------------------------
>
> Would moving the "processing" outside the "destination" really help?
>
> I'm trying to understand whether I can use the threading and multi-core
> support of syslog-ng.
>
> Thank you for your time.
>
>
> Diego.
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation:
> http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>