Fwd: Multithreaded inputs with same output file : is it safe ?
Hello, I have a few questions related to multithreading.
From the doc, my understanding is that inputs are multithreaded this way :
1/ TCP connections are dispatched to different threads when multithreading is enabled, even if the config has a single source listening on a single TCP port.
2/ UDP datagrams are NOT dispatched to different threads when there is a single source, even if that source listens on multiple ports.
3/ UDP datagrams are dispatched to different threads when the config has several sources (1 source -> 1 thread), so to benefit from multithreading with UDP one needs to configure several sources.

Is my understanding correct?

If we have several UDP sources, with the input balanced round-robin across them on the same daemon process, then messages from the same host can arrive on different sources.

For the output, I am using files whose names include the name of the sending host. Is there an issue if messages from several sources (= from different threads) end up being written to the same file? Put another way: is all the data from the source threads handed to some common thread, so that output to a given file is always handled by the same thread even when the logs are received by different source threads?
I tried to ask the same question with a schematic:

Case 1:

+----------+     +----------+     +--------------+     +-----------+     +-------------------------+
| host foo +---->+          +---->+ thread1      +---->+           +---->+ thread handling foo.log +----> foo.log
+----------+     | iptables |     | some foo log |     | black box |     +-------------------------+
                 | round    |     | some bar log |     | magic     |
                 | robin    |     +--------------+     |           |
                 |          |     +--------------+     |           |
+----------+     |          |     | thread2      |     |           |     +-------------------------+
| host bar +---->+          +---->+ some foo log +---->+           +---->+ thread handling bar.log +----> bar.log
+----------+     +----------+     | some bar log |     +-----------+     +-------------------------+
                                  +--------------+

Case 2:

+----------+     +----------+     +--------------+
| host foo +---->+          +---->+ thread1      +---+-------> foo.log
+----------+     | iptables |     | some foo log |    \   /
                 | round    |     | some bar log |     \ /      2 different threads writing
                 | robin    |     +--------------+      X       to the same file:
                 |          |     +--------------+     / \      likely bad!
+----------+     |          |     | thread2      |    /   \
| host bar +---->+          +---->+ some foo log +---+-------> bar.log
+----------+     +----------+     | some bar log |
                                  +--------------+

One example config:

# Notice that the host name is used in the output file name.
# There is no affinity between 1 host <> 1 UDP port => logs from host foo are sent to every source/UDP port in a round-robin fashion.
source s_syslog__udp5141 { network(ip(192.168.1.1) transport("udp") port(5141) flags("threaded")); };
source s_syslog__udp5142 { network(ip(192.168.1.1) transport("udp") port(5142) flags("threaded")); };
source s_syslog__udp5143 { network(ip(192.168.1.1) transport("udp") port(5143) flags("threaded")); };
source s_syslog__udp5144 { network(ip(192.168.1.1) transport("udp") port(5144) flags("threaded")); };

destination d_network_fw_misc {
    file("`syslog_log_root_path`/fw/fw/$FULLHOST/`naming_scheme`"
         group("logs_fw") dir-group("logs_fw") log-fifo-size(500000));
};
destination d_network_wifi {
    file("`syslog_log_root_path`/wifi/$FULLHOST/`naming_scheme`"
         group("logs_wifi") dir-group("logs_wifi") log-fifo-size(500000));
};
destination d_network_debug {
    file("`syslog_log_root_path`/debug/`naming_scheme`"
         group("logs_network_debug") dir-group("logs_network_debug") log-fifo-size(600000));
};

log {
    source(s_syslog__udp5141);
    source(s_syslog__udp5142);
    source(s_syslog__udp5143);
    source(s_syslog__udp5144);
    # Every channel of the junction receives a copy of the message, unless an earlier channel has the final flag.
    # Final flags are applied only within the junction and not globally.
    junction {
        channel { filter(f_debug); destination(d_network_debug); };
        channel { filter(f_network_wifi); destination(d_network_wifi); flags(final); };
        channel { destination(d_network_fw_misc); flags(final); };
    };
};

# Round robin with iptables:
iptables -t nat -A PREROUTING -d 192.168.1.1 -i bond0 -p udp -m udp --dport 514 -m statistic --mode nth --every 4 --packet 0 -j REDIRECT --to-port 5141
iptables -t nat -A PREROUTING -d 192.168.1.1 -i bond0 -p udp -m udp --dport 514 -m statistic --mode nth --every 3 --packet 0 -j REDIRECT --to-port 5142
iptables -t nat -A PREROUTING -d 192.168.1.1 -i bond0 -p udp -m udp --dport 514 -m statistic --mode nth --every 2 --packet 0 -j REDIRECT --to-port 5143
iptables -t nat -A PREROUTING -d 192.168.1.1 -i bond0 -p udp -m udp --dport 514 -m statistic --mode nth --every 1 --packet 0 -j REDIRECT --to-port 5144

Thanks for your help
--
Jean-Baptiste Fuzier
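The decreasing --every values (4, 3, 2, 1) can look odd at first, so here is a small sketch of why they should give an even four-way split. It is a simplified model, not the kernel's code: it assumes each rule keeps its own counter, only counts the packets that actually reach it, and matches the first packet of every group of N. It also assumes every packet is evaluated against the rules independently, so it ignores connection tracking (NAT rules are normally consulted only for the first packet of a tracked flow). The function name is made up for the illustration.

```python
from collections import Counter

# (every, port) pairs copied from the four iptables rules above.
RULES = [(4, 5141), (3, 5142), (2, 5143), (1, 5144)]


def redirect_ports(n_packets, rules=RULES):
    """Return the port chosen for each of n_packets evaluated packets.

    Simplified model of `-m statistic --mode nth --every N --packet 0`:
    each rule has its own counter, sees only the packets that earlier
    rules did not match, and matches the first packet of every N it sees.
    """
    counters = [0] * len(rules)
    chosen = []
    for _ in range(n_packets):
        for i, (every, port) in enumerate(rules):
            hit = counters[i] == 0
            counters[i] = (counters[i] + 1) % every
            if hit:
                chosen.append(port)  # REDIRECT stops rule traversal here
                break
    return chosen


if __name__ == "__main__":
    ports = redirect_ports(4000)
    print(ports[:8])
    print(Counter(ports))
```

Under these assumptions the first rule takes 1 in 4 packets, the second takes 1 in 3 of the remainder, and so on, which works out to a quarter of the traffic per port.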
Hi,
> 1/ TCP connections are dispatched to different threads when multithreading is enabled, even if the config has a single source listening on a single TCP port.
That is correct. A single source object in the configuration can handle at most `max-connections()` connections; the default is 10. This does not mean that 10 long-running threads are created: syslog-ng is non-blocking and uses a thread pool to schedule I/O jobs. Nevertheless, TCP connections within a single source object are "multithreaded" and scale across cores.
> 2/ UDP datagrams are NOT dispatched to different threads when there is a single source, even if that source listens on multiple ports.
Yes. Every UDP source object in the configuration can be scheduled to the I/O worker pool separately, which makes it possible to receive UDP messages faster if you can load-balance them across 2 or more separate UDP ports. The datagrams of a single UDP source are processed sequentially, meaning a single UDP source object can't scale across cores.
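To make the "sequential within a source, parallel across sources" point concrete, here is a minimal sketch using Python's standard thread pool. It is only an analogy for the scheduling described above, not syslog-ng's implementation; the function names and the source names used as dictionary keys are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def drain_source(name, datagrams):
    """One source = one job: its datagrams are handled strictly in order."""
    return [f"{name}:{d}" for d in datagrams]


def process_sources(sources, workers=4):
    """Submit one job per source; separate sources may run on separate workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(drain_source, name, datagrams)
            for name, datagrams in sources.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    result = process_sources({"udp5141": [1, 2, 3], "udp5142": [4, 5]})
    print(result)
```

With a single entry in the dictionary there is only one job, so extra workers would sit idle; that is the situation of a lone UDP source.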
> 3/ UDP datagrams are dispatched to different threads when the config has several sources (1 source -> 1 thread), so to benefit from multithreading with UDP one needs to configure several sources.
Correct. A UDP source is not bound to a specific thread, but a single UDP source won't scale to more than 1 thread, so we can't talk about parallelism in this case. Please note that using the same UDP source object in multiple log paths still counts as one source.

Besides the possibilities mentioned, you have another option: UDP sources have a config option called `so-reuseport()`. Quoting the syslog-ng Admin Guide:

  Enables SO_REUSEPORT on systems that support it. When enabled, the kernel allows multiple UDP sockets to be bound to the same port, and the kernel load-balances incoming UDP datagrams to the sockets. The sockets are distributed based on the hash of (srcip, dstip, srcport, dstport), so the same listener should be receiving packets from the same endpoint. For example:

  source {
      udp(so-reuseport(1) port(2000) persist-name("udp1"));
      udp(so-reuseport(1) port(2000) persist-name("udp2"));
      udp(so-reuseport(1) port(2000) persist-name("udp3"));
      udp(so-reuseport(1) port(2000) persist-name("udp4"));
  };
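The practical difference from a per-packet round robin is that a hash of the 4-tuple keeps a given sender (same source IP and port) on the same listener. The sketch below illustrates only that property. CRC32 is a stand-in chosen to get a stable value; the hash the kernel really uses is different, and the function name and sender addresses are made up.

```python
import zlib


def pick_listener(src_ip, dst_ip, src_port, dst_port, n_listeners):
    """Map a UDP 4-tuple to a listener index.

    Stand-in for the kernel's SO_REUSEPORT selection: CRC32 is used only
    to get a deterministic hash; the real kernel hash function differs.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_listeners


if __name__ == "__main__":
    # Hypothetical senders; the first and last lines are the same flow.
    senders = [("10.0.0.1", 40001), ("10.0.0.2", 40002), ("10.0.0.1", 40001)]
    for src_ip, src_port in senders:
        idx = pick_listener(src_ip, "192.168.1.1", src_port, 2000, 4)
        print(src_ip, src_port, "-> listener", idx)
```

A sender that always uses the same source port would therefore stay on one listener, while many distinct senders spread across all of them.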
> Is there an issue if messages from several sources (= from different threads) end up being written to the same file?
This is not a problem: our destinations are decoupled from their sources through a memory or disk-based queue. There is no threading issue that users need to worry about.
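As a rough picture of what "decoupled through a queue" means, here is a minimal producer/consumer sketch. It is not how syslog-ng is implemented; it only shows the general pattern in which several source threads enqueue messages and a single consumer does all the writing, so no two source threads ever touch the same output. All names are invented, and in-memory lists stand in for foo.log and bar.log.

```python
import queue
import threading
from collections import defaultdict

STOP = object()  # sentinel telling the writer to finish


def source_worker(name, messages, q):
    """Hypothetical source thread: tags each message and enqueues it."""
    for host, text in messages:
        q.put((host, f"{name}: {text}"))


def destination_worker(q, files):
    """Single consumer: the only code that touches the per-host 'files'."""
    while True:
        item = q.get()
        if item is STOP:
            break
        host, line = item
        files[host].append(line)


def run_demo(n_sources=4, per_host=100):
    q = queue.Queue()
    files = defaultdict(list)  # in-memory stand-in for foo.log / bar.log
    writer = threading.Thread(target=destination_worker, args=(q, files))
    writer.start()
    sources = []
    for i in range(n_sources):
        msgs = [(host, f"msg {n}") for n in range(per_host) for host in ("foo", "bar")]
        t = threading.Thread(target=source_worker, args=(f"s{i}", msgs, q))
        sources.append(t)
        t.start()
    for t in sources:
        t.join()
    q.put(STOP)  # all sources are done, so nothing can follow the sentinel
    writer.join()
    return files


if __name__ == "__main__":
    files = run_demo()
    print({host: len(lines) for host, lines in files.items()})
```

Messages from different sources interleave in each list, but every line arrives whole and each source's own messages keep their order.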
> Put another way: is all the data from the source threads handed to some common thread, so that output to a given file is always handled by the same thread even when the logs are received by different source threads?
Something like that. Non-blocking destination jobs are scheduled to the I/O pool and run in parallel with their sources. "Threaded destinations" such as HTTP, Redis, Python, Java, etc. have a dedicated thread, but that is not important in this context. Synchronization between sources and destinations is done through memory queues (or, optionally, disk-based queues), so you have the freedom to create any log pipeline/path you want.

--
László Várady