Using Debian Sarge I set up a configuration where some 160 machines log by TCP to a single central server. When the machines boot (all at the same time) they obviously put quite some load on the server, which results in lines like
Don't boot all the machines and have them log to one server at the same time unless you are really well equipped network-wise. It's the same congestion problem you have when running a data center and trying to power up the nodes after a power failure: you risk another power failure.
Oct 6 20:55:18 bigyo syslog-ng[24969]: STATS: dropped 1303
What's the peak load, both in messages and in network traffic? What does your network topology look like? Are the clients in one collision domain or geographically distributed?
after the client connected messages. Also there is a constant periodic loss (the clients run synchronised, so cron jobs fire simultaneously) amounting to
Add a random delay to your cron jobs before they start the actual work. Since you have already identified the source of the problem, fix it there. There is no requirement to run cron jobs in lockstep across a group of machines, and the logfiles can still be correlated by using the timestamps.
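A minimal sketch of such a delay, assuming bash is available on the clients. The function names random_delay and run_delayed are made up for illustration; they are not part of cron or syslog-ng. Note that seeding a generator from the wall clock alone (for example awk's srand()) would give synchronised machines the same "random" value, which is why this uses bash's $RANDOM instead.

```shell
#!/bin/bash
# Hypothetical helpers, not part of cron or syslog-ng.

# random_delay MAX: print a pseudo-random number of seconds in 0..MAX-1.
# MAX defaults to 60 if omitted.
random_delay() {
    max=${1:-60}
    # $RANDOM is a bash feature that yields an integer in 0..32767.
    echo $(( RANDOM % max ))
}

# run_delayed MAX command [args...]: sleep for a random delay below MAX
# seconds, then run the given command with its arguments.
run_delayed() {
    max=$1
    shift
    sleep "$(random_delay "$max")"
    "$@"
}
```

If this were saved as a script ending in run_delayed "$@" (say /usr/local/bin/run-delayed, a hypothetical path), a system crontab entry could read: 35 6 * * * root /usr/local/bin/run-delayed 120 /path/to/job. That would spread the 06:35 burst over roughly two minutes.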
Oct 7 06:35:27 bigyo syslog-ng[24969]: STATS: dropped 9
Is there a way to overcome this?
Fix the root of the problem. Of course we could assist you in addressing the problem by tuning the server, if the former suggestions are not appropriate.
On average the log traffic is fairly low, but huge bursts do happen as described above.
Did you identify other bursts besides the reboot- and cronjob-related ones?
Setting log_fifo_size on the server didn't help much; it logs straight onto disk:
Others have given you ideas on how to tune the server side.
[stock Debian Sarge part distributing local logs elided]
options { keep_hostname (yes); };
source s_cl { tcp (max_connections (255)); };
destination d_cl {
    file ("/var/log/cluster/$HOST"
          template ("$DATE $MSG\n")
          group ("adm") perm (0640)
          create_dirs (yes) dir_perm (750));
};
log { source (s_cl); destination (d_cl); };
You could add flags(final) to this log statement to speed up processing a bit, but that only helps if you have further log statements after it.
The clients are configured like this (full file):
options { use_dns (no); };
source s_all {
    internal ();
    unix-stream ("/dev/log");
    file ("/proc/kmsg" log_prefix ("kernel: "));
};
destination bigyo { tcp ("bigyo"); };
log { source (s_all); destination (bigyo); };
Looks fine.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc