Hi,

We use syslog-ng on one of our nodes as the client and rsyslogd as the server.

The client is configured to connect to the server over TCP.

When we reboot the complete cluster (one node where the server runs and another node where the client runs), only the server node is actually rebooted. The client node, where syslog-ng runs, is not rebooted.

The issue is that after the reboot the syslog-ng connection is not re-established, so we do not get any logs on the server.

Some posts related to this problem, and the admin guide, mention the time_reopen() value as the way to solve this issue.

In my case, I have configured time_reopen(40). As I understand it, this means the TCP connection should be retried at 40-second intervals. But I see that this is not happening.
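To make clear what I expect time_reopen(40) to do, here is a minimal Python sketch of a fixed-interval reconnect loop. It only illustrates my understanding and is not how syslog-ng is actually implemented; the function names `tcp_connect_once` and `reconnect_with_fixed_interval` and the `max_attempts` parameter are made up for this example.

```python
import socket
import time
from typing import Callable, Optional


def tcp_connect_once(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try a single TCP connect; return True on success, False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reconnect_with_fixed_interval(
    try_connect: Callable[[], bool],
    time_reopen: float = 40.0,
    max_attempts: Optional[int] = None,  # hypothetical limit; None means retry forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_connect() until it succeeds, waiting time_reopen seconds after
    each failed attempt. Returns the number of attempts that were made."""
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise ConnectionError(f"gave up after {attempts} attempts")
        # The expectation: a fresh attempt every time_reopen seconds.
        sleep(time_reopen)
```

With the addresses from my setup this would be called as `reconnect_with_fixed_interval(lambda: tcp_connect_once("169.254.1.82", 601))`. With that behaviour I would expect to see a new SYN_SENT entry in netstat roughly every 40 seconds until the server accepts the connection.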

 

Can you please help me resolve this issue? Also, can you let me know how many times and at what interval the connection re-establishment is tried?

 

Below are the configuration and some netstat output that could be helpful.

 

syslog-ng.conf:

cat /etc/syslog-ng/syslog-ng.conf
# syslog-ng configuration file.
#
# This should behave pretty much like the original syslog on RedHat. But
# it could be configured a lot smarter.
#
# See syslog-ng(8) and syslog-ng.conf(5) for more information.
#
# 20000925 gb@sysfive.com
#
# Updated by Frank Crawford (<Frank.Crawford@ac3.com.au>) - 10 Aug 2002
#       - for Red Hat 7.3
#       - totally do away with klogd
#       - add message "kernel:" as is done with klogd.
#
# Updated by Frank Crawford (<Frank.Crawford@ac3.com.au>) - 22 Aug 2002
#       - use the log_prefix option as per Balazs Scheidler's email
#
# Updated by Jose Pedro Oliveira (<jpo at di.uminho.pt>) - 05 Apr 2003
#       - corrected filters 'f_filter2' and 'f_filter6'
#     these filters were only allowing messages of one specific
#     priority level; they should be allowing messages from that
#     priority and upper levels.
#
# Updated by Jose Pedro Oliveira (<jpo at di.uminho.pt>) - 25 Jan 2005
#   - Don't sync the d_mail destination
#
# Updated by Jose Pedro Oliveira (<jpo at di.uminho.pt>) - 01 Feb 2005
#   - /proc/kmsg is a file not a pipe.
#     (https://lists.balabit.hu/pipermail/syslog-ng/2005-February/006963.html)
#

options {
    sync (0);
    time_reopen (40);
    log_fifo_size (1000);
    stats(86400);
    long_hostnames (off);
    use_dns (no);
    use_fqdn (no);
    create_dirs (no);
    keep_hostname (yes);
};

source s_sys {
    file ("/proc/kmsg" log_prefix("kernel: "));
    unix-stream ("/dev/log");
    internal();
    # udp(ip(0.0.0.0) port(514));
};

destination d_cons { file("/dev/console"); };
#destination d_mesg { file("/var/log/messages"); };
#destination d_auth { file("/var/log/secure"); };
destination d_mail { file("/var/log/maillog" sync(10)); };
destination d_spol { file("/var/log/spooler"); };
destination d_boot { file("/var/log/boot.log"); };
destination d_cron { file("/var/log/cron"); };
destination d_mlal { usertty("*"); };
destination tcp-to-master   { tcp("169.254.1.82" localip("169.254.1.66") localport(601) port(601) ); };

#filter f_filter1   { facility(kern); };
filter f_filter2   { level(info..emerg) and
                     not facility(mail,authpriv,cron) and
                     not match("Connection broken to AF_INET") and
                     not match("Error connecting to remote host AF_INET"); };
filter f_filter3   { facility(authpriv); };
filter f_filter4   { facility(mail); };
filter f_filter5   { level(emerg); };
filter f_filter6   { facility(uucp) or
                     (facility(news) and level(crit..emerg)); };
filter f_filter7   { facility(local7); };
filter f_filter8   { facility(cron); };

log { source(s_sys); filter(f_filter2); destination(tcp-to-master); };
#log { source(s_sys); filter(f_filter1); destination(d_cons); };
# Redirecting the logs to /var/log/ of CLA, to avoid log files filling up AHUB3-A file system
#log { source(s_sys); filter(f_filter2); destination(d_mesg); };
#log { source(s_sys); filter(f_filter3); destination(d_auth); };
log { source(s_sys); filter(f_filter4); destination(d_mail); };
log { source(s_sys); filter(f_filter5); destination(d_mlal); };
log { source(s_sys); filter(f_filter6); destination(d_spol); };
log { source(s_sys); filter(f_filter7); destination(d_boot); };
log { source(s_sys); filter(f_filter8); destination(d_cron); };

 

netstat output when the cluster was rebooted:

Transition from ESTABLISHED to CLOSE_WAIT:

tcp 0 0 169.254.1.67:601 169.254.1.82:601 ESTABLISHED 3771/syslog-ng

10.34.37.082224000

tcp 1 0 169.254.1.67:601 169.254.1.82:601 CLOSE_WAIT 3771/syslog-ng

 

CLOSE_WAIT to LAST_ACK:

tcp 1 0 169.254.1.67:601 169.254.1.82:601 CLOSE_WAIT 3771/syslog-ng

10.34.43.555017000

tcp 1 1 169.254.1.67:601 169.254.1.82:601 LAST_ACK -

10.34.43.644520000

 

LAST_ACK to CLOSED:

tcp 1 1 169.254.1.67:601 169.254.1.82:601 LAST_ACK -

10.36.29.754533000

tcp 1 1 169.254.1.67:601 169.254.1.82:601 LAST_ACK -

10.36.29.801733000

tcp 1 1 169.254.1.67:601 169.254.1.82:601 LAST_ACK -

10.36.29.846228000

 

10.36.29.892427000

 

10.36.29.931181000

 

SYN_SENT:

10.36.43.192808000

tcp 0 1 169.254.1.67:601 169.254.1.82:601 SYN_SENT 3771/syslog-ng

10.36.43.238811000

tcp 0 1 169.254.1.67:601 169.254.1.82:601 SYN_SENT 3771/syslog-ng

10.36.43.279561000

tcp 0 1 169.254.1.67:601 169.254.1.82:601 SYN_SENT 3771/syslog-ng

continues…

10.36.46.180840000

tcp 0 1 169.254.1.67:601 169.254.1.82:601 SYN_SENT 3771/syslog-ng

10.36.46.225891000

 

10.36.46.275090000

 

10.36.46.318485000

 

10.36.46.359169000

 

 

The server node comes up and starts rsyslogd some time later. But once rsyslogd is running, it does not receive any SYN packets, and hence the connection is not established.

 

Can you please let me know where exactly the 40 s (the configured time_reopen value) comes into the picture in the netstat output?
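To look for a 40-second interval, the gaps between samples can be computed from the timestamps. Below is a minimal Python sketch, assuming the timestamps are in hour.minute.second.nanosecond form with a nine-digit nanosecond field; that format is an assumption, and the helper names `parse_timestamp_ns` and `gap_seconds` are made up for this example.

```python
def parse_timestamp_ns(stamp: str) -> int:
    """Convert an 'HH.MM.SS.nnnnnnnnn' timestamp (assumed format) to
    nanoseconds since midnight, using integers to avoid rounding."""
    hours, minutes, seconds, nanos = stamp.split(".")
    whole_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return whole_seconds * 1_000_000_000 + int(nanos)


def gap_seconds(earlier: str, later: str) -> float:
    """Elapsed time in seconds between two timestamps taken on the same day."""
    return (parse_timestamp_ns(later) - parse_timestamp_ns(earlier)) / 1e9


# Two timestamps copied from the netstat output above: one next to the first
# LAST_ACK line and the first one in the SYN_SENT section.
print(gap_seconds("10.34.43.555017000", "10.36.43.192808000"))
```

The same function can be applied to any other pair of timestamps in the output, for example the first and last SYN_SENT samples, to see how long each state lasted.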

 

syslog-ng version:

syslog-ng 1.6.12


-----------------------------------------------------------------------------------------------------------------------------------------------------

I got a reply from one of the BalaBit guys saying that I have to use version 3.4.

I am currently trying to use it.

But I still have some questions:

1. How many times is the connection retried?

2. When exactly does the connection retry happen (how does syslog-ng know when it has to retry)?

3. In time_reopen(40), what exactly do the 40 seconds measure?




--
Regards,
Prasad