PostgreSQL Invalid Encoding Errors
Hi All:

I'm continually getting into a loop of receiving messages like this in my PostgreSQL log files:

    2009-06-19 09:17:36.465 EDT 5846 syslog@syslog: ERROR: invalid byte sequence for encoding "UTF8": 0xa8
    2009-06-19 09:17:36.465 EDT 5846 syslog@syslog: HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
    2009-08-01 00:00:05.216 EDT 3986 syslog@syslog: ERROR: invalid byte sequence for encoding "UTF8": 0xc446
    2009-08-01 00:00:05.216 EDT 3986 syslog@syslog: HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I'm using a centralized logging system, and just turned on logging to syslog for our hardware firewall. I'm positive it was this change that is resulting in this behavior, but I really do want to continue pulling this data from the firewall. It appears that random messages from the firewall will come in as invalid UTF8, and somehow get stuck in a loop with syslog-ng continually trying to insert the offending line and PostgreSQL continually refusing. I have several log files that are > 500MB now because of this.

Is there a way to force syslog-ng to drop these messages, and perhaps log the error? I am using PostgreSQL 8.3.6.

syslog-ng --version

    syslog-ng 3.0.3
    Revision: ssh+git://bazsi@git.balabit//var/scm/git/syslog-ng/syslog-ng-ose--mainline--3.0#master#08c9bf9d98e4e021756adc12c42605840140ba8b
    Compile-Date: Jul 8 2009 12:16:03
    Enable-Threads: on
    Enable-Debug: off
    Enable-GProf: off
    Enable-Memtrace: off
    Enable-Sun-STREAMS: off
    Enable-Sun-Door: off
    Enable-IPv6: off
    Enable-Spoof-Source: off
    Enable-TCP-Wrapper: on
    Enable-SSL: on
    Enable-SQL: on
    Enable-Linux-Caps: on
    Enable-Pcre: on

syslog-ng.conf:

    options {
        chain_hostnames(off);
        flush_lines(0);
        stats_level(2);
        stats_freq(43200);
        frac_digits(5);
        ts_format(iso);
    };

    source src {
        unix-stream( "/dev/log" max-connections(40) );
        internal();
        udp(port(514));
        tcp(port(5140) keep-alive(yes));
    };

    destination d_sql {
        sql(
            type(pgsql)
            host("10.233.93.18")
            username("syslog")
            password("*****")
            database("syslog")
            table("facility_$FACILITY")
            columns("host", "sourceip", "priority", "lvl", "tag", "rcvd", "sent", "program", "msg")
            values("$HOST", "$SOURCEIP", "$PRIORITY", "$LEVEL", "$TAG", "$R_ISODATE", "$S_ISODATE", "$PROGRAM", "$MSGONLY")
            indexes("host", "rcvd", "sent", "program", "msg")
        );
    };

    destination console_all { file("/dev/tty12"); };
    destination messages { file("/var/log/messages"); };

    log {
        source(src);
        #destination(messages);
        destination(console_all);
        destination(d_sql);
    };
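For readers wondering why PostgreSQL rejects these messages, here is a minimal Python sketch that checks the two byte sequences from the log above. It assumes the reported 0xc446 means the two bytes 0xc4 0x46; the helper name utf8_error is made up for illustration and is not part of syslog-ng or PostgreSQL.

```python
from typing import Optional


def utf8_error(raw: bytes) -> Optional[str]:
    """Return None if raw is valid UTF-8, else a short description of the problem."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"byte 0x{raw[exc.start]:02x} at offset {exc.start}: {exc.reason}"
    return None


# The two sequences reported in the PostgreSQL log (assumed byte layout).
for sample in (b"\xa8", b"\xc4\x46"):
    print(sample.hex(), "->", utf8_error(sample))

# Every single byte is a valid Latin-1 character, so the same data decodes there.
print(repr(b"\xa8".decode("latin-1")))
```

Both sequences fail as UTF-8 but decode as Latin-1, which would be consistent with the firewall sending text in a single-byte encoding.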
Alter the encoding on the database to Latin1. UTF8 is the default, and it does not understand a lot of characters, so instead you will see an error with a hexadecimal representation of the offending character.

http://www.postgresql.org/docs/8.0/static/multibyte.html
UTF8 can easily handle all that is in Latin1. You just need to set the encoding that the client is connecting with to Latin1. From the link you sent, the table of encodings says this:

    Server Character Set    Available Client Character Sets
    UTF8                    all supported encodings

What I'm getting at is that syslog-ng repeatedly tries to re-insert the same log message containing the incorrectly encoded data after PostgreSQL, which expects UTF8-encoded data, rejects it. There should be some sort of feedback loop to prevent infinite retries. If I don't happen to see the messages fly by in the log file, it'll balloon out to 500MB easily.

David

On Thu, Aug 20, 2009 at 2:01 PM, Paul Robert Marino <prmarino1@gmail.com> wrote:
Alter the encoding on the database to Latin1. UTF8 is the default, and it does not understand a lot of characters, so instead you will see an error with a hexadecimal representation of the offending character.

http://www.postgresql.org/docs/8.0/static/multibyte.html
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html
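The kind of feedback loop asked for in the reply above could look roughly like the following Python sketch. It is not how syslog-ng's sql() destination works; deliver, the insert callable, and the "sql_writer" logger name are all hypothetical. The idea is that an encoding error is permanent, so the message is dropped and logged at once, while other failures are retried only a bounded number of times.

```python
import logging
from typing import Callable

log = logging.getLogger("sql_writer")  # hypothetical logger name


def deliver(insert: Callable[[bytes], None], msg: bytes, max_attempts: int = 3) -> bool:
    """Return True if msg was inserted, False if it was dropped."""
    # An encoding error is permanent: retrying the same bytes can never succeed.
    try:
        msg.decode("utf-8")
    except UnicodeDecodeError as exc:
        log.error("dropping message, not valid UTF-8 (%s): %r", exc.reason, msg)
        return False

    # Other failures may be transient, so retry, but only a bounded number of times.
    for attempt in range(1, max_attempts + 1):
        try:
            insert(msg)
            return True
        except Exception as exc:  # assumption: a real writer would catch its driver's error type
            log.warning("insert failed (attempt %d of %d): %s", attempt, max_attempts, exc)

    log.error("dropping message after %d failed attempts: %r", max_attempts, msg)
    return False
```

Either way, each bad message produces a bounded number of log lines instead of growing the log file without limit.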
On Thu, Aug 20, 2009 at 3:21 PM, David Blewett <david@dawninglight.net> wrote:
UTF8 can easily handle all that is in Latin1. You just need to set the encoding that the client is connecting with to Latin1. From the link you sent, the table of encodings says this:

    Server Character Set    Available Client Character Sets
    UTF8                    all supported encodings

Actually, this paste came from here: http://www.postgresql.org/docs/8.3/static/multibyte.html#MULTIBYTE-TRANSLATI...

David
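To illustrate the point about client encodings: when a UTF8 server is told the client speaks Latin1 (for example with PostgreSQL's SET client_encoding TO 'LATIN1'), the incoming bytes are converted rather than rejected. The Python sketch below only imitates that conversion with the standard codecs; latin1_to_utf8 is a made-up helper name, and how syslog-ng would be made to set the client encoding is not covered in this thread.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret raw as Latin-1 text and re-encode it as UTF-8."""
    # Latin-1 maps each of the 256 byte values to a character, so decode cannot fail.
    return raw.decode("latin-1").encode("utf-8")


# The bytes from the error log (0xc446 assumed to mean 0xc4 0x46).
for sample in (b"\xa8", b"\xc4\x46"):
    converted = latin1_to_utf8(sample)
    print(sample.hex(), "->", converted.hex(), repr(converted.decode("utf-8")))
```

The catch is that this only gives sensible text if the firewall really sends Latin1; bytes from some other encoding would be stored without error but as the wrong characters.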
participants (2)
- David Blewett
- Paul Robert Marino