[syslog-ng] 3.7.1 elasticsearch destination bug when dealing with unicode encoding.

Evan Rempel erempel at uvic.ca
Fri Oct 2 20:18:40 CEST 2015


I think I have come across a bug in the elasticsearch destination: log lines containing UTF-8 characters end up with a message length that is too short, so a slightly truncated JSON object is sent to elasticsearch.


Here is the source syslog line as it arrives at our syslog server. This is where the JSON object is created.

2015-10-02T10:22:47-07:00 local at sandtiger.comp.uvic.ca/sandtiger.comp.uvic.ca mail.warning mimedefang.pl[10880]: t92HMkGW028396: Allowing attachment named OutlookEmoji-<U+1F633>.png, ext=.png, type=image/png, RELAY=mail-bn1on0131.outbound.protection.outlook.com [157.56.110.131], FROM=<Holly.Richardson at Dal.Ca>, TO=<cobyt at uvic.ca>

Here is the JSON object as logged to a file destination on the same host that is running the elasticsearch destination. This file destination just logs $MESSAGE, since the payload is already JSON.

{"flare":{"profile":"DCS"},"cfgmgrrole":"INFRA","cfgmgrosFull":"Redhat 5_64","cfgmgros":"unix","cfgmgrmodel":"ESX 5","cfgmgrlocation":"ESX-PROD","cfgmgrenvironment":"Prod","cfgmgrassetType":"Virtual Server","SOURCEHOST":"sandtiger.comp.uvic.ca","SHORTHOST":"sandtiger","PROGRAM":"mimedefang.pl","PRIORITY":"warning","PID":"10880","PATTERNID":"377","MESSAGE":"t92HMkGW028396: Allowing attachment named OutlookEmoji-<U+1F633>.png, ext=.png, type=image/png, RELAY=mail-bn1on0131.outbound.protection.outlook.com [157.56.110.131], FROM=<Holly.Richardson at Dal.Ca>, TO=<cobyt at uvic.ca>","ISODATE":"2015-10-02T10:22:47-07:00","HOST":"sandtiger.comp.uvic.ca","FACILITY":"mail"}

This is the same content that is sent to the elasticsearch destination, using option("message-template", "$MESSAGE\n").

And here is the failed message from the elasticsearch server:

[2015-10-02 10:22:48,630][DEBUG][action.bulk              ] [sponge] [flare-2015.10.02.17][2] failed to execute bulk item (index) index {[flare-2015.10.02.17][test][AVApk-CyhIyyHCO_k_bc], source[{"flare":{"profile":"DCS"},"cfgmgrrole":"INFRA","cfgmgrosFull":"Redhat 5_64","cfgmgros":"unix","cfgmgrmodel":"ESX 5","cfgmgrlocation":"ESX-PROD","cfgmgrenvironment":"Prod","cfgmgrassetType":"Virtual Server","SOURCEHOST":"sandtiger.comp.uvic.ca","SHORTHOST":"sandtiger","PROGRAM":"mimedefang.pl","PRIORITY":"warning","PID":"10880","PATTERNID":"377","MESSAGE":"t92HMkGW028396: Allowing attachment named OutlookEmoji-ð<U+009F><U+0098>³.png, ext=.png, type=image/png, RELAY=mail-bn1on0131.outbound.protection.outlook.com [157.56.110.131], FROM=<Holly.Richardson at Dal.Ca>, TO=<cobyt at uvic.ca>","ISODATE":"2015-10-02T10:22:47-07:00","HOST":"sandtiger.comp.uvic.ca","FACILITY":"mail]}



Note that the source has the Unicode character <U+1F633>,
the elasticsearch destination is sent <U+1F633>,
but the elasticsearch server logs ð<U+009F><U+0098>³.
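For what it's worth, that garbled text is exactly what you get when the four UTF-8 bytes of U+1F633 are read back as ISO-8859-1 (Latin-1): 0xF0 is ð, 0x9F and 0x98 are the C1 control characters U+009F and U+0098, and 0xB3 is ³. A quick Python check of that (Python is only used for illustration here; it says nothing about where in the syslog-ng or elasticsearch path the Latin-1 reading actually happens):

```python
# The emoji from the source syslog line, U+1F633.
emoji = "\U0001F633"

# Encoded as UTF-8 it becomes four bytes.
utf8_bytes = emoji.encode("utf-8")
print(utf8_bytes.hex(" "))  # f0 9f 98 b3

# Decoding those same bytes as ISO-8859-1 (Latin-1) gives four separate
# characters, which matches what the elasticsearch log shows.
misread = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00F0', 'U+009F', 'U+0098', 'U+00B3']
```

So the bytes themselves look like intact UTF-8; they are just being interpreted with the wrong character set somewhere along the way.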

The elasticsearch server also seems to end the message with the text

  "FACILITY":"mail

when it should end with

"FACILITY":"mail"}

so it is missing two characters.
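The two missing characters also line up with the emoji. U+1F633 lies outside the Basic Multilingual Plane, so it counts as 2 UTF-16 code units (which is what Java's String.length() returns) but takes 4 bytes in UTF-8. If a length were computed in UTF-16 units while the body was written out as UTF-8, the body would come up 4 - 2 = 2 bytes short, which here would be exactly the closing "}. This is only a guess that fits the numbers; the sketch below uses a made-up, shortened payload and a hypothetical sender, not the actual syslog-ng code path.

```python
emoji = "\U0001F633"  # the character from the original log line

# Java's String.length() counts UTF-16 code units; a character outside the
# BMP is stored as a surrogate pair, i.e. two units.
utf16_units = len(emoji.encode("utf-16-le")) // 2

# The same character written out as UTF-8 is four bytes.
utf8_len = len(emoji.encode("utf-8"))

print(utf16_units, utf8_len, utf8_len - utf16_units)  # 2 4 2

# Hypothetical sender: sizes the payload in UTF-16 units but writes UTF-8,
# so the receiver reads only `declared` bytes. (Shortened, made-up payload.)
payload = '{"MESSAGE":"OutlookEmoji-' + emoji + '.png","FACILITY":"mail"}'
declared = len(payload.encode("utf-16-le")) // 2
received = payload.encode("utf-8")[:declared].decode("utf-8")

print(received[-16:])  # prints "FACILITY":"mail  (the closing "} is gone)
```

With one such character per message the shortfall is 2 bytes; a message with several non-BMP characters would presumably lose 2 bytes for each.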

Does anyone want to guess at what is happening?

Should I post to the elasticsearch group, on the theory that the source (syslog-ng) and the destination (elasticsearch) need to be configured with the same Unicode settings?

Thanks,

-- 
Evan Rempel                                      erempel at uvic.ca
Senior Systems Administrator                        250.721.7691
Data Centre Services, University Systems, University of Victoria


