[syslog-ng] 3.7.1 elasticsearch destination bug when dealing with unicode encoding.

Juhász, Viktor viktor.juhasz at balabit.com
Mon Oct 5 11:02:31 CEST 2015


Hi,

Hmm, it looks like something went wrong while creating the Java string from the
C string (in the JNI NewStringUTF call).
This looks like a bug. I will do the root cause analysis.
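Background on the NewStringUTF call mentioned above: the JNI specification says NewStringUTF takes "modified UTF-8". That format has no four-byte sequences. A supplementary character such as U+1F633 is written as a UTF-16 surrogate pair, and each surrogate is encoded as its own three-byte sequence. Standard UTF-8 uses a single four-byte sequence instead. The Python sketch below compares the two byte forms. The helper name `to_modified_utf8` is hypothetical. The sketch shows only that the two encodings differ for this character. It does not confirm that this difference is the root cause.

```python
def to_modified_utf8(text: str) -> bytes:
    """Encode text as JNI modified UTF-8 (hypothetical helper, for comparison)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # Modified UTF-8 writes NUL as the two-byte sequence C0 80, never 0x00.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Supplementary character: split into a UTF-16 surrogate pair and
            # encode each surrogate separately as a three-byte sequence.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            out += (chr(high) + chr(low)).encode("utf-8", "surrogatepass")
        else:
            # Everything else is identical to standard UTF-8.
            out += ch.encode("utf-8")
    return bytes(out)


emoji = "\U0001F633"
standard = emoji.encode("utf-8")
modified = to_modified_utf8(emoji)
print("standard UTF-8:", standard.hex(" "))  # f0 9f 98 b3
print("modified UTF-8:", modified.hex(" "))  # ed a0 bd ed b8 b3
```

A C buffer that holds the standard four-byte form is therefore not valid modified UTF-8 as the JNI specification defines it.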


BR,
Viktor



On Sat, Oct 3, 2015 at 8:36 AM, Scheidler, Balázs <
balazs.scheidler at balabit.com> wrote:

> Hi,
>
> Do I understand correctly that you put <U+1F633> in place of the UTF-8
> sequences in the email, and that the file contains the UTF-8 encoding of the
> same value?
>
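You can answer the question above by searching the file destination's raw bytes for the standard UTF-8 encoding of U+1F633 (F0 9F 98 B3). The Python sketch below does this. The function name `utf8_offsets` and the command-line usage are illustrative assumptions and are not part of syslog-ng.

```python
import sys


def utf8_offsets(data: bytes, codepoint: int) -> list:
    """Return every byte offset where the UTF-8 encoding of codepoint occurs."""
    needle = chr(codepoint).encode("utf-8")
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the log file as raw bytes so no decoding step can hide the sequence.
    with open(sys.argv[1], "rb") as f:
        print(utf8_offsets(f.read(), 0x1F633))
```

To use it, run something like `python3 check_utf8.py /path/to/file-destination.log`, where the script name and path are hypothetical. An empty list means the four-byte sequence is not in the file.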
> My theory right now is that Elastic uses a 16-bit representation of Unicode
> code points, and 1F633 doesn't fit there. But I couldn't come up with a
> plausible explanation of how it would become ð<U+009F><U+0098>³.
>
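One observation fits the garbled text above, though this thread does not establish it. ð<U+009F><U+0098>³ is exactly what you get if each byte of the UTF-8 encoding of U+1F633 (F0 9F 98 B3) is decoded as a separate ISO-8859-1 character. The sketch below reproduces this. It shows only that the symptom matches a byte-per-character decode somewhere in the pipeline. It does not show where that decode happens.

```python
emoji = "\U0001F633"
encoded = emoji.encode("utf-8")  # the four bytes F0 9F 98 B3

# ISO-8859-1 maps every byte to the code point with the same numeric value,
# so one four-byte character becomes four separate characters.
garbled = encoded.decode("latin-1")
print([f"U+{ord(c):04X}" for c in garbled])  # ['U+00F0', 'U+009F', 'U+0098', 'U+00B3']
```

U+00F0 is ð and U+00B3 is ³. U+009F and U+0098 are C1 control characters, which is why the server log shows them as escapes.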
> syslog-ng uses UTF-8 internally, so it should handle long UTF-8 sequences
> without problems. Do you perhaps have an encoding() option set on the elastic
> destination?
>
> It could also be a problem in the Elastic Java plugin; I don't know how we
> supply the data to it. @juhaszviktor, do you see any way this could happen in
> the Java code?
> On Oct 2, 2015 20:19, "Evan Rempel" <erempel at uvic.ca> wrote:
>
>> I think I have come across a bug in the elasticsearch destination where
>> log lines with UTF-8 characters result in a shortened message length
>> attribute, which in turn causes a slightly truncated JSON object to be sent
>> to elasticsearch.
>>
>>
>> And here is the source syslog line at our syslog server. This is where
>> the JSON object is created.
>>
>> 2015-10-02T10:22:47-07:00
>> local at sandtiger.comp.uvic.ca/sandtiger.comp.uvic.ca mail.warning
>> mimedefang.pl[10880]: t92HMkGW028396: Allowing attachment named
>> OutlookEmoji-<U+1F633>.png, ext=.png, type=image/png, RELAY=
>> mail-bn1on0131.outbound.protection.outlook.com [157.56.110.131],
>> FROM=<Holly.Richardson at Dal.Ca>, TO=<cobyt at uvic.ca>
>>
>> Here is the JSON object as logged to a file destination on the same host
>> that is running the elasticsearch destination. This just logs $MESSAGE,
>> since the payload is already JSON.
>>
>> {"flare":{"profile":"DCS"},"cfgmgrrole":"INFRA","cfgmgrosFull":"Redhat 5_64",
>> "cfgmgros":"unix","cfgmgrmodel":"ESX 5","cfgmgrlocation":"ESX-PROD",
>> "cfgmgrenvironment":"Prod","cfgmgrassetType":"Virtual Server",
>> "SOURCEHOST":"sandtiger.comp.uvic.ca","SHORTHOST":"sandtiger",
>> "PROGRAM":"mimedefang.pl","PRIORITY":"warning","PID":"10880","PATTERNID":"377",
>> "MESSAGE":"t92HMkGW028396: Allowing attachment named OutlookEmoji-<U+1F633>.png,
>> ext=.png, type=image/png, RELAY=mail-bn1on0131.outbound.protection.outlook.com
>> [157.56.110.131], FROM=<Holly.Richardson at Dal.Ca>, TO=<cobyt at uvic.ca>",
>> "ISODATE":"2015-10-02T10:22:47-07:00","HOST":"sandtiger.comp.uvic.ca",
>> "FACILITY":"mail"}
>>
>> This is the same content that is sent to the elasticsearch destination,
>> which is configured with option("message-template", "$MESSAGE\n")
>>
>> And here is the failed message from the elasticsearch server:
>>
>> [2015-10-02 10:22:48,630][DEBUG][action.bulk              ] [sponge]
>> [flare-2015.10.02.17][2] failed to execute bulk item (index) index
>> {[flare-2015.10.02.17][test][AVApk-CyhIyyHCO_k_bc],
>> source[{"flare":{"profile":"DCS"},"cfgmgrrole":"INFRA","cfgmgrosFull":"Redhat 5_64",
>> "cfgmgros":"unix","cfgmgrmodel":"ESX 5","cfgmgrlocation":"ESX-PROD",
>> "cfgmgrenvironment":"Prod","cfgmgrassetType":"Virtual Server",
>> "SOURCEHOST":"sandtiger.comp.uvic.ca","SHORTHOST":"sandtiger",
>> "PROGRAM":"mimedefang.pl","PRIORITY":"warning","PID":"10880","PATTERNID":"377",
>> "MESSAGE":"t92HMkGW028396: Allowing attachment named OutlookEmoji-ð<U+009F><U+0098>³.png,
>> ext=.png, type=image/png, RELAY=mail-bn1on0131.outbound.protection.outlook.com
>> [157.56.110.131], FROM=<Holly.Richardson at Dal.Ca>, TO=<cobyt at uvic.ca>",
>> "ISODATE":"2015-10-02T10:22:47-07:00","HOST":"sandtiger.comp.uvic.ca",
>> "FACILITY":"mail]}
>>
>>
>>
>> Note that the source has the Unicode data as <U+1F633>,
>> the elasticsearch destination is sent <U+1F633>,
>> but the elasticsearch server logs ð<U+009F><U+0098>³.
>>
>> The elasticsearch server also seems to end the message with the text
>>
>>   "FACILITY":"mail
>>
>> when it should end with
>>
>> "FACILITY":"mail"}
>>
>> so it is missing two characters.
>>
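The size of the truncation may also be a clue. U+1F633 takes 4 bytes in UTF-8, 2 UTF-16 code units (a surrogate pair) in a Java String, and 4 characters once it is mis-decoded as ISO-8859-1. Suppose one side of the transfer measured the payload in UTF-16 units and the other consumed UTF-8 bytes. Then a single emoji would leave the count 2 short, which matches the 2 missing characters (`"}`). This is a hypothesis that fits the numbers. It is not a confirmed cause. The sketch below works through the arithmetic. The payload is a shortened, hypothetical stand-in for the real JSON, and the helper names are illustrative.

```python
def utf16_len(text: str) -> int:
    """Length as a Java String reports it: UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


def utf8_len(text: str) -> int:
    """Length of the standard UTF-8 byte encoding."""
    return len(text.encode("utf-8"))


# Hypothetical stand-in payload: pure ASCII except for one U+1F633.
payload = '{"MESSAGE":"OutlookEmoji-\U0001F633.png","FACILITY":"mail"}'

shortfall = utf8_len(payload) - utf16_len(payload)
print(shortfall)  # 2

# Assumed mismatch: keep only utf16_len(payload) bytes of the UTF-8 data,
# then read those bytes one character per byte.
raw = payload.encode("utf-8")
truncated = raw[:utf16_len(payload)].decode("latin-1")
print(truncated.endswith('"FACILITY":"mail'))  # True
```

Under this assumption, the output shows both symptoms from the server log at once: the mis-decoded emoji and the missing `"}` at the end.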
>> Does anyone want to guess at what is happening?
>>
>> Should I post to the elasticsearch group with the reasoning that the
>> source (syslog-ng) and the destination (elasticsearch) need to be
>> configured with the same unicode settings?
>>
>> Thanks,
>>
>> --
>> Evan Rempel                                      erempel at uvic.ca
>> Senior Systems Administrator                        250.721.7691
>> Data Centre Services, University Systems, University of Victoria
>>
>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation:
>> http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>>
>>
>