[syslog-ng] Hitting g_assert when using sanitize-utf8 enabled!

James Elstone james at elstone.net
Fri Jan 13 14:57:13 UTC 2017


Hi Balázs,

Good news.

The problem has been solved simply by moving to the latest version of the FreeBSD ports tree and recompiling.  It seems the CP1252 to UTF-8 conversion is occurring correctly now.
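
For anyone finding this thread later, here is a minimal sketch of what a correct CP1252 to UTF-8 conversion should do with the 0x92 byte. It is a hand-rolled illustration only, not what syslog-ng or libiconv actually run; the names cp1252_high, utf8_encode and cp1252_to_utf8 are made up for this example. Byte 0x92 is the right single quotation mark (U+2019) in Windows-1252, so it should come out as the three bytes E2 80 99.

```c
#include <assert.h>
#include <stddef.h>

/* Unicode code points for Windows-1252 bytes 0x80..0x9F.
 * 0 marks a byte that Windows-1252 leaves unassigned. */
static const unsigned short cp1252_high[32] = {
  0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
  0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Encode a code point below U+10000 as UTF-8; returns bytes written (1..3). */
static size_t utf8_encode(unsigned cp, unsigned char *out)
{
  assert(cp < 0x10000);
  if (cp < 0x80) {
    out[0] = (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  out[0] = (unsigned char) (0xE0 | (cp >> 12));
  out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}

/* Convert len bytes of Windows-1252 to NUL-terminated UTF-8.
 * out must have room for 3 * len + 1 bytes. Returns the UTF-8 length. */
static size_t cp1252_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++) {
    unsigned cp = in[i];
    if (cp >= 0x80 && cp <= 0x9F) {
      cp = cp1252_high[cp - 0x80];
      if (cp == 0)
        cp = 0xFFFD; /* unassigned byte: emit the replacement character */
    }
    /* 0x00..0x7F and 0xA0..0xFF map to the same code point number */
    n += utf8_encode(cp, out + n);
  }
  out[n] = '\0';
  return n;
}
```

Feeding it the three bytes 'a', 0x92, 's' should yield the five bytes 61 E2 80 99 73, which is valid UTF-8 and needs no escaping.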

For completeness the following notable packages were updated:

Package      From     To
glib         2.46.2   2.46.2_2
gmake        4.1.2    4.2.1_1
libiconv     1.14_9   1.14_10
pcre         8.38     8.39_1
syslog-ng37  3.7.3    3.7.3_2

I suspect that the issue was in the supporting toolchain rather than in syslog-ng itself.

Thank you for your help; knowing it worked from your tests was very reassuring and pointed us in the right direction!
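
For the archive, a small sketch of why the `while (raw_len > 0)` change in the quoted diff is a sensible guard, whether or not it was the cause here. It assumes raw_len is a signed length (the other branch of that function passes -1) and that the per-character helper could, in some case, report consuming more bytes than remain. The functions below are hypothetical and are not syslog-ng code; the max_iter cap exists only so the demonstration terminates.

```c
/* Loop in the style of `while (raw_len)`: stops only when the signed
 * counter lands exactly on zero. If a step overshoots, the counter goes
 * negative and the loop keeps running until the safety cap. */
static int loop_nonzero(long remaining, long step, int max_iter)
{
  int iters = 0;
  while (remaining && iters < max_iter) {
    remaining -= step; /* stand-in for "bytes consumed by one character" */
    iters++;
  }
  return iters;
}

/* Loop in the style of `while (raw_len > 0)`: also stops when a step
 * overshoots and the counter becomes negative. */
static int loop_positive(long remaining, long step, int max_iter)
{
  int iters = 0;
  while (remaining > 0 && iters < max_iter) {
    remaining -= step;
    iters++;
  }
  return iters;
}
```

With 5 bytes remaining and a step of 2, the first form never sees zero (5, 3, 1, -1, ...) and only stops at the cap, while the second stops after three iterations. When the steps add up exactly, both behave the same.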

Kind regards,

James

On 12 January 2017 17:51:33 GMT+00:00, "Scheidler, Balázs" <balazs.scheidler at balabit.com> wrote:
>I have tried a number of combinations to cause aborts, but without success
>so far. I can imagine that my change may fix your issue, but I couldn't
>reproduce a case where we would cross the closing NUL byte in the input.
>
>I did my testing both via end-to-end tests (e.g. sending the above message
>to syslog-ng with sanitize-utf8 enabled) and via a unit test program.
>
>Neither caused the failure. So a longer excerpt from your input would be
>very much welcome, maybe in private.
>
>Bazsi
>
>On Tue, Jan 10, 2017 at 2:25 PM, James Elstone <james at elstone.net>
>wrote:
>
>> Will give this a twirl shortly.
>>
>> James
>>
>>
>> On 10 January 2017 10:14:09 GMT+00:00, "Scheidler, Balázs" <
>> balazs.scheidler at balabit.com> wrote:
>>>
>>> Does this fix it?
>>>
>>> diff --git a/lib/utf8utils.c b/lib/utf8utils.c
>>> index 2b84bdc..c76ffc1 100644
>>> --- a/lib/utf8utils.c
>>> +++ b/lib/utf8utils.c
>>> @@ -114,7 +114,7 @@ _append_unsafe_utf8_as_escaped(GString *escaped_output, const gchar *raw,
>>>        _append_escaped_utf8_character(escaped_output, &raw, -1, unsafe_chars,
>>>                                       control_format, invalid_format);
>>>    else
>>> -    while (raw_len)
>>> +    while (raw_len > 0)
>>>        raw_len -= _append_escaped_utf8_character(escaped_output, &raw, raw_len, unsafe_chars,
>>>                   control_format, invalid_format);
>>>  }
>>>
>>>
>>> --
>>> Bazsi
>>>
>>> On Tue, Jan 10, 2017 at 11:12 AM, Scheidler, Balázs <
>>> balazs.scheidler at balabit.com> wrote:
>>>
>>>> Hmm, thanks for the analysis so far. Is the 0x92 value followed by a
>>>> zero byte? It seems that for some reason the utf8 escaping functions
>>>> skip that.
>>>>
>>>> On Jan 9, 2017 9:52 PM, "James Elstone" <james at elstone.net> wrote:
>>>>
>>>>> Hi Attila,
>>>>>
>>>>> The syslog message below is being sent with utf8_sanitise enabled on the
>>>>> UDP transport:
>>>>>
>>>>> <38>Jan 7 20:10:11 hostname-01 microsoft-windows-security-auditing[success]
>>>>> 4648 A logon was attempted by that account@s credentials.
>>>>>
>>>>> Where @ is the byte with hex value 0x92, which is a valid graphical
>>>>> apostrophe in the Windows-1252 character set, but in UTF-8 any char with
>>>>> a byte value between 127 and 159 decimal is a control character. I have
>>>>> truncated the actual log message for brevity here. There has to be syslog
>>>>> load before and after this message is received to see the issue.
>>>>>
>>>>> Specifically, when reading in UTF-8 (g-string is native UTF-8), byte
>>>>> 0x92 looks for a corresponding 0x9c and ignores null terminations in
>>>>> between... (See Wikipedia's C0 C1 UTF-8 page for a little historic
>>>>> information.)
>>>>>
>>>>> Looking at the contents of the <src> variable (in 3.7.3 code), it
>>>>> contains multiple syslog messages in syslog-format.c, and strlen of <src>
>>>>> does not equal <left> prior to the procedure call into utf8utils.c. The
>>>>> message received on the wire is about 850 bytes long, <src> is about 8000
>>>>> bytes when going into utf8utils.c and about 15 bytes in the reassigned ptr
>>>>> variable of the g-string, hence the assert being triggered.
>>>>>
>>>>> Going to move to 3.8.1 as there has been a bit of work in this area
>>>>> since 3.7.3 and will retest tomorrow.
>>>>>
>>>>> Is there any way to control the character set the inbound message is
>>>>> parsed against? We only want a UTF-8 compliant stream being output by
>>>>> syslog-ng.
>>>>>
>>>>> Alternatively, is there a way to filter this char out on an upstream
>>>>> syslog-ng instance please (it is passing through an identical instance
>>>>> without utf8_sanitise enabled on it without problem)?
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> James
>>>>>
>>>>>
>>>>>
>>>>> On 7 January 2017 19:44:15 GMT+00:00, "Szalai, Attila" <
>>>>> Attila.Szalai at morganstanley.com> wrote:
>>>>>>
>>>>>> I’ve checked the glib source too (in version 2.50, but I do not think
>>>>>> it changed too much between the two versions) and have no idea how this
>>>>>> could happen.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So, an example line is definitely needed to find the root cause.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On the other hand, there is a trick in that code to save a malloc: a
>>>>>> “static”[*] buffer is used. Therefore, if that buffer is reallocated
>>>>>> (and the “static” buffer is therefore freed), the memory gets
>>>>>> corrupted.
>>>>>>
>>>>>>
>>>>>>
>>>>>> [*] Practically the buffer is allocated from the stack, but it’s
>>>>>> working just like a static buffer from the malloc point of view. It
>>>>>> should not be freed.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* syslog-ng [mailto:syslog-ng-bounces at lists.balabit.hu] *On
>>>>>> Behalf Of *James Elstone
>>>>>> *Sent:* Friday, January 06, 2017 2:55 PM
>>>>>> *To:* Syslog-ng users' and developers' mailing list
>>>>>> *Subject:* Re: [syslog-ng] Hitting g_assert when using sanitize-utf8
>>>>>> enabled!
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sorry; update - It happens on the first packet that contains \x92
>>>>>> when sanitize-utf8 is enabled; consistently.
>>>>>>
>>>>>> Running glib 2.46.2 with Syslog-ng 3.7.3 on FreeBSD 10.3.
>>>>>>
>>>>>> Any ideas please?
>>>>>>
>>>>>> Kr,
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On 6 January 2017 13:38:58 GMT+00:00, James Elstone <james at elstone.net>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Bazsi,
>>>>>>
>>>>>> The version of glib is 2.46.2 on FreeBSD 10.3.
>>>>>>
>>>>>> The issue does not occur on the first packet coming through, but when
>>>>>> under light load (~100/sec)...
>>>>>>
>>>>>> Tried reducing the number of unprintable chars and now only \0x92
>>>>>> exists in the inbound message it falls over on. It is always a message
>>>>>> with \0x92 that causes it to fail.
>>>>>>
>>>>>> Is there a way to have a filter applied before the message is
>>>>>> utf8_sanitised, using a regular expression or the like?
>>>>>>
>>>>>> What if the assert was removed, what effect would it have?
>>>>>>
>>>>>> Many thanks to all!
>>>>>>
>>>>>> Kr,
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On 6 January 2017 12:49:28 GMT+00:00, "Scheidler, Balázs" <
>>>>>> balazs.scheidler at balabit.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Attila is right, it would help a lot to see the original log message
>>>>>> and your glib version. That code path uses a performance hack that
>>>>>> relies on a GLib implementation detail. Either the glib behaviour has
>>>>>> changed or another assumption fails, but just looking at the code I
>>>>>> don't know what might.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Bazsi
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 6, 2017 at 1:41 PM, Szalai, Attila <
>>>>>> Attila.Szalai at morganstanley.com> wrote:
>>>>>>
>>>>>> Hi James,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Checking the source, it means the following:
>>>>>>
>>>>>>
>>>>>>
>>>>>> The code allocates a buffer 6 times bigger than the original string
>>>>>> length to be able to hold the escaped form of the UTF-8 characters.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The assert means that the string, after escaping, did not fit into
>>>>>> this buffer for some reason. Or, to be more precise, the GString
>>>>>> implementation decided that it should reallocate the string, which
>>>>>> usually only happens if the string gets too big to fit into its
>>>>>> original place. Currently I have no recent glib source to check if I’m
>>>>>> right.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The original string could help a lot to find the root cause.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ps.: the escaping works by replacing the original byte with \xHH, so
>>>>>> theoretically it can only grow from 1 byte to 4, which should fit into
>>>>>> a buffer 6 times bigger than the original size.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* syslog-ng [mailto:syslog-ng-bounces at lists.balabit.hu] *On
>>>>>> Behalf Of *James Elstone
>>>>>> *Sent:* Thursday, January 05, 2017 10:35 PM
>>>>>> *To:* syslog-ng at lists.balabit.hu
>>>>>> *Subject:* [syslog-ng] Hitting g_assert when using sanitize-utf8
>>>>>> enabled!
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Balabit et al,
>>>>>>
>>>>>> When using the sanitize-utf8 flag I am hitting a g_assert in
>>>>>> modules/syslogformat/syslog-format.c; what could be causing this?
>>>>>>
>>>>>> Any advice welcome!!
>>>>>>
>>>>>> Kr,
>>>>>>
>>>>>> James
>>>>>>
>>>>>
>>>>> --
>>>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>>
