[syslog-ng] Hitting g_assert when using sanitize-utf8 enabled!
James Elstone
james at elstone.net
Tue Jan 10 13:25:43 UTC 2017
Will give this a twirl shortly.
James
On 10 January 2017 10:14:09 GMT+00:00, "Scheidler, Balázs" <balazs.scheidler at balabit.com> wrote:
>Does this fix it?
>
>diff --git a/lib/utf8utils.c b/lib/utf8utils.c
>index 2b84bdc..c76ffc1 100644
>--- a/lib/utf8utils.c
>+++ b/lib/utf8utils.c
>@@ -114,7 +114,7 @@ _append_unsafe_utf8_as_escaped(GString *escaped_output, const gchar *raw,
> _append_escaped_utf8_character(escaped_output, &raw, -1, unsafe_chars,
> control_format, invalid_format);
> else
>- while (raw_len)
>+ while (raw_len > 0)
> raw_len -= _append_escaped_utf8_character(escaped_output, &raw, raw_len, unsafe_chars,
> control_format, invalid_format);
> }
>
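A minimal sketch of why the comparison in the patch above can matter, assuming raw_len is a signed count and that the per-character helper may report consuming more bytes than actually remain (for example at a truncated sequence). The names consume_step and iterations, and the max_iter cap, are hypothetical and are not syslog-ng code.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-character escaper: it always reports
 * that it consumed `step` bytes, even when fewer bytes remain. */
static long consume_step(long step)
{
  return step;
}

/* Counts loop iterations for either loop condition.
 * patched != 0 : while (remaining > 0)   (the proposed fix)
 * patched == 0 : while (remaining)       (the original condition)
 * With the original condition, a length that steps past zero becomes
 * negative, stays non-zero, and the loop never ends on its own.
 * max_iter is a safety cap so that case can be shown without running away. */
static int iterations(long remaining, long step, int patched, int max_iter)
{
  int n = 0;

  assert(step > 0);
  while ((patched ? remaining > 0 : remaining != 0) && n < max_iter)
    {
      remaining -= consume_step(step);
      n++;
    }
  return n;
}
```

With a remaining length of 4 and a step of 3, the patched condition stops after two iterations, while the original condition only stops at the cap because the length goes 4, 1, -2, -5 and never reaches exactly zero.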
>
>--
>Bazsi
>
>On Tue, Jan 10, 2017 at 11:12 AM, Scheidler, Balázs <
>balazs.scheidler at balabit.com> wrote:
>
>> Hmm, thanks for the analysis so far. Is the 0x92 value followed by a zero
>> byte? It seems that for some reason the UTF-8 escaping functions skip that.
>>
>> On Jan 9, 2017 9:52 PM, "James Elstone" <james at elstone.net> wrote:
>>
>>> Hi Attila,
>>>
>>> The syslog message below is being sent with sanitize-utf8 enabled on the
>>> UDP transport:
>>>
>>> <38>Jan 7 20:10:11 hostname-01 microsoft-windows-security-auditing[success]
>>> 4648 A logon was attempted by that account@s credentials.
>>>
>>> Where @ is a byte with hex value 0x92, which is a valid graphical apostrophe
>>> in the Windows-1252 character set, but in Unicode the code points from 127
>>> to 159 decimal are control characters. I have truncated the actual log
>>> message for brevity here. There has to be syslog load before and after this
>>> message is received to see the issue.
>>>
>>> Specifically, when reading UTF-8 (GString is natively UTF-8), byte 0x92
>>> looks for a corresponding 0x9c and ignores null terminations in between...
>>> (See Wikipedia's C0, C1 and UTF-8 pages for a little historical information.)
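As a side note on the byte in question: in UTF-8 a lone 0x92 has the bit pattern 10xxxxxx, which marks a continuation byte, so it is invalid unless it follows a lead byte. The C1 control code point U+0092 would be encoded as the two bytes C2 92, and the Windows-1252 apostrophe (U+2019) as E2 80 99. A minimal sketch of the per-byte classification follows; the enum and function names are made up for illustration.

```c
#include <assert.h>

/* Role of a single byte when interpreted as UTF-8 (hypothetical names). */
enum utf8_role
{
  UTF8_ASCII,         /* 0xxxxxxx: a complete one-byte character */
  UTF8_CONTINUATION,  /* 10xxxxxx: only valid after a lead byte */
  UTF8_LEAD,          /* 0xC2..0xF4: starts a 2 to 4 byte sequence */
  UTF8_INVALID        /* 0xC0, 0xC1, 0xF5..0xFF: never valid in UTF-8 */
};

static enum utf8_role utf8_byte_role(unsigned char b)
{
  if (b < 0x80)
    return UTF8_ASCII;
  if ((b & 0xC0) == 0x80)
    return UTF8_CONTINUATION;
  if (b >= 0xC2 && b <= 0xF4)
    return UTF8_LEAD;
  return UTF8_INVALID;
}
```

Here utf8_byte_role(0x92) yields UTF8_CONTINUATION, which is why a sanitizer has to escape the byte when it appears on its own.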
>>>
>>> Looking at the contents of the <src> variable (in 3.7.3 code), it
>>> contains multiple syslog messages in syslog-format.c, and strlen of <src>
>>> does not equal <left> prior to the procedure call into utf8utils.c. The
>>> message received on the wire is about 850 bytes long, <src> is about 8000
>>> bytes when going into utf8utils.c and about 15 bytes in the reassigned ptr
>>> variable of the GString, hence the assert being triggered.
>>>
>>> Going to move to 3.8.1 as there has been a bit of work in this area since
>>> 3.7.3 and will retest tomorrow.
>>>
>>> Is there any way to control the character set the inbound message is
>>> parsed against? We only want a UTF-8 compliant stream being output by
>>> syslog-ng.
>>>
>>> Alternatively, is there a way to filter this char out on an upstream
>>> syslog-ng instance please (it is passing through an identical instance
>>> without sanitize-utf8 enabled on it without problem)?
>>>
>>> Kind regards,
>>>
>>> James
>>>
>>>
>>>
>>> On 7 January 2017 19:44:15 GMT+00:00, "Szalai, Attila" <
>>> Attila.Szalai at morganstanley.com> wrote:
>>>>
>>>> I’ve checked the GLib source too (in version 2.50, but I do not think
>>>> it changed too much between the two versions) and have no idea how this
>>>> could happen.
>>>>
>>>>
>>>>
>>>> So, an example line is definitely needed to find the root cause.
>>>>
>>>>
>>>>
>>>> On the other hand, there is a trick in that code to save a malloc: a
>>>> “static”[*] buffer is used. Therefore, if that buffer is reallocated
>>>> (and therefore the “static” buffer is freed), the memory gets corrupted.
>>>>
>>>>
>>>>
>>>> [*] Practically, the buffer is allocated on the stack, but it works
>>>> just like a static buffer from the malloc point of view. It should not be
>>>> freed.
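The borrowed-buffer trick described above can be illustrated with a small sketch. This is not the GLib or syslog-ng implementation: struct borrowed_str and borrowed_append are hypothetical, and the sketch refuses to grow where a real growable string would reallocate (which is what must never happen to stack memory).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a growable string that borrows its storage
 * from the caller (for example from a stack array). */
struct borrowed_str
{
  char *str;    /* current storage, owned by the caller */
  size_t len;   /* bytes used, excluding the terminating NUL */
  size_t cap;   /* total bytes available in str */
};

/* Appends n bytes of data. Returns 0 on success, or -1 if the data plus
 * the terminating NUL would not fit. A real growable string would call
 * realloc() at this point, which is invalid for borrowed stack memory. */
static int borrowed_append(struct borrowed_str *s, const char *data, size_t n)
{
  if (s->len + n + 1 > s->cap)
    return -1;
  memcpy(s->str + s->len, data, n);
  s->len += n;
  s->str[s->len] = '\0';
  return 0;
}
```

A caller using this pattern can check afterwards that the storage pointer still equals the original array, which is presumably the kind of invariant the failing assert guards.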
>>>>
>>>>
>>>>
>>>> *From:* syslog-ng [mailto:syslog-ng-bounces at lists.balabit.hu] *On
>>>> Behalf Of *James Elstone
>>>> *Sent:* Friday, January 06, 2017 2:55 PM
>>>> *To:* Syslog-ng users' and developers' mailing list
>>>> *Subject:* Re: [syslog-ng] Hitting g_assert when using sanitize-utf8
>>>> enabled!
>>>>
>>>>
>>>>
>>>> Sorry; update - It happens on the first packet that contains \x92 when
>>>> sanitize-utf8 is enabled; consistently.
>>>>
>>>> Running glib 2.46.2 with syslog-ng 3.7.3 on FreeBSD 10.3.
>>>>
>>>> Any ideas please?
>>>>
>>>> Kr,
>>>>
>>>> James
>>>>
>>>> On 6 January 2017 13:38:58 GMT+00:00, James Elstone <james at elstone.net>
>>>> wrote:
>>>>
>>>> Hi Bazsi,
>>>>
>>>> The version of glib is 2.46.2 on FreeBSD 10.3.
>>>>
>>>> The issue does not occur on the first packet coming through, but when
>>>> under light load (~100/sec)...
>>>>
>>>> Tried reducing the number of unprintable chars and now only \x92 exists
>>>> in the inbound message it falls over on. It is always a message with \x92
>>>> that causes it to fail.
>>>>
>>>> Is there a way to have a filter applied before the message is
>>>> UTF-8 sanitised, using a regular expression or the like?
>>>>
>>>> What if the assert was removed, what effect would it have?
>>>>
>>>> Many thanks to all!
>>>>
>>>> Kr,
>>>>
>>>> James
>>>>
>>>> On 6 January 2017 12:49:28 GMT+00:00, "Scheidler, Balázs" <
>>>> balazs.scheidler at balabit.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Attila is right, it would help a lot to see the original log message and
>>>> your glib version. That code path uses a performance hack that relies on a
>>>> GLib implementation detail. Either the glib behaviour has changed or
>>>> another assumption fails, but just looking at the code I don't know what
>>>> might.
>>>>
>>>>
>>>> --
>>>> Bazsi
>>>>
>>>>
>>>>
>>>> On Fri, Jan 6, 2017 at 1:41 PM, Szalai, Attila <
>>>> Attila.Szalai at morganstanley.com> wrote:
>>>>
>>>> Hi James,
>>>>
>>>>
>>>>
>>>> Checking the source, it means the following:
>>>>
>>>>
>>>>
>>>> The code allocates a buffer 6 times bigger than the original string
>>>> length to be able to hold the escaped form of the UTF-8 characters.
>>>>
>>>>
>>>>
>>>> The assert means that the string, after escaping, did not fit into this
>>>> buffer for some reason. Or, to be more precise, the GString implementation
>>>> decided that it should reallocate the string, which usually only happens if
>>>> the string gets too big to fit into its original place. Currently I have no
>>>> recent glib source to check if I’m right.
>>>>
>>>>
>>>>
>>>> The original string could help a lot to find the root cause.
>>>>
>>>>
>>>>
>>>> P.S.: the escaping works by replacing the original byte with \xHH, so
>>>> theoretically it can only grow from 1 byte to 4, which should fit into a
>>>> buffer 6 times bigger than the original size.
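The size argument above can be checked with a simplified sketch. It assumes every control byte and every byte at or above 0x80 is escaped as \xHH, which is cruder than real UTF-8 sanitizing (valid multi-byte sequences would normally be kept). The function escape_bytes is hypothetical and is not the syslog-ng implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escapes each byte below 0x20 or at/above 0x80 as \xHH and copies all
 * other bytes unchanged. Each input byte produces at most 4 output bytes,
 * so `out` must hold at least 4 * len + 1 bytes (including the NUL).
 * Returns the length of the escaped string, excluding the NUL. */
static size_t escape_bytes(const unsigned char *in, size_t len, char *out)
{
  size_t o = 0;

  for (size_t i = 0; i < len; i++)
    {
      if (in[i] < 0x20 || in[i] >= 0x80)
        o += (size_t) sprintf(out + o, "\\x%02x", (unsigned int) in[i]);
      else
        out[o++] = (char) in[i];
    }
  out[o] = '\0';
  assert(o <= 4 * len);  /* worst case: every byte was escaped */
  return o;
}
```

For the three input bytes 't', 0x92, 's' the output is the six characters t\x92s, well inside a buffer sized at four (let alone six) times the input length.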
>>>>
>>>>
>>>>
>>>> *From:* syslog-ng [mailto:syslog-ng-bounces at lists.balabit.hu] *On
>>>> Behalf Of *James Elstone
>>>> *Sent:* Thursday, January 05, 2017 10:35 PM
>>>> *To:* syslog-ng at lists.balabit.hu
>>>> *Subject:* [syslog-ng] Hitting g_assert when using sanitize-utf8
>>>> enabled!
>>>>
>>>>
>>>>
>>>> Hi Balabit et al,
>>>>
>>>> When using the sanitize-utf8 flag I am hitting a g_assert in
>>>> modules/syslogformat/syslog-format.c; what could be causing this?
>>>>
>>>> Any advice welcome!!
>>>>
>>>> Kr,
>>>>
>>>> James
>>>>
>>>
>>> --
>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.