<html><head></head><body>Hi Balázs,<br>
<br>
The byte 0x92 is usually followed by a 0x73 or 0x53 (lower- or upper-case "s").<br>
<br>
I think that when the message is stored in a gchar*, the code reads beyond the 0x92 because gchar* handling expects the input to be UTF-8 compliant.<br>
<br>
Since the length passed into the process handler is correct, I am guessing that an implicit type cast is occurring somewhere?<br>
<br>
I am just compiling 3.8.1 now.<br>
<br>
Kr,<br>
<br>
James<br><br><div class="gmail_quote">On 10 January 2017 10:12:49 GMT+00:00, "Scheidler, Balázs" <balazs.scheidler@balabit.com> wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div dir="auto">Hmm, thanks for the analysis so far. Is the 0x92 value followed by a zero byte? It seems that for some reason the utf8 escaping functions skip that.</div><div class="gmail_extra"><br /><div class="gmail_quote">On Jan 9, 2017 9:52 PM, "James Elstone" <<a href="mailto:james@elstone.net">james@elstone.net</a>> wrote:<br type="attribution" /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple">Hi Attila,<br />
<br />
The syslog message below is being sent with sanitize-utf8 enabled on the UDP transport:<br />
<br />
&lt;38&gt;Jan 7 20:10:11 hostname-01 microsoft-windows-security-<wbr />auditing[success] 4648 A logon was attempted by that account@s credentials.<br />
<br />
Where @ is a byte with the hex value 0x92, which is a valid typographic apostrophe in the Windows-1252 character set. In Unicode, however, the code points from 127 to 159 decimal are control characters (DEL and the C1 set), and a lone 0x92 byte is not valid UTF-8 at all, because it is a continuation byte without a lead byte. I have truncated the actual log message for brevity here. There has to be syslog load before and after this message is received to see the issue.<br />
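<br />
To illustrate why a lone 0x92 is a problem, here is a minimal sketch in plain C. It is not syslog-ng or GLib code, and the function names are hypothetical. It only checks the lead/continuation structure of UTF-8 and does not reject overlong forms or surrogates:<br />

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if b has the bit pattern 10xxxxxx, i.e. it may only appear
 * as a continuation byte inside a multi-byte UTF-8 sequence. */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: returns the offset of the first byte that breaks
 * the UTF-8 lead/continuation structure, or len if none does.
 * Simplified: overlong encodings and surrogates are NOT rejected. */
static size_t first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        size_t need;  /* number of continuation bytes expected after b */

        if (b < 0x80)
            need = 0;                    /* plain ASCII */
        else if ((b & 0xE0) == 0xC0)
            need = 1;                    /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0)
            need = 2;                    /* 1110xxxx */
        else if ((b & 0xF8) == 0xF0)
            need = 3;                    /* 11110xxx */
        else
            return i;                    /* stray continuation or invalid lead */

        if (need > len - i - 1)
            return i;                    /* sequence truncated by end of input */
        for (size_t k = 1; k <= need; k++)
            if (!is_utf8_continuation(s[i + k]))
                return i;
        i += need + 1;
    }
    return len;
}
```

With the input "account" followed by 0x92 and "s", the helper reports offset 7, the position of the 0x92 byte. The same text with the apostrophe encoded as the UTF-8 bytes E2 80 99 passes the check.<br />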
<br />
Specifically, when reading as UTF-8 (GString is natively UTF-8), byte 0x92 seems to look for a corresponding 0x9c and to ignore null terminators in between... (See Wikipedia's page on C0 and C1 control codes for a little historical background.)<br />
<br />
Looking at the contents of the &lt;src&gt; variable in syslog-format.c (in the 3.7.3 code), it contains multiple syslog messages, and strlen of &lt;src&gt; does not equal &lt;left&gt; prior to the call into utf8utils.c. The message received on the wire is about 850 bytes long, &lt;src&gt; is about 8000 bytes when going into utf8utils.c, and the reassigned ptr variable of the GString is about 15 bytes, hence the assert being triggered.<br />
<br />
I am going to move to 3.8.1, as there has been a bit of work in this area since 3.7.3, and will retest tomorrow.<br />
<br />
Is there any way to control the character set that the inbound message is parsed against? We only want syslog-ng to output a UTF-8 compliant stream.<br />
<br />
Alternatively, is there a way to filter this char out on an upstream syslog-ng instance, please? (It is passing through an identical instance that does not have sanitize-utf8 enabled without any problem.)<br />
<br />
Kind regards,<br />
<br />
James<br />
<br />
<br /><br /><div class="gmail_quote">On 7 January 2017 19:44:15 GMT+00:00, "Szalai, Attila" <<a href="mailto:Attila.Szalai@morganstanley.com" target="_blank">Attila.Szalai@morganstanley.<wbr />com</a>> wrote:<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<p>
</p><div class="m_-574274273022965528WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I’ve checked the glib source too (in version 2.50, but I do not think it changed much between the two versions) and have no idea how this could happen.<p></p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><p> </p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">So an example line is definitely needed to find the root cause.<p></p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><p> </p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">On the other hand, there is a trick in that code to save a malloc: a “static”[*] buffer is used. Therefore, if that buffer is reallocated (and the “static” buffer is therefore freed), the memory gets corrupted.<p></p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><p> </p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">[*] Practically, the buffer is allocated on the stack, but from malloc’s point of view it behaves just like a static buffer. It should not be freed.<p></p></span></p>
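<p class="MsoNormal">The borrowed-buffer idea can be sketched in plain C as follows. This is a hypothetical stand-in, not GLib's GString and not the syslog-ng code: the SketchString type and its functions are invented for illustration, and unlike the trick described above this sketch tracks ownership so that the stack buffer is never freed. Comparing the string pointer with the stack buffer afterwards shows whether a reallocation happened, which is presumably what the assert checks.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; NOT GLib's GString. */
typedef struct {
    char  *str;        /* current storage (borrowed or heap) */
    size_t len;        /* bytes used, excluding the trailing NUL */
    size_t cap;        /* bytes available in str */
    int    owns_heap;  /* 1 once storage has moved to the heap */
} SketchString;

/* Start out on a caller-provided buffer, e.g. one on the stack. */
static void sketch_init_borrowed(SketchString *s, char *buf, size_t cap)
{
    assert(cap > 0);
    s->str = buf;
    s->len = 0;
    s->cap = cap;
    s->owns_heap = 0;
    buf[0] = '\0';
}

/* Append n bytes. If the borrowed buffer is too small, the content is
 * copied to the heap; the borrowed buffer itself is never freed.
 * Returns 0 on success, -1 on allocation failure. */
static int sketch_append(SketchString *s, const char *data, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t new_cap = (s->len + n + 1) * 2;
        char *p = s->owns_heap ? realloc(s->str, new_cap) : malloc(new_cap);

        if (p == NULL)
            return -1;
        if (!s->owns_heap)
            memcpy(p, s->str, s->len);  /* first move off the borrowed buffer */
        s->str = p;
        s->cap = new_cap;
        s->owns_heap = 1;
    }
    memcpy(s->str + s->len, data, n);
    s->len += n;
    s->str[s->len] = '\0';
    return 0;
}

/* Release heap storage only; a borrowed buffer belongs to the caller. */
static void sketch_free(SketchString *s)
{
    if (s->owns_heap)
        free(s->str);
}
```

<p class="MsoNormal">After appending more data than the borrowed buffer can hold, s.str no longer equals the address of the stack buffer. A check of the form assert(s.str == stack_buf) would fail at that point.</p>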
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><p> </p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> syslog-ng [mailto:<a href="mailto:syslog-ng-bounces@lists.balabit.hu" target="_blank">syslog-ng-bounces@<wbr />lists.balabit.hu</a>]
<b>On Behalf Of </b>James Elstone<br />
<b>Sent:</b> Friday, January 06, 2017 2:55 PM<br />
<b>To:</b> Syslog-ng users' and developers' mailing list<br />
<b>Subject:</b> Re: [syslog-ng] Hitting g_assert when using sanitize-utf8 enabled!<p></p></span></p>
</div>
</div>
<p class="MsoNormal"></p><p> </p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Sorry, an update: it happens consistently on the first packet that contains \x92 when sanitize-utf8 is enabled.<br />
<br />
Running glib 2.46.2 with Syslog-ng 3.7.3 on FreeBSD 10.3. <br />
<br />
Any ideas please?<br />
<br />
Kr,<br />
<br />
James</p><p></p>
<div>
<p class="MsoNormal">On 6 January 2017 13:38:58 GMT+00:00, James Elstone <<a href="mailto:james@elstone.net" target="_blank">james@elstone.net</a>> wrote:</p><p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi Bazsi,<br />
<br />
The version of glib is 2.46.2 on FreeBSD 10.3.<br />
<br />
The issue does not occur on the first packet coming through, but when under light load (~100/sec)...<br />
<br />
I tried reducing the number of unprintable chars, and now only 0x92 exists in the inbound message it falls over on. It is always a message with 0x92 that causes it to fail.<br />
<br />
Is there a way to have a filter applied, using a regular expression or the like, before the message is UTF-8 sanitised?<br />
<br />
What effect would it have if the assert were removed?<br />
<br />
Many thanks to all!<br />
<br />
Kr,<br />
<br />
James</p><p></p>
<div>
<p class="MsoNormal">On 6 January 2017 12:49:28 GMT+00:00, "Scheidler, Balázs" <<a href="mailto:balazs.scheidler@balabit.com" target="_blank">balazs.scheidler@balabit.com</a>> wrote:</p><p></p>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi,</p><p></p>
</div>
<p class="MsoNormal">Attila is right, it would help a lot to see the original log message and your glib version. That code path uses a performance hack that relies on a GLib implementation detail. Either the glib behaviour has changed or another assumption
fails, but just looking at the code I don't know what might.</p><p></p>
</div>
<div>
<p class="MsoNormal"><br clear="all" />
</p><p></p>
<div>
<div>
<div>
<p class="MsoNormal">-- <br />
Bazsi</p><p></p>
</div>
</div>
</div>
<p class="MsoNormal"></p><p> </p>
<div>
<p class="MsoNormal">On Fri, Jan 6, 2017 at 1:41 PM, Szalai, Attila <<a href="mailto:Attila.Szalai@morganstanley.com" target="_blank">Attila.Szalai@morganstanley.<wbr />com</a>> wrote:</p><p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi James,</span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Checking the source, it means the following:</span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The code allocates a buffer 6 times bigger than the original string length to be able to hold the escaped form of the UTF-8 characters.</span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The assert means that the string, after escaping, did not fit into this buffer for some reason. Or, to be more precise, the GString implementation decided that it should reallocate the string, which usually only happens if the string gets too big to fit into its original place. Currently I have no recent glib source at hand to check whether I’m right.</span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The original string could help a lot to find the root cause.</span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></p><p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">P.S.: the escaping works by replacing the original byte with \xHH, so theoretically it can only grow from 1 byte to 4, which should fit into a buffer 6 times bigger than the original size.</span></p><p></p>
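<p class="MsoNormal">The growth bound can be checked with a small sketch in plain C. It is an illustration only, not the utf8utils.c implementation: the function name is hypothetical, and for simplicity it escapes every byte at or above 0x80 rather than only the bytes that are invalid UTF-8.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src to dst, replacing each byte >= 0x80
 * with the four characters \xHH. dst must have room for 4 * len + 1
 * bytes, the worst case when every input byte is escaped.
 * Returns the number of bytes written, excluding the trailing NUL. */
static size_t escape_high_bytes(const unsigned char *src, size_t len,
                                char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(dst_size >= 4 * len + 1);
    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[out++] = (char)src[i];   /* ASCII is copied unchanged */
        } else {
            /* 1 input byte becomes exactly 4 output bytes */
            out += (size_t)snprintf(dst + out, dst_size - out,
                                    "\\x%02x", (unsigned)src[i]);
        }
    }
    dst[out] = '\0';
    return out;
}
```

<p class="MsoNormal">For the three bytes "t", 0x92, "s" this writes six characters, and for three 0x92 bytes it writes twelve, so under these assumptions the output never exceeds 4 times the input length.</p>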
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></p><p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> syslog-ng [mailto:<a href="mailto:syslog-ng-bounces@lists.balabit.hu" target="_blank">syslog-ng-bounces@<wbr />lists.balabit.hu</a>]
<b>On Behalf Of </b>James Elstone<br />
<b>Sent:</b> Thursday, January 05, 2017 10:35 PM<br />
<b>To:</b> <a href="mailto:syslog-ng@lists.balabit.hu" target="_blank">syslog-ng@lists.balabit.hu</a><br />
<b>Subject:</b> [syslog-ng] Hitting g_assert when using sanitize-utf8 enabled!</span></p><p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> </p><p></p>
<p class="MsoNormal">Hi Balabit et al,<br />
<br />
When using the sanitize-utf8 flag I am hitting a g_assert in modules/syslogformat/syslog-<wbr />format.c; what could be causing this?<br />
<br />
Any advice welcome!!<br />
<br />
Kr,<br />
<br />
James<br /></p></div></div></div></div></div></div></div></div></div></div></div></blockquote></div><br />
-- <br />
Sent from my Android device with K-9 Mail. Please excuse my brevity.</div><br />______________________________<wbr />______________________________<wbr />__________________<br />
Member info: <a href="https://lists.balabit.hu/mailman/listinfo/syslog-ng" rel="noreferrer" target="_blank">https://lists.balabit.hu/<wbr />mailman/listinfo/syslog-ng</a><br />
Documentation: <a href="http://www.balabit.com/support/documentation/?product=syslog-ng" rel="noreferrer" target="_blank">http://www.balabit.com/<wbr />support/documentation/?<wbr />product=syslog-ng</a><br />
FAQ: <a href="http://www.balabit.com/wiki/syslog-ng-faq" rel="noreferrer" target="_blank">http://www.balabit.com/wiki/<wbr />syslog-ng-faq</a><br />
<br />
<br /></blockquote></div></div>
</blockquote></div><br>
</body></html>