load failures in afsocket and afsql
I am getting the following on load:

    Error opening plugin module; module='afsocket', error='/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4'
    Error opening plugin module; module='afsql', error='/home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free'
    Error opening plugin module; module='afsocket', error='/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4'

The rpath looks OK:

    megahall@logproxy2:~$ readelf -a /home/y/lib64/syslog-ng/libafsocket.so | fgrep -i rpath
     0x000000000000000f (RPATH) Library rpath: [/home/y/lib64]
    megahall@logproxy2:~$ readelf -a /home/y/lib64/syslog-ng/libafsql.so | fgrep -i rpath
     0x000000000000000f (RPATH) Library rpath: [/home/y/lib64]
    megahall@logproxy2:~$

    megahall@logproxy2:~$ ldd /home/y/lib64/syslog-ng/libafs* | fgrep -i '(dbi|net)'
    megahall@logproxy2:~$

The libraries are in a reasonable location:

    /home/y/lib64/libdbi.so.1.0.0
    /home/y/lib64/libnet.so.1.5.0
    /home/y/lib64/dbd/libdbdsqlite3.so
    /home/y/lib64/dbd/libdbdmysql.so
    /home/y/lib64/libnet.so.1
    /home/y/lib64/libdbi.so.1
    /home/y/lib64/libdbi.so

Reading through the glib documentation on modules (GModule), it seems like the .la files might not contain the right library dependencies, or something along those lines. However, adding the library directories to LD_LIBRARY_PATH as a temporary test does not help.

Because this step fails, it's not possible to use tcp, udp, or any of the other important drivers needed to collect logs.

I could really use some advice on this one!

Matthew.
Hm, I know I ran into something similar to this a long time ago, but I'm having a hard time remembering exactly how I fixed it. I believe it had something to do with needing to install some -devel RPMs, but I don't want to go on record as saying that will definitely fix this. Still, it might be good to triple-check that you have the -devel package for everything.

Also, since your libs are in a non-standard place, it could be a bug where values that work during the configure step are not passed as macros everywhere they need to be in the make step. If you haven't already, you may want to add your custom lib location to ld.so.conf and run ldconfig -v to make sure the dynamic linker picks it up.

Finally, if you're building the dependency libs from source as well, check that there aren't any additional "make" steps they need. I believe some libs need "make shared" (libpcap, for one).

I don't know if any of these will fix the problem, but they can't hurt to verify.

On Mon, Dec 13, 2010 at 8:12 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
Hello,

I don't think this is related to glib at all; it sounds like a linker issue caused by missing RPATHs.

`libnet-config --libs` doesn't give library paths; it just outputs '-lnet'. Similarly, `libnet-config --defines` doesn't include the header location, so CFLAGS and LDFLAGS have to be adjusted by the build environment.

I added "-I<libnet_prefix>/include" to CFLAGS and "-L<libnet_prefix>/lib -Wl,-rpath,<libnet_prefix>/lib" to LDFLAGS. Without the -rpath linker option the app builds but won't run.

I don't know about DBI, as I'm not using it. In theory its pkgconfig file should contain all the needed paths, but AFAIK upstream libdbi still doesn't have pkgconfig support, so a CVS snapshot should probably be used.

hth,
Sandor

On Tue, Dec 14, 2010 at 6:17 AM, Martin Holste <mcholste@gmail.com> wrote:
Hello,

I'm aware of rpath in general but am not an expert in its usage, as I have not needed it in the environments I've used before. I have been using:

    export CFLAGS="-I /home/y/include"
    export LDFLAGS="-Wl,-rpath,/home/y/lib64"

and my libs are:

    /home/y/lib64/libnet.so.1.5.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), stripped
    /home/y/lib64/libdbi.so.1.0.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
    /home/y/lib64/dbd/libdbdmysql.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
    /home/y/lib64/dbd/libdbdsqlite3.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped

with all the usual symlinks. readelf shows the same rpath on each:

    Library rpath: [/home/y/lib64]

strace seems to show that libnet and libdbi are never opened (whether I follow the parent or the child syslog-ng process).

What could I try next? My thought was GDB, but that will get very messy very quickly. :-(

Matthew.

On Tue, Dec 14, 2010 at 02:58:47PM +0100, Sandor Geller wrote:
More information:

    Trying to open module; module='afsocket', filename='/home/y/lib64/syslog-ng/libafsocket.so'

    Breakpoint 1, 0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0
    (gdb) bt
    #0  0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0
    #1  0x00002b1af538f82b in plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:206
    #2  0x00002b1af5360110 in cfg_set_version (self=0x471e2b0, version=770) at cfg.c:282
    #3  0x00002b1af5383c57 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60) at cfg-lexer.c:707
    #4  0x00002b1af5394245 in pragma_lex (yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60, lexer=0x47254a0) at pragma-parser.c:50
    #5  0x00002b1af5393520 in pragma_parse (lexer=0x47254a0, result=0x7fffcd074bc8) at pragma-grammar.c:2733
    #6  0x00002b1af5383db0 in cfg_parser_parse (self=0x2b1af55c0b00, lexer=0x47254a0, instance=0x7fffcd074bc8) at cfg-parser.h:82
    #7  0x00002b1af53839f6 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50) at cfg-lexer.c:646
    #8  0x00002b1af538c039 in main_lex (yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50, lexer=0x47254a0) at cfg-parser.c:149
    #9  0x00002b1af538cf77 in main_parse (lexer=0x47254a0, dummy=0x7fffcd077140) at cfg-grammar.c:2957
    #10 0x00002b1af5360658 in cfg_parser_parse (self=0x2b1af55c05e0, lexer=0x47254a0, instance=0x7fffcd077140) at cfg-parser.h:82
    #11 0x00002b1af536058d in cfg_run_parser (self=0x471e2b0, lexer=0x47254a0, parser=0x2b1af55c05e0, result=0x7fffcd077140) at cfg.c:378
    #12 0x00002b1af5360710 in cfg_read_config (self=0x471e2b0, fname=0x402d18 "/home/y/etc/syslog-ng.conf", syntax_only=0, preprocess_into=0x0) at cfg.c:400
    #13 0x0000000000402803 in initial_init (cfg=0x7fffcd0771c8) at main.c:277
    #14 0x0000000000402b8c in main (argc=1, argv=0x7fffcd0772c8) at main.c:426
    (gdb) step
    Single stepping until exit from function g_module_open,
    which has no line number information.
    plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:207
    207     in plugin.c
    (gdb) step
    208     in plugin.c
    (gdb) step
    210     in plugin.c
    (gdb) step
    msg_limit_internal_message () at messages.c:91
    91      in messages.c
    (gdb)

The relevant part of plugin.c:

    201   msg_debug("Trying to open module",
    202             evt_tag_str("module", module_name),
    203             evt_tag_str("filename", plugin_module_name),
    204             NULL);
    205
    206   mod = g_module_open(plugin_module_name, G_MODULE_BIND_LOCAL);
    207   g_free(plugin_module_name);
    208   if (!mod)
    209     {
    210       msg_error("Error opening plugin module",
    211                 evt_tag_str("module", module_name),
    212                 evt_tag_str("error", g_module_error()),
    213                 NULL);
    214       g_free(module_init_func);
    215       return FALSE;
    216     }
    217   g_module_make_resident(mod);

According to the glib API:

    First of all g_module_open() tries to open file_name as a module. If that fails and file_name has the ".la"-suffix (and is a libtool archive) it tries to open the corresponding module. If that fails and it doesn't have the proper module suffix for the platform (G_MODULE_SUFFIX), this suffix will be appended and the corresponding module will be opened. If that fails and file_name doesn't have the ".la"-suffix, this suffix is appended and g_module_open() tries to open the corresponding module. If eventually that fails as well, NULL is returned.

So for line 210 to execute and write the message, g_module_open() must have returned NULL. As described above, that happens when every attempt fails:

    opening it as module
    opening it as module.la
    opening it as module.<platform suffix>
    opening it as module.<platform suffix>.la

The error returned from g_module_error() is:

    /home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4
    /home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free

I'm wondering what to try next to debug this further.

Matthew.

On Tue, Dec 14, 2010 at 11:18:37AM -0800, Matthew Hall wrote:
What happens if you configure with --enable-static-linking, or switch to --enable-dynamic-linking? Since it sounds like you're running out of ideas, trying a few configure options is an easy next step.

On Tue, Dec 14, 2010 at 2:41 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
More information:
Trying to open module; module='afsocket', filename='/home/y/lib64/syslog-ng/libafsocket.so'
Breakpoint 1, 0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0 (gdb) bt #0 0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0 #1 0x00002b1af538f82b in plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:206 #2 0x00002b1af5360110 in cfg_set_version (self=0x471e2b0, version=770) at cfg.c:282 #3 0x00002b1af5383c57 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60) at cfg-lexer.c:707 #4 0x00002b1af5394245 in pragma_lex (yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60, lexer=0x47254a0) at pragma-parser.c:50 #5 0x00002b1af5393520 in pragma_parse (lexer=0x47254a0, result=0x7fffcd074bc8) at pragma-grammar.c:2733 #6 0x00002b1af5383db0 in cfg_parser_parse (self=0x2b1af55c0b00, lexer=0x47254a0, instance=0x7fffcd074bc8) at cfg-parser.h:82 #7 0x00002b1af53839f6 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50) at cfg-lexer.c:646 #8 0x00002b1af538c039 in main_lex (yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50, lexer=0x47254a0) at cfg-parser.c:149 #9 0x00002b1af538cf77 in main_parse (lexer=0x47254a0, dummy=0x7fffcd077140) at cfg-grammar.c:2957 #10 0x00002b1af5360658 in cfg_parser_parse (self=0x2b1af55c05e0, lexer=0x47254a0, instance=0x7fffcd077140) at cfg-parser.h:82 #11 0x00002b1af536058d in cfg_run_parser (self=0x471e2b0, lexer=0x47254a0, parser=0x2b1af55c05e0, result=0x7fffcd077140) at cfg.c:378 #12 0x00002b1af5360710 in cfg_read_config (self=0x471e2b0, fname=0x402d18 "/home/y/etc/syslog-ng.conf", syntax_only=0, preprocess_into=0x0) at cfg.c:400 #13 0x0000000000402803 in initial_init (cfg=0x7fffcd0771c8) at main.c:277 #14 0x0000000000402b8c in main (argc=1, argv=0x7fffcd0772c8) at main.c:426
(gdb) step Single stepping until exit from function g_module_open, which has no line number information. plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:207 207 in plugin.c (gdb) step 208 in plugin.c (gdb) step 210 in plugin.c (gdb) step msg_limit_internal_message () at messages.c:91 91 in messages.c (gdb)
201 msg_debug("Trying to open module", 202 evt_tag_str("module", module_name), 203 evt_tag_str("filename", plugin_module_name), 204 NULL); 205 206 mod = g_module_open(plugin_module_name, G_MODULE_BIND_LOCAL); 207 g_free(plugin_module_name); 208 if (!mod) 209 { 210 msg_error("Error opening plugin module", 211 evt_tag_str("module", module_name), 212 evt_tag_str("error", g_module_error()), 213 NULL); 214 g_free(module_init_func); 215 return FALSE; 216 } 217 g_module_make_resident(mod);
According to the glib API:
First of all g_module_open() tries to open file_name as a module. If that fails and file_name has the ".la"-suffix (and is a libtool archive) it tries to open the corresponding module. If that fails and it doesn't have the proper module suffix for the platform (G_MODULE_SUFFIX), this suffix will be appended and the corresponding module will be opended. If that fails and file_name doesn't have the ".la"-suffix, this suffix is appended and g_module_open() tries to open the corresponding module. If eventually that fails as well, NULL is returned.
So in order for 210 to get executed to write the message, g_module_open must have had NULL retval.
As above, this could happen when:
opening as a module fails opening as a module.la fails opening as a module.platform_suffix fails opening as a module.platform_suffix.la fails
The error returned from g_module_error gets output as:
/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4 /home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free
Wondering what to try now to debug it further.
Matthew.
On Tue, Dec 14, 2010 at 11:18:37AM -0800, Matthew Hall wrote:
Hello,
I'm aware of rpath overall but not an expert at its usage as I have not needed it for previous environments I've used. I have been using:
export CFLAGS="-I /home/y/include" export LDFLAGS="-Wl,-rpath,/home/y/lib64"
and my libs are:
/home/y/lib64/libnet.so.1.5.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), stripped /home/y/lib64/libdbi.so.1.0.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped /home/y/lib64/dbd/libdbdmysql.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped /home/y/lib64/dbd/libdbdsqlite3.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
with all the usual symlinks.
The readelf shows the rpath is as follows on each:
Library rpath: [/home/y/lib64]
Strace seems to show that the libnet and libdbi don't get opened (whether I follow parent or child syslog-ng).
What could I try next? My thought was perhaps GDB but that will get very messy very quickly. :-(
Matthew.
On Tue, Dec 14, 2010 at 02:58:47PM +0100, Sandor Geller wrote:
Hello,
I don't think this is related to glib at all, sounds like a linker issue due to missing RPATHs.
`libnet-config --libs` doesn't give library paths, just outputs '-lnet'. Similarly 'libnet-config --defines' doesn't contain header location so CFLAGS and LDFLAGS should get adjusted by the build environment.
I added "-I<libnet_prefix>/include" to CFLAGS and "-L<libnet_prefix>/lib -Wl,-rpath,<libnet_prefix>/lib" to LDFLAGS.
Without the -rpath linker option the app could get built but won't run.
Don't know about DBI as I'm not using it. In theory its pkgconfig should contain all needed paths. AFAIK upstream libdbi still doesn't have pkgconfig support so probably a cvs snapshot should get used.
hth,
Sandor
On Tue, Dec 14, 2010 at 6:17 AM, Martin Holste <mcholste@gmail.com> wrote:
Hm, I know I ran into something similar to this a long time ago, but I'm having a hard time remembering exactly how I fixed it. I do believe that it had something to do with needing to install some dev RPM's, but I don't want to go on record as saying that will definitely fix this. Obviously, though, it might be good to triple-check that you've got -devel on everything.
Also, since your libs are in a non-standard place, it could also be a bug in that values that work during the configure step are not getting passed as macros everywhere they need to be in the make step. You may want to try editing your ld.so.conf to include your custom lib location if you haven't already and running ldconfig -v to make sure it's being linked.
Finally, if you're building the dependency libs from source as well, making sure that there aren't any other "make" steps that need to be done is another one to check off. I believe some libs need "make shared" (libpcap, for one).
I don't know if any of these will fix the problem, but they can't hurt to verify.
On Mon, Dec 13, 2010 at 8:12 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
I am getting the following on load:
Error opening plugin module; module='afsocket', error='/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4' Error opening plugin module; module='afsql', error='/home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free' Error opening plugin module; module='afsocket', error='/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4'
The rpath looks OK:
megahall@logproxy2:~$ readelf -a /home/y/lib64/syslog-ng/libafsocket.so | fgrep -i rpath 0x000000000000000f (RPATH) Library rpath: [/home/y/lib64] megahall@logproxy2:~$ readelf -a /home/y/lib64/syslog-ng/libafsql.so | fgrep -i rpath 0x000000000000000f (RPATH) Library rpath: [/home/y/lib64] megahall@logproxy2:~$
megahall@logproxy2:~$ ldd /home/y/lib64/syslog-ng/libafs* | fgrep -i '(dbi|net)' megahall@logproxy2:~$
The libraries are in a reasonable location:
/home/y/lib64/libdbi.so.1.0.0 /home/y/lib64/libnet.so.1.5.0 /home/y/lib64/dbd/libdbdsqlite3.so /home/y/lib64/dbd/libdbdmysql.so /home/y/lib64/libnet.so.1 /home/y/lib64/libdbi.so.1 /home/y/lib64/libdbi.so
Reading through the glib docs for glib modules, it seems like the .la files are maybe not containing the right library dependencies, or something like this. However adding the library directories using LD_LIBRARY_PATH as a temporary test does not help.
Because this step fails, it's not possible to use tcp, udp, or any of the other important drivers you need to collect logs.
I could really use some advice on this one!
Matthew. ______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
megahall@logproxy2:~$ readelf -d /home/y/lib64/syslog-ng/libafsocket.so

Dynamic section at offset 0x19978 contains 29 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libsyslog-ng.so.0]
0x0000000000000001 (NEEDED) Shared library: [libssl.so.108]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.108]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libwrap.so.0]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libafsocket-tls.so]
0x000000000000000f (RPATH) Library rpath: [/home/y/lib64:/usr/lib64]
...

It looks to me like libafsocket itself is built broken because it does not declare its libnet dependency that it picks up when spoof source is enabled.

Matthew.

On Tue, Dec 14, 2010 at 03:39:20PM -0600, Martin Holste wrote:
What does it do if you configure with --enable-static-linking, or switch to --enable-dynamic-linking? Since it sounds like you're running out of ideas, changing a few configure options is an easy thing to try.
On Tue, Dec 14, 2010 at 2:41 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
More information:
Trying to open module; module='afsocket', filename='/home/y/lib64/syslog-ng/libafsocket.so'
Breakpoint 1, 0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0
(gdb) bt
#0 0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0
#1 0x00002b1af538f82b in plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:206
#2 0x00002b1af5360110 in cfg_set_version (self=0x471e2b0, version=770) at cfg.c:282
#3 0x00002b1af5383c57 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60) at cfg-lexer.c:707
#4 0x00002b1af5394245 in pragma_lex (yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60, lexer=0x47254a0) at pragma-parser.c:50
#5 0x00002b1af5393520 in pragma_parse (lexer=0x47254a0, result=0x7fffcd074bc8) at pragma-grammar.c:2733
#6 0x00002b1af5383db0 in cfg_parser_parse (self=0x2b1af55c0b00, lexer=0x47254a0, instance=0x7fffcd074bc8) at cfg-parser.h:82
#7 0x00002b1af53839f6 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50) at cfg-lexer.c:646
#8 0x00002b1af538c039 in main_lex (yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50, lexer=0x47254a0) at cfg-parser.c:149
#9 0x00002b1af538cf77 in main_parse (lexer=0x47254a0, dummy=0x7fffcd077140) at cfg-grammar.c:2957
#10 0x00002b1af5360658 in cfg_parser_parse (self=0x2b1af55c05e0, lexer=0x47254a0, instance=0x7fffcd077140) at cfg-parser.h:82
#11 0x00002b1af536058d in cfg_run_parser (self=0x471e2b0, lexer=0x47254a0, parser=0x2b1af55c05e0, result=0x7fffcd077140) at cfg.c:378
#12 0x00002b1af5360710 in cfg_read_config (self=0x471e2b0, fname=0x402d18 "/home/y/etc/syslog-ng.conf", syntax_only=0, preprocess_into=0x0) at cfg.c:400
#13 0x0000000000402803 in initial_init (cfg=0x7fffcd0771c8) at main.c:277
#14 0x0000000000402b8c in main (argc=1, argv=0x7fffcd0772c8) at main.c:426
(gdb) step
Single stepping until exit from function g_module_open, which has no line number information.
plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:207
207 in plugin.c
(gdb) step
208 in plugin.c
(gdb) step
210 in plugin.c
(gdb) step
msg_limit_internal_message () at messages.c:91
91 in messages.c
(gdb)
201     msg_debug("Trying to open module",
202               evt_tag_str("module", module_name),
203               evt_tag_str("filename", plugin_module_name),
204               NULL);
205
206     mod = g_module_open(plugin_module_name, G_MODULE_BIND_LOCAL);
207     g_free(plugin_module_name);
208     if (!mod)
209       {
210         msg_error("Error opening plugin module",
211                   evt_tag_str("module", module_name),
212                   evt_tag_str("error", g_module_error()),
213                   NULL);
214         g_free(module_init_func);
215         return FALSE;
216       }
217     g_module_make_resident(mod);
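The flags passed at line 206 matter. As far as we understand GLib's Unix backend, g_module_open() without G_MODULE_BIND_LAZY ends up calling dlopen() with RTLD_NOW, and G_MODULE_BIND_LOCAL means RTLD_GLOBAL is not added. With RTLD_NOW every undefined symbol must resolve while the module is being loaded, so an undeclared libnet dependency surfaces as an open failure rather than a crash later. A minimal harness for reproducing the load outside syslog-ng might look like this, assuming glibc's dlopen()/dlerror(); try_load() is a hypothetical name, and outside syslog-ng the first missing symbol reported may differ.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical harness: dlopen() `path` with `flags` and report the result.
 * Returns 1 on success; on failure returns 0 and copies dlerror() into `err`.
 * `errlen` must be at least 1. */
static int try_load(const char *path, int flags, char *err, size_t errlen)
{
    err[0] = '\0';
    void *handle = dlopen(path, flags);
    if (!handle) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen() failure");
        return 0;
    }
    dlclose(handle);
    return 1;
}
```

For example, try_load("/home/y/lib64/syslog-ng/libafsocket.so", RTLD_NOW | RTLD_LOCAL, err, sizeof err) would be expected to fail with the same undefined-symbol text that g_module_error() returned, if our reading of the GLib flags is right.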
According to the glib API:
First of all g_module_open() tries to open file_name as a module. If that fails and file_name has the ".la"-suffix (and is a libtool archive) it tries to open the corresponding module. If that fails and it doesn't have the proper module suffix for the platform (G_MODULE_SUFFIX), this suffix will be appended and the corresponding module will be opened. If that fails and file_name doesn't have the ".la"-suffix, this suffix is appended and g_module_open() tries to open the corresponding module. If eventually that fails as well, NULL is returned.
So for line 210 to execute and write the message, g_module_open must have returned NULL.
As above, this could happen when:
- opening as a module fails
- opening as a module.la fails
- opening as a module.platform_suffix fails
- opening as a module.platform_suffix.la fails
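Read literally, the quoted documentation implies the following order of candidate names. Note that in the quoted text both suffixes are appended to the original file_name. This sketch models only that documented order, assuming G_MODULE_SUFFIX is "so" as on Linux; step 2 (resolving a .la archive) is not modelled, GLib's actual code may differ in detail, and module_candidates() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

#define MODULE_SUFFIX ".so"  /* assumption: "." G_MODULE_SUFFIX on Linux */
#define CANDIDATE_LEN 256

/* Return nonzero if `s` ends with `suffix`. */
static int has_suffix(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Hypothetical helper: write the file names the quoted g_module_open()
 * documentation says are tried, in order, into out[0..max-1]; return count. */
static int module_candidates(const char *file_name, char out[][CANDIDATE_LEN], int max)
{
    int n = 0;
    /* 1. the name exactly as given */
    if (n < max)
        snprintf(out[n++], CANDIDATE_LEN, "%s", file_name);
    /* 2. a name ending in ".la" is a libtool archive whose dlname/libdir
     *    fields point at the real module; not modelled here. */
    /* 3. append the platform module suffix if it is missing */
    if (n < max && !has_suffix(file_name, MODULE_SUFFIX))
        snprintf(out[n++], CANDIDATE_LEN, "%s%s", file_name, MODULE_SUFFIX);
    /* 4. append ".la" to the original name if it is missing */
    if (n < max && !has_suffix(file_name, ".la"))
        snprintf(out[n++], CANDIDATE_LEN, "%s.la", file_name);
    return n;
}
```

For the full path in the debug message above, which already ends in ".so", the documented order only adds a ".la" variant after the name itself.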
The error returned from g_module_error gets output as:
/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4
/home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free
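Both messages have the shape "<path>: undefined symbol: <name>". In glibc's wording, as far as we know, that means the dynamic linker found and mapped the .so file and then failed while resolving its symbols, as opposed to "cannot open shared object file", which means the file itself was not found. A small sketch for telling the two cases apart, assuming glibc's message text; classify_load_error() and undefined_symbol_name() are hypothetical helpers:

```c
#include <stddef.h>
#include <string.h>

/* Coarse categories for a dlerror()/g_module_error() message. */
enum load_failure {
    LOAD_FAIL_UNKNOWN,
    LOAD_FAIL_NOT_FOUND,        /* the file itself could not be opened */
    LOAD_FAIL_UNDEFINED_SYMBOL  /* file was mapped, symbol resolution failed */
};

/* Classify a loader error string by glibc's usual wording (an assumption;
 * other C libraries phrase these messages differently). */
static enum load_failure classify_load_error(const char *msg)
{
    if (msg == NULL)
        return LOAD_FAIL_UNKNOWN;
    if (strstr(msg, "undefined symbol: ") != NULL)
        return LOAD_FAIL_UNDEFINED_SYMBOL;
    if (strstr(msg, "cannot open shared object file") != NULL)
        return LOAD_FAIL_NOT_FOUND;
    return LOAD_FAIL_UNKNOWN;
}

/* Copy NAME out of "...: undefined symbol: NAME[, version V]" into `out`.
 * Returns 1 on success, 0 if the message has no such part or outlen is 0. */
static int undefined_symbol_name(const char *msg, char *out, size_t outlen)
{
    static const char marker[] = "undefined symbol: ";
    const char *p = msg ? strstr(msg, marker) : NULL;
    if (p == NULL || outlen == 0)
        return 0;
    p += sizeof marker - 1;
    size_t n = strcspn(p, ", \t\r\n");
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

Applied to the first message above, this yields LOAD_FAIL_UNDEFINED_SYMBOL and the name libnet_build_ipv4, which points at linking rather than at the file-name lookup order.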
Wondering what to try now to debug it further.
Matthew.
On Tue, Dec 14, 2010 at 11:18:37AM -0800, Matthew Hall wrote:
Hello,
I'm aware of rpath overall but not an expert at its usage as I have not needed it for previous environments I've used. I have been using:
export CFLAGS="-I /home/y/include"
export LDFLAGS="-Wl,-rpath,/home/y/lib64"
and my libs are:
/home/y/lib64/libnet.so.1.5.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), stripped
/home/y/lib64/libdbi.so.1.0.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/home/y/lib64/dbd/libdbdmysql.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/home/y/lib64/dbd/libdbdsqlite3.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
with all the usual symlinks.
The readelf shows the rpath is as follows on each:
Library rpath: [/home/y/lib64]
Strace seems to show that the libnet and libdbi don't get opened (whether I follow parent or child syslog-ng).
What could I try next? My thought was perhaps GDB but that will get very messy very quickly. :-(
Matthew.
On Tue, Dec 14, 2010 at 02:58:47PM +0100, Sandor Geller wrote:
Hello,
I don't think this is related to glib at all, sounds like a linker issue due to missing RPATHs.
`libnet-config --libs` doesn't give library paths; it just outputs '-lnet'. Similarly, 'libnet-config --defines' doesn't contain the header location, so CFLAGS and LDFLAGS have to be adjusted by the build environment.
I added "-I<libnet_prefix>/include" to CFLAGS and "-L<libnet_prefix>/lib -Wl,-rpath,<libnet_prefix>/lib" to LDFLAGS.
Without the -rpath linker option the app can be built, but it won't run.
I don't know about DBI, as I'm not using it. In theory its pkg-config file should contain all the needed paths, but AFAIK upstream libdbi still doesn't have pkg-config support, so a CVS snapshot should probably be used.
hth,
Sandor
On Tue, Dec 14, 2010 at 6:17 AM, Martin Holste <mcholste@gmail.com> wrote:
Hm, I know I ran into something similar to this a long time ago, but I'm having a hard time remembering exactly how I fixed it. I do believe it had something to do with needing to install some devel RPMs, but I don't want to go on record as saying that will definitely fix this. Still, it might be good to triple-check that you've got -devel on everything.
Also, since your libs are in a non-standard place, it could be a bug where values that work during the configure step aren't being passed as macros everywhere they're needed in the make step. You may want to try adding your custom lib location to ld.so.conf, if you haven't already, and running ldconfig -v to make sure it's being linked.
Finally, if you're also building the dependency libs from source, make sure there aren't any other "make" steps that still need to be run. I believe some libs need "make shared" (libpcap, for one).
I don't know if any of these will fix the problem, but they can't hurt to verify.
I do recall that spoof-source would not work on SLES 10--I had to upgrade to SLES 11.2 in order to get the non-broken libnet.
On Tue, Dec 14, 2010 at 04:34:18PM -0600, Martin Holste wrote:
I do recall that spoof-source would not work on SLES 10--I had to upgrade to SLES 11.2 in order to get the non-broken libnet.
See latest update coming in another mail. Weird stuff still happens even when I disable spoof and sql.
Hi,

This is a rather long thread, with messages from more than one person, but I'll try to answer inline and give a summary at the end to start sorting things out.

On Tue, 2010-12-14 at 16:34 -0600, Martin Holste wrote:
I do recall that spoof-source would not work on SLES 10--I had to upgrade to SLES 11.2 in order to get the non-broken libnet.
I think that was an unrelated problem. In that case syslog-ng could be built, but packet checksums were not calculated correctly.
On Tue, Dec 14, 2010 at 4:18 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
megahall@logproxy2:~$ readelf -d /home/y/lib64/syslog-ng/libafsocket.so
Dynamic section at offset 0x19978 contains 29 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libsyslog-ng.so.0]
0x0000000000000001 (NEEDED) Shared library: [libssl.so.108]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.108]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libwrap.so.0]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libafsocket-tls.so]
0x000000000000000f (RPATH) Library rpath: [/home/y/lib64:/usr/lib64]
...
It looks to me like libafsocket itself is built broken because it does not declare its libnet dependency that it picks up when spoof source is enabled.
Hmm, this is strange. In my build of syslog-ng, libafsocket.so is linked against libnet (and I also build into a private prefix):

0x0000000000000001 (NEEDED) Shared library: [libsyslog-ng.so.0]
0x0000000000000001 (NEEDED) Shared library: [libssl.so.0.9.8]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.0.9.8]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libnet.so.1] <---
0x0000000000000001 (NEEDED) Shared library: [libwrap.so.0]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]

What determines whether afsocket is linked against libnet is the LIBNET_LIBS variable, as detected by the configure script. You can check its value in your config.status script; mine has this:

$ grep LIBNET_LIBS config.status
S["LIBNET_LIBS"]="-lnet"

This is pasted into the libafsocket.so linker command line in modules/afsocket/Makefile.am.
Matthew.
According to the glib API:
First of all g_module_open() tries to open file_name as a module. If that fails and file_name has the ".la"-suffix (and is a libtool archive) it tries to open the corresponding module. If that fails and it doesn't have the proper module suffix for the platform (G_MODULE_SUFFIX), this suffix will be appended and the corresponding module will be opened. If that fails and file_name doesn't have the ".la"-suffix, this suffix is appended and g_module_open() tries to open the corresponding module. If eventually that fails as well, NULL is returned.
So for line 210 to execute and write the message, g_module_open must have returned NULL.
As above, this could happen when:
- opening as a module fails
- opening as a module.la fails
- opening as a module.platform_suffix fails
- opening as a module.platform_suffix.la fails
The error returned from g_module_error gets output as:
/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4
/home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free
Wondering what to try now to debug it further.
The GLib .la parser picks out three values from the .la file: installed, dlname and libdir. These are used to locate the .so file; dependency loading is not affected (that is performed by the dynamic linker).
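To make this concrete: a libtool .la file is a small text file of key=value lines. The sketch below pulls out the three fields mentioned, assuming the usual key='value' layout; struct la_info and la_parse() are hypothetical, and the sample .la text in the test is made up for illustration. Note that it never looks at dependency_libs, consistent with the point that dependencies are left to the dynamic linker.

```c
#include <string.h>

/* Hypothetical holder for the .la fields the GLib parser reads. */
struct la_info {
    char dlname[256];
    char libdir[256];
    int installed; /* 1 if the file says installed=yes */
};

/* If `line` starts with `key=`, copy its (optionally single-quoted) value,
 * up to the end of the line, into `out` and return 1; otherwise return 0.
 * `outlen` must be at least 1. */
static int la_value(const char *line, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    if (strncmp(line, key, klen) != 0 || line[klen] != '=')
        return 0;
    const char *v = line + klen + 1;
    size_t n = strcspn(v, "\r\n");
    if (n >= 2 && v[0] == '\'' && v[n - 1] == '\'') {
        v++; /* strip the surrounding single quotes */
        n -= 2;
    }
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, v, n);
    out[n] = '\0';
    return 1;
}

/* Parse dlname, libdir and installed from the text of a .la file.
 * Returns 1 if a non-empty dlname was found. */
static int la_parse(const char *text, struct la_info *info)
{
    char flag[16];
    memset(info, 0, sizeof *info);
    for (const char *line = text; line != NULL && *line != '\0';) {
        la_value(line, "dlname", info->dlname, sizeof info->dlname);
        la_value(line, "libdir", info->libdir, sizeof info->libdir);
        if (la_value(line, "installed", flag, sizeof flag))
            info->installed = strcmp(flag, "yes") == 0;
        const char *nl = strchr(line, '\n');
        line = nl ? nl + 1 : NULL;
    }
    return info->dlname[0] != '\0';
}
```

Joining libdir and dlname then presumably gives the path that is actually handed to the dynamic linker.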
Matthew.
On Tue, Dec 14, 2010 at 11:18:37AM -0800, Matthew Hall wrote:
Hello,
I'm aware of rpath overall but not an expert at its usage as I have not needed it for previous environments I've used. I have been using:
export CFLAGS="-I /home/y/include"
export LDFLAGS="-Wl,-rpath,/home/y/lib64"
and my libs are:
/home/y/lib64/libnet.so.1.5.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), stripped
/home/y/lib64/libdbi.so.1.0.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/home/y/lib64/dbd/libdbdmysql.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/home/y/lib64/dbd/libdbdsqlite3.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
with all the usual symlinks.
The readelf shows the rpath is as follows on each:
Library rpath: [/home/y/lib64]
Strace seems to show that the libnet and libdbi don't get opened (whether I follow parent or child syslog-ng).
What could I try next? My thought was perhaps GDB but that will get very messy very quickly. :-(
The root cause seems to be that your afsocket.so is not linked against libnet.so. It'd be wise to check the link command line and the LIBNET_LIBS value suggested above.

--
Bazsi
On Fri, Dec 17, 2010 at 04:22:25PM +0100, Balazs Scheidler wrote:
The root cause seems to be that your afsocket.so is not linked against libnet.so
It'd be wise to check out the link command line and the LIBNET_LIBS value suggested above.
The cause of this one was an option passed to --libnet-something (I can't remember its exact name right now). It worked correctly on one system and silently failed on another. When I removed it, this problem went away.

Matthew.
Disabled spoof-source support for debugging. The initialization process gets further but still bombs eventually:

for (i = 0; i < self->initialized_pipes->len; i++)
  {
    if (!log_pipe_init(g_ptr_array_index(self->initialized_pipes, i), cfg))
      {
        msg_error("Error initializing message pipeline", NULL);
        return FALSE;
      }
  }

I get that rather cryptic "Error initializing message pipeline" with no explanation or other info.

(gdb) bt
#0 log_center_init (self=0x1c436050, cfg=0x1c435250) at center.c:510
#1 0x00002aca8d98bf9f in cfg_init (cfg=0x1c435250) at cfg.c:245
#2 0x00002aca8d98cb4b in cfg_initial_init (cfg=0x1c435250, persist_filename=0x402d50 "/home/y/var/run/syslog-ng/syslog-ng.persist") at cfg.c:510
#3 0x0000000000402836 in initial_init (cfg=0x7ffffee7b458) at main.c:287
#4 0x0000000000402b8c in main (argc=1, argv=0x7ffffee7b558) at main.c:426
(gdb)

On this system the code will not link at all unless I use --enable-dynamic-linking instead of --enable-mixed-linking.

Removing SQL support gets rid of my libdbi error. However, with both spoof-source and SQL disabled, the pipeline error remains.

Matthew.

On Tue, Dec 14, 2010 at 02:18:59PM -0800, Matthew Hall wrote:
megahall@logproxy2:~$ readelf -d /home/y/lib64/syslog-ng/libafsocket.so
Dynamic section at offset 0x19978 contains 29 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libsyslog-ng.so.0]
0x0000000000000001 (NEEDED) Shared library: [libssl.so.108]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.108]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libwrap.so.0]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libafsocket-tls.so]
0x000000000000000f (RPATH) Library rpath: [/home/y/lib64:/usr/lib64]
...
It looks to me like libafsocket itself is built broken because it does not declare its libnet dependency that it picks up when spoof source is enabled.
Matthew.
On Tue, Dec 14, 2010 at 03:39:20PM -0600, Martin Holste wrote:
What happens if you configure with --enable-static-linking, or switch to --enable-dynamic-linking? Since it sounds like you're running out of ideas, trying a few configure options is an easy next step.
On Tue, Dec 14, 2010 at 2:41 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
More information:
Trying to open module; module='afsocket', filename='/home/y/lib64/syslog-ng/libafsocket.so'
Breakpoint 1, 0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0
(gdb) bt
#0  0x00002b1af55db9ba in g_module_open () from /home/y/lib64/libgmodule-2.0.so.0
#1  0x00002b1af538f82b in plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:206
#2  0x00002b1af5360110 in cfg_set_version (self=0x471e2b0, version=770) at cfg.c:282
#3  0x00002b1af5383c57 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60) at cfg-lexer.c:707
#4  0x00002b1af5394245 in pragma_lex (yylval=0x7fffcd074a80, yylloc=0x7fffcd074a60, lexer=0x47254a0) at pragma-parser.c:50
#5  0x00002b1af5393520 in pragma_parse (lexer=0x47254a0, result=0x7fffcd074bc8) at pragma-grammar.c:2733
#6  0x00002b1af5383db0 in cfg_parser_parse (self=0x2b1af55c0b00, lexer=0x47254a0, instance=0x7fffcd074bc8) at cfg-parser.h:82
#7  0x00002b1af53839f6 in cfg_lexer_lex (self=0x47254a0, yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50) at cfg-lexer.c:646
#8  0x00002b1af538c039 in main_lex (yylval=0x7fffcd076f70, yylloc=0x7fffcd076f50, lexer=0x47254a0) at cfg-parser.c:149
#9  0x00002b1af538cf77 in main_parse (lexer=0x47254a0, dummy=0x7fffcd077140) at cfg-grammar.c:2957
#10 0x00002b1af5360658 in cfg_parser_parse (self=0x2b1af55c05e0, lexer=0x47254a0, instance=0x7fffcd077140) at cfg-parser.h:82
#11 0x00002b1af536058d in cfg_run_parser (self=0x471e2b0, lexer=0x47254a0, parser=0x2b1af55c05e0, result=0x7fffcd077140) at cfg.c:378
#12 0x00002b1af5360710 in cfg_read_config (self=0x471e2b0, fname=0x402d18 "/home/y/etc/syslog-ng.conf", syntax_only=0, preprocess_into=0x0) at cfg.c:400
#13 0x0000000000402803 in initial_init (cfg=0x7fffcd0771c8) at main.c:277
#14 0x0000000000402b8c in main (argc=1, argv=0x7fffcd0772c8) at main.c:426
(gdb) step
Single stepping until exit from function g_module_open,
which has no line number information.
plugin_load_module (module_name=0x2b1af53976f1 "afsocket", cfg=0x471e2b0, args=0x0) at plugin.c:207
207     in plugin.c
(gdb) step
208     in plugin.c
(gdb) step
210     in plugin.c
(gdb) step
msg_limit_internal_message () at messages.c:91
91      in messages.c
(gdb)
201   msg_debug("Trying to open module",
202             evt_tag_str("module", module_name),
203             evt_tag_str("filename", plugin_module_name),
204             NULL);
205
206   mod = g_module_open(plugin_module_name, G_MODULE_BIND_LOCAL);
207   g_free(plugin_module_name);
208   if (!mod)
209     {
210       msg_error("Error opening plugin module",
211                 evt_tag_str("module", module_name),
212                 evt_tag_str("error", g_module_error()),
213                 NULL);
214       g_free(module_init_func);
215       return FALSE;
216     }
217   g_module_make_resident(mod);
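The g_module_open() call on line 206 can be reproduced outside syslog-ng, which separates the plugin's own linkage from anything the daemon does. This is a sketch under an assumption: that on Linux, G_MODULE_BIND_LOCAL without G_MODULE_BIND_LAZY corresponds to `dlopen()` with `RTLD_NOW | RTLD_LOCAL` (my reading of gmodule's dlopen backend, not something stated in this thread). `try_load_plugin` is a hypothetical name; on older glibc versions the program has to be linked with `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not part of syslog-ng or GLib): load a shared
 * object roughly the way g_module_open(path, G_MODULE_BIND_LOCAL) is
 * assumed to on Linux, i.e. resolve every symbol immediately (RTLD_NOW)
 * and keep the module's symbols private (RTLD_LOCAL).
 * Returns 0 on success. On failure returns -1 and copies the dlerror()
 * text, e.g. "...: undefined symbol: libnet_build_ipv4", into errbuf. */
static int try_load_plugin(const char *path, char *errbuf, size_t errlen)
{
    void *handle;

    assert(path != NULL && errbuf != NULL && errlen > 0);
    errbuf[0] = '\0';

    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        snprintf(errbuf, errlen, "%s", err != NULL ? err : "unknown dlopen() error");
        return -1;
    }
    dlclose(handle);
    return 0;
}
```

Pointed at /home/y/lib64/syslog-ng/libafsocket.so, this should report the same undefined symbol if the module itself is at fault. If preloading the library (for example `LD_PRELOAD=/home/y/lib64/libnet.so.1`) makes the load succeed, that points to a missing NEEDED entry rather than a search-path problem, which would also be consistent with LD_LIBRARY_PATH not helping.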
According to the glib API:
First of all g_module_open() tries to open file_name as a module. If that fails and file_name has the ".la"-suffix (and is a libtool archive) it tries to open the corresponding module. If that fails and it doesn't have the proper module suffix for the platform (G_MODULE_SUFFIX), this suffix will be appended and the corresponding module will be opened. If that fails and file_name doesn't have the ".la"-suffix, this suffix is appended and g_module_open() tries to open the corresponding module. If eventually that fails as well, NULL is returned.
So for line 210 to execute and log the message, g_module_open must have returned NULL.

Per the documentation above, that happens only when every attempt fails:

- opening it as module fails
- opening it as module.la fails
- opening it as module.platform_suffix fails
- opening it as module.platform_suffix.la fails
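The lookup order from the quoted documentation can be written out as a small function to see which names are actually tried for the path in the debug message. This is a hypothetical model based only on the doc text, not GLib's implementation: `gmodule_candidates` and `NAME_LEN` are made-up names, `module_suffix` stands in for G_MODULE_SUFFIX (assumed to be "so" on Linux), and the libtool-archive step is only recorded, since really following it means parsing the .la file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 512   /* hypothetical fixed size for each candidate name */

/* Returns non-zero if s ends with suffix. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Hypothetical model of the documented g_module_open() lookup order.
 * Writes at most max candidate names into out and returns how many. */
static int gmodule_candidates(const char *file_name, const char *module_suffix,
                              char out[][NAME_LEN], int max)
{
    char dot_suffix[64];
    int n = 0;

    assert(file_name != NULL && module_suffix != NULL && max >= 0);
    snprintf(dot_suffix, sizeof dot_suffix, ".%s", module_suffix);

    /* 1. the name as given */
    if (n < max)
        snprintf(out[n++], NAME_LEN, "%s", file_name);
    /* 2. a libtool archive: the real code opens the module it describes */
    if (ends_with(file_name, ".la") && n < max)
        snprintf(out[n++], NAME_LEN, "(module listed in %s)", file_name);
    /* 3. no platform suffix yet: append it */
    if (!ends_with(file_name, dot_suffix) && n < max)
        snprintf(out[n++], NAME_LEN, "%s%s", file_name, dot_suffix);
    /* 4. no ".la" suffix yet: append it */
    if (!ends_with(file_name, ".la") && n < max)
        snprintf(out[n++], NAME_LEN, "%s.la", file_name);
    return n;
}
```

For "/home/y/lib64/syslog-ng/libafsocket.so" the model yields only two names: the file itself and ".../libafsocket.so.la". Since the reported error names the .so file itself, the fallbacks do not seem to matter here: the file is found, and it is symbol resolution that fails.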
The error returned from g_module_error gets output as:
/home/y/lib64/syslog-ng/libafsocket.so: undefined symbol: libnet_build_ipv4
/home/y/lib64/syslog-ng/libafsql.so: undefined symbol: dbi_result_free
I'm wondering what to try next to debug this further.
Matthew.
On Tue, Dec 14, 2010 at 11:18:37AM -0800, Matthew Hall wrote:
Hello,
I'm aware of rpath in general, but I'm not an expert in using it, since I haven't needed it in the environments I've worked in before. I have been using:

export CFLAGS="-I /home/y/include"
export LDFLAGS="-Wl,-rpath,/home/y/lib64"
and my libs are:
/home/y/lib64/libnet.so.1.5.0:          ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), stripped
/home/y/lib64/libdbi.so.1.0.0:          ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/home/y/lib64/dbd/libdbdmysql.so:       ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/home/y/lib64/dbd/libdbdsqlite3.so:     ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
with all the usual symlinks.
The readelf shows the rpath is as follows on each:
Library rpath: [/home/y/lib64]
Strace seems to show that libnet and libdbi never get opened (whether I follow the parent or the child syslog-ng process).
What could I try next? My thought was perhaps GDB but that will get very messy very quickly. :-(
Matthew.
On Tue, Dec 14, 2010 at 02:58:47PM +0100, Sandor Geller wrote:
Hello,
I don't think this is related to glib at all, sounds like a linker issue due to missing RPATHs.
`libnet-config --libs` doesn't give library paths; it just outputs '-lnet'. Similarly, `libnet-config --defines` doesn't include the header location, so CFLAGS and LDFLAGS have to be adjusted by the build environment.
I added "-I<libnet_prefix>/include" to CFLAGS and "-L<libnet_prefix>/lib -Wl,-rpath,<libnet_prefix>/lib" to LDFLAGS.
Without the -rpath linker option the app builds but won't run.
I don't know about DBI, as I'm not using it. In theory its pkg-config file should contain all the needed paths, but AFAIK upstream libdbi still doesn't have pkg-config support, so a CVS snapshot should probably be used.
hth,
Sandor
On Tue, Dec 14, 2010 at 6:17 AM, Martin Holste <mcholste@gmail.com> wrote:
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.campin.net/syslog-ng/faq.html
OK. I still have no idea why spoof source (libnet) and SQL (libdbi) fail to load, but I did get a good idea of why init still fails with the cryptic pipeline message when they are disabled.

#0  log_db_parser_init (s=0x3dcb270, cfg=0x3d9d860) at dbparser.c:147
#1  0x00002abeea802f79 in log_parser_init (s=0x3dcb270, cfg=0x3d9d860) at logparser.h:56
#2  0x00002abeea802f27 in log_parser_rule_init (s=0x3dcb360, cfg=0x3d9d860) at logparser.c:105
#3  0x00002abeea803269 in log_process_rule_init (s=0x3dcb360, cfg=0x3d9d860) at logprocess.h:62
#4  0x00002abeea803238 in log_process_pipe_init (s=0x3dccdd0) at logprocess.c:60
#5  0x00002abeea7f0036 in log_pipe_init (s=0x3dccdd0, cfg=0x3d9d860) at logpipe.h:91
#6  0x00002abeea7eff34 in log_center_init (self=0x3da96f0, cfg=0x3d9d860) at center.c:508
#7  0x00002abeea7e6caf in cfg_init (cfg=0x3d9d860) at cfg.c:245
#8  0x00002abeea7e7846 in cfg_initial_init (cfg=0x3d9d860, persist_filename=0x402d10 "/home/y/var/run/syslog-ng/syslog-ng.persist") at cfg.c:510
#9  0x00000000004027f6 in initial_init (cfg=0x7fff1e086978) at main.c:287
#10 0x0000000000402b4c in main (argc=1, argv=0x7fff1e086a78) at main.c:426

In some cases log_db_parser_init can fail without logging an error, and that failure is passed all the way back up to log_center_init without any explanation of what failed, why, or how. log_center_init is so far up the stack that it cannot log a useful message about what really happened.

Apparently, in log_db_parser_reload_database in dbparser.c, you can hit a case where

if ((self->db_file_inode == st.st_ino && self->db_file_mtime == st.st_mtime))

forces an early return even though self->db is still NULL. I think this can happen when cfg_persist_config_fetch is supposed to have the DB stored already but unexpectedly does not. I will have to investigate what has gone wrong with the persistent storage on my system; probably a missing directory causes a file write failure, or something similar.
Breakpoint 46, log_db_parser_init (s=0x1009d270, cfg=0x1006f860) at dbparser.c:121
121       LogDBParser *self = (LogDBParser *) s;
123       self->db = cfg_persist_config_fetch(cfg, log_db_parser_format_persist_name(self));
124       if (self->db)
(gdb) print self->db
*** self->db fails to initialize via cfg_persist_config_fetch ***
$32 = (PatternDB *) 0x0
142           log_db_parser_reload_database(self);

Breakpoint 49, log_db_parser_reload_database (self=0x1009d270) at dbparser.c:61
61        if (stat(self->db_file, &st) < 0)
68        if ((self->db_file_inode == st.st_ino && self->db_file_mtime == st.st_mtime))
*** early return happens here without checking if self->db == NULL ***
98      }
...
Breakpoint 47, log_db_parser_init (s=0x1009d270, cfg=0x1006f860) at dbparser.c:146
146       return self->db != NULL;
(gdb) print self->db
$33 = (PatternDB *) 0x0
(gdb) n
...
log_parser_rule_init (s=0x1009d360, cfg=0x1006f860) at logparser.c:106
106             success = FALSE;
...
log_pipe_init (s=0x1009edd0, cfg=0x1006f860) at logpipe.h:96
96        return FALSE;
...
log_center_init (self=0x1007b6f0, cfg=0x1006f860) at center.c:510
510               msg_error("Error initializing message pipeline",

Matthew.
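The trace above can be condensed into a small standalone model. It is a hypothetical simplification, not syslog-ng code: `Parser`, `FileStamp`, `persist_fetch`, `reload_database` and `parser_init` only mirror the roles of LogDBParser, the inode/mtime pair, cfg_persist_config_fetch, log_db_parser_reload_database and log_db_parser_init. It assumes that the same parser object is initialized once per log {} reference (the explanation reached later in this thread) and that nothing was ever stored in the persist store. The inode and mtime values are the ones printed in the gdb session.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long ino;   /* stands in for db_file_inode */
    long mtime;          /* stands in for db_file_mtime */
} FileStamp;

typedef struct {
    const char *db;      /* stands in for PatternDB *; NULL = not loaded */
    FileStamp loaded;    /* stamp of the file when db was last loaded */
} Parser;

/* Nothing was ever added to the store, so every fetch misses. */
static const char *persist_fetch(void)
{
    return NULL;
}

static void reload_database(Parser *self, FileStamp st)
{
    /* Same shape as the check at dbparser.c:68: an unchanged file means
     * "nothing to do", even when self->db is NULL. */
    if (self->loaded.ino == st.ino && self->loaded.mtime == st.mtime)
        return;
    self->loaded = st;
    self->db = "patterns";   /* pretend the XML file loaded fine */
}

/* Returns 1 on success, 0 on failure, like log_db_parser_init's result. */
static int parser_init(Parser *self, FileStamp st)
{
    assert(self != NULL);
    self->db = persist_fetch();   /* discards the db from any earlier init */
    if (self->db == NULL)
        reload_database(self, st);
    return self->db != NULL;
}
```

The first call succeeds; the second fails like the traced run: the fetch overwrites the pointer with NULL, and the unchanged inode/mtime makes the reload return before loading anything.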
Some more information. The first time my pattern DB is read from disk, it's supposed to be persisted to some kind of permanent storage, but for some reason it does not seem to be. So when g_hash_table_lookup_extended is called, it finds no entry, cfg_persist_config_fetch returns NULL, and all hell breaks loose, because the rest of the code (reasonably) assumes that a structure you have already initialized and persisted can't be NULL.

I need to figure out how this thing gets persisted and what is going wrong there, as the problem is not logged anywhere I've been able to find. I think it has to be persisted somewhere in this call chain:

log_db_parser_reload_database
log_db_parser_init
log_parser_init
log_parser_rule_init
log_process_rule_init
log_process_pipe_init
log_pipe_init
log_center_init
cfg_init
cfg_initial_init
initial_init

Now I have a ton of code to read. :-( I really hope somebody else can help me sort this one out.

Matthew.

log_pipe_init (s=0x148f8ec0, cfg=0x148c9860) at logpipe.h:88
88        if (!(s->flags & PIF_INITIALIZED))
(gdb) print *s
$34 = {ref_cnt = 1, flags = 0, cfg = 0x0, pipe_next = 0x148f8f10, queue = 0x2adb089e37ef <log_process_pipe_queue>, init = 0x2adb089e3723 <log_process_pipe_init>, deinit = 0x2adb089e378e <log_process_pipe_deinit>, free_fn = 0x2adb089e38cd <log_process_pipe_free>, notify = 0}
...
91            if (!s->init || s->init(s))
...
Breakpoint 1, log_db_parser_init (s=0x148f7270, cfg=0x148c9860) at dbparser.c:121
121       LogDBParser *self = (LogDBParser *) s;
123       self->db = cfg_persist_config_fetch(cfg, log_db_parser_format_persist_name(self));
log_db_parser_format_persist_name (self=0x148f7270) at dbparser.c:114
114       g_snprintf(persist_name, sizeof(persist_name), "db-parser(%s)", self->db_file);
(gdb) print self->db_file
$35 = (gchar *) 0x148f7320 "/home/y/conf/ysyslogng/xml/corpmon-db.xml"
115       return persist_name;
(gdb) print persist_name
$36 = "db-parser(/home/y/conf/ysyslogng/xml/corpmon-db.xml)", '\0' <repeats 459 times>
...
cfg_persist_config_fetch (cfg=0x148c9860, name=0x2adb0a72bd60 "db-parser(/home/y/conf/ysyslogng/xml/corpmon-db.xml)") at cfg.c:599
599       gpointer res = NULL;
604       if (cfg->persist && g_hash_table_lookup_extended(cfg->persist->keys, name, &tmp1, &tmp2))
615       return res;
(gdb) print res
$37 = (gpointer) 0x0
616     }
...
log_db_parser_init (s=0x148f7270, cfg=0x148c9860) at dbparser.c:124
124       if (self->db)
142           log_db_parser_reload_database(self);
(gdb) print self->db
$38 = (PatternDB *) 0x0
...
Breakpoint 4, log_db_parser_reload_database (self=0x148f7270) at dbparser.c:61
61        if (stat(self->db_file, &st) < 0)
68        if ((self->db_file_inode == st.st_ino && self->db_file_mtime == st.st_mtime))
98      }
(gdb) print self->db_file_inode
$39 = 193445917
(gdb) print st.st_ino
$40 = 193445917
(gdb) print self->db_file_mtime
$41 = 1292463962
(gdb) print st.st_mtim
$42 = {tv_sec = 1292463962, tv_nsec = 0}
...
(gdb) n
log_db_parser_init (s=0x148f7270, cfg=0x148c9860) at dbparser.c:145
145       self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
(gdb) n
146       return self->db != NULL;
(gdb) print self->db
$43 = (PatternDB *) 0x0
147     }
...
log_parser_init (s=0x148f7270, cfg=0x148c9860) at logparser.h:58
58      }
...
log_parser_rule_init (s=0x148f7360, cfg=0x148c9860) at logparser.c:106
106             success = FALSE;
103       for (l = self->parser_list; l; l = l->next)
108       return success;
110     }
...
log_process_rule_init (s=0x148f7360, cfg=0x148c9860) at logprocess.h:64
64      }
...
log_process_pipe_init (s=0x148f8ec0) at logprocess.c:61
61      }
...
log_pipe_init (s=0x148f8ec0, cfg=0x148c9860) at logpipe.h:96
96        return FALSE;
99      }
...
log_center_init (self=0x148d56f0, cfg=0x148c9860) at center.c:510
510               msg_error("Error initializing message pipeline",
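The key being looked up is built purely from the file name. A minimal sketch of that formatting, with snprintf() standing in for g_snprintf() and a caller-supplied buffer (both assumptions made for the sketch; `format_persist_name` is a hypothetical name), shows that every db-parser() pointing at the same XML file produces the same key, so a second reference to the same file would find whatever the first one stored under it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for log_db_parser_format_persist_name(): formats
 * "db-parser(<file>)" into buf and returns buf. The key depends only on
 * db_file, so equal file names always give equal keys. */
static const char *format_persist_name(const char *db_file, char *buf, size_t len)
{
    assert(db_file != NULL && buf != NULL && len > 0);
    snprintf(buf, len, "db-parser(%s)", db_file);
    return buf;
}
```

For the file in the trace, both calls return "db-parser(/home/y/conf/ysyslogng/xml/corpmon-db.xml)", the same name cfg_persist_config_fetch was asked for above.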
Getting closer.

The problem goes away when the XML pattern DB is disabled. It does not appear if the pattern DB is used in only one log {} block, as below; once it is used in two log {} blocks, KABOOM! I'm going to try debugging the other half of this, the part that writes to the persistent store, to see if I can sort out what's breaking the write. For what it's worth, the problem shows up on both 32-bit and 64-bit.

Matthew.

parser p_mon { db-parser(file("/home/y/conf/syslog-ng/xml/mon-db.xml")); };

# parsed
log {
    source(s_udp);
    source(s_tcp);
    filter(f_mon_useless);
    parser(p_mon);
    filter(f_class_security);
    rewrite(r_set_format_parsed);
    rewrite(r_set_type);
    destination(d_normal);

    rewrite(r_set_format_welf);
    rewrite(r_add_welf_raw);
    destination(d_welf_loghive_sp2);
    destination(d_welf);
};

# unparsed
log {
    source(s_udp);
    source(s_tcp);
    filter(f_mon_useless);
    # XXX: problem disappears when this one is commented out
    # parser(p_mon);
    filter(f_class_unknown);
    rewrite(r_set_format_unparsed);
    rewrite(r_set_type);
    destination(d_normal);
};
Keep fighting the good fight, hopefully you'll get some hints soon. You're well outside the range of my debugging-fu at this point. On Thu, Dec 16, 2010 at 3:34 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
On Thu, Dec 16, 2010 at 04:55:09PM -0600, Martin Holste wrote:
Keep fighting the good fight, hopefully you'll get some hints soon. You're well outside the range of my debugging-fu at this point.
I think I have found the problem now. In log_db_parser_deinit there is a call to cfg_persist_config_add, but there is no corresponding call in log_db_parser_init.

When the db_parser is referenced once in the config file, self->db is NULL, so log_db_parser_reload_database is called to create the right data structure. It's important to remember that this is going to set self->db_file_inode and self->db_file_mtime.

When the db_parser is referenced again in the config file, self->db should be non-NULL, because the db_parser was supposed to be persisted.

Bug 1) When we call cfg_persist_config_fetch, we get NULL again, so we call log_db_parser_reload_database again.

Bug 2) (Unrelated to my issue) Even if we had stored the db_parser, when we call stat we just copy the new inode and mtime and do not reload the patterns. So configuration reloads will probably not refresh the pattern DB.

Now, in my case, when we go into log_db_parser_reload_database the second time, we check whether the DB file exists. If it doesn't, we get an error. Fair enough. If it does, we check whether self->db_file_inode and self->db_file_mtime have changed.

If they have not changed, we return right away without initializing self->db. But we have already destroyed the valid self->db pointer from the first initialization by replacing it with the return value of cfg_persist_config_fetch, which was NULL because the config was never persisted.

Then we check self->db again at the end of log_db_parser_init, where we find it has become NULL. Thus we return a failing value, because we never succeeded in initializing the log_db_parser to a non-NULL value.

We pass this error many frames up the stack until we hit the "Error initializing message pipeline" in log_center_init. From there it goes a few more frames up, until we exit with an error code.

I think the patch would be to add code to log_db_parser_init and/or log_db_parser_reload_database that calls cfg_persist_config_add. I am going to try this on Monday, since I'm off tomorrow, and see what I get.

Could somebody else try making a config that references the EXACT SAME patterndb file twice, in two log {} blocks, and see if it blows up for them as well? I want to eliminate as many environment-specific issues as possible. I described the config you need in my previous mail.

Regards,
Matthew.

static gboolean
log_db_parser_init(LogParser *s, GlobalConfig *cfg)
{
  LogDBParser *self = (LogDBParser *) s;

  /* persist key is "db-parser(<db_file>)"; on the second init of the same
     parser nothing has been stored under it, so this overwrites self->db with NULL */
  self->db = cfg_persist_config_fetch(cfg, log_db_parser_format_persist_name(self));
  if (self->db)
    {
      struct stat st;

      if (stat(self->db_file, &st) < 0)
        {
          msg_error("Error stating pattern database file, no automatic reload will be performed",
                    evt_tag_str("error", g_strerror(errno)),
                    NULL);
        }
      else
        {
          self->db_file_inode = st.st_ino;
          self->db_file_mtime = st.st_mtime;
        }
    }
  else
    {
      /* returns early without loading if inode/mtime are unchanged */
      log_db_parser_reload_database(self);
    }

  self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
  return self->db != NULL;
}
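The fix proposed above, storing the loaded database with cfg_persist_config_add() right after a successful reload, can be checked against a simplified model. Everything here is hypothetical: a one-slot `PersistSlot` stands in for the config's persist hash table, strings stand in for PatternDB pointers, and the model ignores ownership (a real cfg_persist_config_add call would pass pattern_db_free as the destroy notifier) and any removal the real fetch may perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { unsigned long ino; long mtime; } FileStamp;

typedef struct {
    const char *db;      /* stands in for PatternDB *; NULL = not loaded */
    FileStamp loaded;    /* stamp of the file when db was last loaded */
} Parser;

/* One-slot stand-in for the persist hash table. */
typedef struct {
    const char *key;
    const char *value;
} PersistSlot;

static const char *persist_fetch(const PersistSlot *store, const char *key)
{
    if (store->key != NULL && strcmp(store->key, key) == 0)
        return store->value;
    return NULL;
}

static void persist_add(PersistSlot *store, const char *key, const char *value)
{
    store->key = key;
    store->value = value;
}

static void reload_database(Parser *self, FileStamp st)
{
    if (self->loaded.ino == st.ino && self->loaded.mtime == st.mtime)
        return;   /* unchanged file: keep whatever db we have */
    self->loaded = st;
    self->db = "patterns";
}

static int parser_init(Parser *self, PersistSlot *store, const char *key, FileStamp st)
{
    assert(self != NULL && store != NULL && key != NULL);
    self->db = persist_fetch(store, key);
    if (self->db == NULL) {
        reload_database(self, st);
        /* The proposed fix: remember a freshly loaded db under its key. */
        if (self->db != NULL)
            persist_add(store, key, self->db);
    }
    return self->db != NULL;
}
```

With the extra persist_add, the second init finds the database under the same key and succeeds; without it, the second call would take the early-return path seen in the gdb trace above.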
Untested patch:

--- /home/megahall/dbparser.c	2010-12-16 20:07:03.000000000 -0800
+++ syslog-ng-3.2.1/modules/dbparser/dbparser.c	2010-12-16 20:14:59.000000000 -0800
@@ -140,6 +140,11 @@
   else
     {
       log_db_parser_reload_database(self);
+      /* XXX: mhall: repair corruption of persistent config */
+      if (self->db)
+        {
+          cfg_persist_config_add(cfg, log_db_parser_format_persist_name(self), self->db, (GDestroyNotify) pattern_db_free, FALSE);
+        }
     }
 
   self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);

Matthew.

On Thu, Dec 16, 2010 at 08:06:43PM -0800, Matthew Hall wrote:
All that troubleshooting just for a couple of lines of code. Doesn't that make you nuts sometimes? :-P

-Patrick

Sent: Thu Dec 16 2010 21:20:44 GMT-0700 (Mountain Standard Time)
From: Matthew Hall <mhall@mhcomputing.net>
To: Syslog-ng users' and developers' mailing list <syslog-ng@lists.balabit.hu>
Subject: Re: [syslog-ng] load failures in afsocket and afsql
Untested patch:
--- /home/megahall/dbparser.c 2010-12-16 20:07:03.000000000 -0800 +++ syslog-ng-3.2.1/modules/dbparser/dbparser.c 2010-12-16 20:14:59.000000000 -0800 @@ -140,6 +140,11 @@ else { log_db_parser_reload_database(self); + /* XXX: mhall: repair corruption of persistent config */ + if (self->db) + { + cfg_persist_config_add(cfg, log_db_parser_format_persist_name(self), self->db, (GDestroyNotify) pattern_db_free, FALSE); + } }
self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
Matthew.
On Thu, Dec 16, 2010 at 08:06:43PM -0800, Matthew Hall wrote:
On Thu, Dec 16, 2010 at 04:55:09PM -0600, Martin Holste wrote:
Keep fighting the good fight, hopefully you'll get some hints soon. You're well outside the range of my debugging-fu at this point.
On Thu, Dec 16, 2010 at 3:34 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
Getting closer.
The problem goes away when the XML pattern DB is disabled. The problem does not appear if the XML pattern db is used in one log {} per below. Once it is used in two log {} blocks, KABOOM! I'm going to try debugging the other half of this which writes to the persistent store to see if I can sort out what's breaking the write. For what it's worth, the problem shows up in both 32 and 64 bit.
Matthew.
I think the problem could be found now. In the log_db_parser_deinit, there is a call to cfg_persist_config_add but there is no corresponding call in log_db_parser_init.
When the db_parser is referenced once in the config file, self->db is NULL, so log_db_parser_reload_database is called to create the right data structure. It's important to remember that this is going to set self->db_file_inode and self->db_file_mtime.
When the db_parser is referenced again in the config file, self->db should be non NULL because the db_parser was supposed to be persisted.
Bug 1) When we call cfg_persist_config_fetch but we get NULL again so we call log_db_parser_reload_database again.
Bug 2) (Unrelated to my issue) Even if we had stored the db_parser, if we call stat, we just copy the new inode and mtime, and do not reload the patterns. So configuration reloads will probably not refresh the pattern DB.
Now in my case when we go into log_db_parser_reload_database the second time, we have a check if the DB file exists. If no we have an error. Fair enough. But if yes, then we check if self->db_file_inode and self->sb_file_mtime have changed, or not.
If they have not changed, we return right away, without initializing the self->db. But we have already destroyed the valid self->db pointer from the first initialization, by replacing it with the retval from cfg_persist_config_fetch, which was NULL because the config was not persisted.
Now we check self->db again at the end of log_db_parser_init where we find it has become NULL. This we return a failing retval because we never suceeded in initializing the log_db_parser to a non-NULL value.
We pass this error many frames up the stack until we hit the "Error initializing message pipeline" in log_center_init.
This goes a few frames up, until we exit with an error code.
I think the patch would be adding code to log_db_parser_init and/or log_db_parser_reload_database, which calls cfg_persist_config_add. I am going to try this Monday since I'm off tomorrow and see what I get.
Could somebody else try making a config which references the EXACT SAME patterndb file twice in two log {} blocks and see if it blows up for them as well? I want to try to eliminate as many environment specific issues as possible. I described the config you need in my previous mail.
Regards, Matthew.
static gboolean
log_db_parser_init(LogParser *s, GlobalConfig *cfg)
{
  LogDBParser *self = (LogDBParser *) s;
  self->db = cfg_persist_config_fetch(cfg, log_db_parser_format_persist_name(self));
  if (self->db)
    {
      struct stat st;
      if (stat(self->db_file, &st) < 0)
        {
          msg_error("Error stating pattern database file, no automatic reload will be performed",
                    evt_tag_str("error", g_strerror(errno)),
                    NULL);
        }
      else
        {
          self->db_file_inode = st.st_ino;
          self->db_file_mtime = st.st_mtime;
        }
    }
  else
    {
      log_db_parser_reload_database(self);
    }
  self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
  return self->db != NULL;
}
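To make the sequence above concrete, here is a toy model in plain C. It is not syslog-ng code: `ToyParser`, `toy_fetch()`, `toy_reload()`, the persist name string and the integer file "stamp" are hypothetical stand-ins for LogDBParser, cfg_persist_config_fetch(), log_db_parser_reload_database() and the inode/mtime pair. Initializing the same parser object twice (the same instance referenced from two log {} blocks) reproduces the failure pattern described: the second init overwrites the valid db pointer with the NULL fetch result, and the reload returns early because the stamp has not changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the loaded pattern database. */
typedef struct
{
  int patterns_loaded;
} ToyPatternDB;

/* Hypothetical stand-in for LogDBParser: the db pointer plus the file
 * "stamp" (playing the role of db_file_inode/db_file_mtime) it came from. */
typedef struct
{
  ToyPatternDB *db;
  long db_file_stamp;
} ToyParser;

static ToyPatternDB loaded_db = { 42 };    /* result of "parsing" the XML file */
static const long current_file_stamp = 1;  /* stamp of the unchanged file */

/* Models cfg_persist_config_fetch() in this scenario: init never adds the
 * db to the persist store, so the lookup always comes back empty. */
static void *
toy_fetch(const char *persist_name)
{
  (void) persist_name;
  return NULL;
}

/* Models the early return in the reload: if the stamp is unchanged,
 * nothing is loaded and self->db is left as it is. */
static void
toy_reload(ToyParser *self)
{
  if (self->db_file_stamp == current_file_stamp)
    return;
  self->db = &loaded_db;
  self->db_file_stamp = current_file_stamp;
}

/* Models log_db_parser_init(); the "fetch succeeded" branch is omitted
 * because it is never taken in this scenario. */
static bool
toy_init(ToyParser *self)
{
  assert(self != NULL);
  self->db = toy_fetch("db-parser");  /* clobbers a previously valid pointer */
  if (!self->db)
    toy_reload(self);
  return self->db != NULL;
}
```

With a fresh `ToyParser p = { NULL, 0 }`, the first `toy_init(&p)` returns true, and the second returns false with `p.db` reset to NULL, analogous to the failure that surfaces as "Error initializing message pipeline".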
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Thu, Dec 16, 2010 at 09:25:17PM -0700, Patrick H. wrote:
All that troubleshooting just for a couple of lines of code. Doesn't that make you nuts sometimes? :-P
You have NO idea!!! I hope somebody from Balabit has time to think about it and reply. Matthew.
I think this patch might well fix it.
I tried it with the patch just now and it seemed to launch OK.
Matthew.
On Thu, Dec 16, 2010 at 08:20:44PM -0800, Matthew Hall wrote:
Untested patch:
--- /home/megahall/dbparser.c 2010-12-16 20:07:03.000000000 -0800
+++ syslog-ng-3.2.1/modules/dbparser/dbparser.c 2010-12-16 20:14:59.000000000 -0800
@@ -140,6 +140,11 @@
   else
     {
       log_db_parser_reload_database(self);
+      /* XXX: mhall: repair corruption of persistent config */
+      if (self->db)
+        {
+          cfg_persist_config_add(cfg, log_db_parser_format_persist_name(self), self->db, (GDestroyNotify) pattern_db_free, FALSE);
+        }
     }
 
   self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
Matthew.
On Thu, Dec 16, 2010 at 08:06:43PM -0800, Matthew Hall wrote:
On Thu, Dec 16, 2010 at 04:55:09PM -0600, Martin Holste wrote:
Keep fighting the good fight, hopefully you'll get some hints soon. You're well outside the range of my debugging-fu at this point.
On Thu, Dec 16, 2010 at 3:34 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
Getting closer.
The problem goes away when the XML pattern DB is disabled. The problem does not appear if the XML pattern db is used in one log {} per below. Once it is used in two log {} blocks, KABOOM! I'm going to try debugging the other half of this which writes to the persistent store to see if I can sort out what's breaking the write. For what it's worth, the problem shows up in both 32 and 64 bit.
Matthew.
I think I have found the problem now. In log_db_parser_deinit there is a call to cfg_persist_config_add, but there is no corresponding call in log_db_parser_init.
Congrats! Go do some gambling because your karma has to be extremely positive at this point!
On Thu, Dec 16, 2010 at 10:32 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
On Thu, 2010-12-16 at 23:26 -0600, Martin Holste wrote:
On Fri, 2010-12-17 at 14:51 +0100, Balazs Scheidler wrote:
First of all thanks for tracking this down, and sorry that
1) there's a bug in the first place 2) I wasn't here to help out.
But anyway, reading through code is always a useful experience, you'll be able to resolve much more complex issues next time. :)
About the patch:
I'm afraid it is not the correct solution, and I don't see an easy way forward, since you're using db-parser() in a way that I planned to explicitly restrict, and in my 3.3 codebase there's an explicit check that the same db-parser() instance cannot be used multiple times, but that has not (yet) trickled to 3.2.
But maybe we can come up with a better solution than my "restriction", since I thought that it can be reasonably said that only one db-parser() instance is needed in a configuration.
But anyway, here's the root cause:
1) starting with 3.2 we have correlation capabilities in db-parser(), which also sports a feature to generate new messages.
2) I intend these messages to "appear" from the db-parser() instance, i.e. they wouldn't be reposted from the internal() source as in 3.2, but would originate from the db-parser() where it is referenced from a log path.
3) this means that a single db-parser() instance can have only _one_ outgoing side, or otherwise I wouldn't know where to post messages.
The reason the db-parser() saves something to the "persist" store is that it has to save the correlation state across SIGHUPs, so that a reload doesn't empty your complete correlation state.
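The hand-over across a reload can be pictured with a toy persist store: the old configuration's deinit stores the state under a name, and the new configuration's init fetches it under the same name, so the correlation state survives the SIGHUP. This is a simplified sketch, not syslog-ng's implementation: `ToyStore`, `toy_store_add()`, `toy_store_fetch()`, `toy_store_free()` and `CorrelationState` are hypothetical. Two design choices are assumptions of this toy: entries nobody fetches are released through their destroy callback (loosely mirroring the `GDestroyNotify` argument in Matthew's patch), and a fetch removes the entry, which shows why two independent users of one piece of state conflict: only one of them can claim it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*DestroyFunc)(void *data);

/* Hypothetical correlation state carried across a reload. */
typedef struct
{
  int open_contexts;
} CorrelationState;

static void
correlation_state_free(void *data)
{
  free(data);
}

/* One named entry handed from the old configuration to the new one. */
typedef struct
{
  const char *name;
  void *value;
  DestroyFunc destroy;
} ToyEntry;

#define TOY_STORE_MAX 16

typedef struct
{
  ToyEntry entries[TOY_STORE_MAX];
  int count;
} ToyStore;

/* Old config's deinit: the store takes ownership of `value`. */
static void
toy_store_add(ToyStore *store, const char *name, void *value, DestroyFunc destroy)
{
  assert(store->count < TOY_STORE_MAX);
  store->entries[store->count++] = (ToyEntry) { name, value, destroy };
}

/* New config's init: ownership moves back to the caller and the entry is
 * removed, so the same state cannot be handed out twice. */
static void *
toy_store_fetch(ToyStore *store, const char *name)
{
  for (int i = 0; i < store->count; i++)
    if (strcmp(store->entries[i].name, name) == 0)
      {
        void *value = store->entries[i].value;
        store->entries[i] = store->entries[--store->count];
        return value;
      }
  return NULL;
}

/* After the reload: destroy whatever the new configuration did not claim. */
static void
toy_store_free(ToyStore *store)
{
  for (int i = 0; i < store->count; i++)
    if (store->entries[i].destroy)
      store->entries[i].destroy(store->entries[i].value);
  store->count = 0;
}
```

In this model a second fetch under the same name returns NULL, which is the situation a second user of the same state would find itself in.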
Using the same state in two independent db-parser() instances is probably not what you want.
I'll check if I could fix your use-case, but I need some thinking time.
I just wanted to drop this mail so that you know that I'm alive; I was just distracted by other things.
Also, I appreciate the time you spent on diagnosing this issue. If you spotted anything in the code that could have helped your diagnosis (e.g. missing code comments) I'd be happy to get some feedback.
Thanks.
And here comes the patch:
commit 5f25ce47ab57774f8fe8df17bc96a006e535cb53
Author: Balazs Scheidler <bazsi@balabit.hu>
Date: Fri Dec 17 15:38:00 2010 +0100
dbparser: don't initialize the same db-parser() instance multiple times
Even if the configuration references the same db-parser() from several locations.
This caused a difficult to understand config initialization problem in syslog-ng.
NOTE: the current behaviour of the db-parsers() correllation state is going to change in 3.3. In 3.2, the state is shared in this case, in 3.3 each reference will use a private, independent state.
Reported-By: Matthew Hall
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
I don't have a solution for 3.3 yet, but I've added an item to my todo list about that.
Thanks again for tracking this down.
-- Bazsi
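The commit message describes making repeated references to one db-parser() instance initialize it only once, with the state shared in 3.2. The diff itself is not quoted here, so the following is only a hedged sketch of one way to get that behaviour, not Bazsi's change: a hypothetical `initialized` flag on a hypothetical `ToyParser` makes init do the real work on the first reference and simply reuse the result on later ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the loaded pattern database. */
typedef struct
{
  int patterns_loaded;
} ToyPatternDB;

/* Hypothetical parser object; one instance may be referenced from
 * several log paths, so init can be called more than once on it. */
typedef struct
{
  ToyPatternDB *db;
  bool initialized;  /* hypothetical guard flag */
  int load_count;    /* how many times the database was really loaded */
} ToyParser;

static ToyPatternDB loaded_db = { 42 };

static void
toy_load_database(ToyParser *self)
{
  self->db = &loaded_db;
  self->load_count++;
}

/* Runs once per reference; only the first call does real work, later
 * calls share the state set up by the first one. */
static bool
toy_parser_init(ToyParser *self)
{
  assert(self != NULL);
  if (!self->initialized)
    {
      toy_load_database(self);
      self->initialized = true;
    }
  return self->db != NULL;
}
```

Calling `toy_parser_init()` twice on the same object succeeds both times and loads the database once; separate parser objects each carry their own flag and are initialized independently.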
Ok, that all makes sense. So what happens if you instantiate two db-parser instances which both refer to the same patterndb XML file? That should work, right? Something like:
parser p_db_1 { db-parser(file("patterndb.xml")); };
parser p_db_2 { db-parser(file("patterndb.xml")); };
log { source(s_first); parser(p_db_1); destination(d_first); };
log { source(s_second); parser(p_db_2); destination(d_second); };
On Fri, Dec 17, 2010 at 8:39 AM, Balazs Scheidler <bazsi@balabit.hu> wrote:
On Fri, 2010-12-17 at 14:51 +0100, Balazs Scheidler wrote:
On Thu, 2010-12-16 at 23:26 -0600, Martin Holste wrote:
Congrats! Go do some gambling because your karma has to be extremely positive at this point!
On Thu, Dec 16, 2010 at 10:32 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
I think this patch might well fix it.
I tried it with the patch just now and it seemed to launch OK.
Matthew.
On Thu, Dec 16, 2010 at 08:20:44PM -0800, Matthew Hall wrote:
Untested patch:
--- /home/megahall/dbparser.c 2010-12-16 20:07:03.000000000 -0800 +++ syslog-ng-3.2.1/modules/dbparser/dbparser.c 2010-12-16 20:14:59.000000000 -0800 @@ -140,6 +140,11 @@ else { log_db_parser_reload_database(self); + /* XXX: mhall: repair corruption of persistent config */ + if (self->db) + { + cfg_persist_config_add(cfg, log_db_parser_format_persist_name(self), self->db, (GDestroyNotify) pattern_db_free, FALSE); + } }
self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self);
Matthew.
On Thu, Dec 16, 2010 at 08:06:43PM -0800, Matthew Hall wrote:
On Thu, Dec 16, 2010 at 04:55:09PM -0600, Martin Holste wrote: > Keep fighting the good fight, hopefully you'll get some hints soon. > You're well outside the range of my debugging-fu at this point. > > On Thu, Dec 16, 2010 at 3:34 PM, Matthew Hall <mhall@mhcomputing.net> wrote: > > Getting closer. > > > > The problem goes away when the XML pattern DB is disabled. The problem > > does not appear if the XML pattern db is used in one log {} per below. > > Once it is used in two log {} blocks, KABOOM! I'm going to try debugging > > the other half of this which writes to the persistent store to see if I > > can sort out what's breaking the write. For what it's worth, the problem > > shows up in both 32 and 64 bit. > > > > Matthew.
I think the problem could be found now. In the log_db_parser_deinit, there is a call to cfg_persist_config_add but there is no corresponding call in log_db_parser_init.
When the db_parser is referenced once in the config file, self->db is NULL, so log_db_parser_reload_database is called to create the right data structure. It's important to remember that this is going to set self->db_file_inode and self->db_file_mtime.
When the db_parser is referenced again in the config file, self->db should be non NULL because the db_parser was supposed to be persisted.
Bug 1) When we call cfg_persist_config_fetch but we get NULL again so we call log_db_parser_reload_database again.
Bug 2) (Unrelated to my issue) Even if we had stored the db_parser, if we call stat, we just copy the new inode and mtime, and do not reload the patterns. So configuration reloads will probably not refresh the pattern DB.
Now in my case when we go into log_db_parser_reload_database the second time, we have a check if the DB file exists. If no we have an error. Fair enough. But if yes, then we check if self->db_file_inode and self->sb_file_mtime have changed, or not.
If they have not changed, we return right away, without initializing the self->db. But we have already destroyed the valid self->db pointer from the first initialization, by replacing it with the retval from cfg_persist_config_fetch, which was NULL because the config was not persisted.
Now we check self->db again at the end of log_db_parser_init where we find it has become NULL. This we return a failing retval because we never suceeded in initializing the log_db_parser to a non-NULL value.
We pass this error many frames up the stack until we hit the "Error initializing message pipeline" in log_center_init.
This goes a few frames up, until we exit with an error code.
I think the patch would be adding code to log_db_parser_init and/or log_db_parser_reload_database, which calls cfg_persist_config_add. I am going to try this Monday since I'm off tomorrow and see what I get.
Could somebody else try making a config which references the EXACT SAME patterndb file twice in two log {} blocks and see if it blows up for them as well? I want to try to eliminate as many environment specific issues as possible. I described the config you need in my previous mail.
Regards, Matthew.
static gboolean log_db_parser_init(LogParser *s, GlobalConfig *cfg) { LogDBParser *self = (LogDBParser *) s;
self->db = cfg_persist_config_fetch(cfg, ***log_db_parser_format_persist_name***(self)); if (self->db) { struct stat st;
if (stat(self->db_file, &st) < 0) { msg_error("Error stating pattern database file, no automatic reload will be performed", evt_tag_str("error", g_strerror(errno)), NULL); } else { self->db_file_inode = st.st_ino; self->db_file_mtime = st.st_mtime; } } else { log_db_parser_reload_database(self); }
self->timer_tick_id = g_timeout_add_seconds(1, log_db_parser_timer_tick, self); return self->db != NULL; }
First of all thanks for tracking this down, and sorry that
1) there's a bug in the first place 2) I wasn't here to help out.
But anyway, reading through code is always a useful experience, you'll be able to resolve much more complex issues next time. :)
About the patch:
I'm afraid it is not the correct solution, and I don't see an easy way forward, since you're using db-parser() in a way that I planned to explicitly restrict, and in my 3.3 codebase there's an explicit check that the same db-parser() instance cannot be used multiple times, but that has not (yet) trickled to 3.2.
But maybe we can come up with a better solution than my "restriction", since I thought that it can be reasonably said that only one db-parser() instance is needed in a configuration.
But anyway, here's the root cause:
1) Starting with 3.2, db-parser() has correlation capabilities, which also include a feature to generate new messages.
2) I intend these messages to "appear" from the db-parser() instance, i.e. they wouldn't be reposted from the internal() source as in 3.2, but would originate from the db-parser() at the point where it is referenced in a log path.
3) This means that a single db-parser() instance can have only _one_ outgoing side; otherwise I wouldn't know where to post the messages.
The reason db-parser() saves something to the "persist" store is that it has to keep the correlation state across SIGHUPs, so that a reload doesn't empty your complete correlation state.
Using the same state in two independent db-parser() instances is probably not what you want.
I'll check whether I can fix your use case, but I need some time to think it over.
I just wanted to drop you this mail so that you know I'm alive; I was just distracted by other things.
Also, I appreciate the time you spent on diagnosing this issue. If you spotted anything in the code that could have helped your diagnosis (e.g. missing code comments), I'd be happy to get some feedback.
Thanks.
And here comes the patch:
commit 5f25ce47ab57774f8fe8df17bc96a006e535cb53
Author: Balazs Scheidler <bazsi@balabit.hu>
Date: Fri Dec 17 15:38:00 2010 +0100

dbparser: don't initialize the same db-parser() instance multiple times

Even if the configuration references the same db-parser() from several locations.
This caused a difficult-to-understand config initialization problem in syslog-ng.

NOTE: the current behaviour of db-parser()'s correlation state is going to change in 3.3. In 3.2, the state is shared in this case; in 3.3 each reference will use a private, independent state.

Reported-By: Matthew Hall
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
I don't have a solution for 3.3 yet, but I've added an item to my todo list about that.
Thanks again for tracking this down.
-- Bazsi
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
On Fri, 2010-12-17 at 09:18 -0600, Martin Holste wrote:
Ok, that all makes sense. So what happens if you instantiate two db-parser instances which both refer to the same patterndb XML file? That should work, right? Something like:
parser p_db_1 { db-parser(file("patterndb.xml")); };
parser p_db_2 { db-parser(file("patterndb.xml")); };
log { source(s_first); parser(p_db_1); destination(d_first); };
log { source(s_second); parser(p_db_2); destination(d_second); };
Yes, this is fine. In fact, I've found out how to do this properly in 3.3 with the same effect, but without having to duplicate the parser lines.
-- Bazsi
participants (5)
- Balazs Scheidler
- Martin Holste
- Matthew Hall
- Patrick H.
- Sandor Geller