Most of the mentioned issues are fixed in git, one way or the other (details below). However, I'm seeing some weird problems from time to time... Namely, when I shut down the mongodb server to test reconnect and perhaps a few other things: with the mongodb server down, when I do an msg_error() from the writer thread, syslog-ng ends up segfaulting, due to indirectly calling cached_g_current_time() (via the log_msg_init()) - which, as far as I experienced so far, shouldn't be called from anywhere but the main thread. I'm not quite sure how to get around that, except for not generating messages in the worker thread, which would be a bit painful. (This issue was present for a while, I think, will check if I can bisect it) I probably screwed up something around the mutexes too, as I threw loggen at syslog-ng and after a while it just stopped, probably in a neat little deadlock. Current status as of this writing:
* start/stop_thread() creates/frees mutexes/condvars, is there a reason not to do this once in dd_new() and free them in free() ? [RC]
No reason that I can think of, will move them over.
Moved the mutex/condvar stuff to dd_init() and dd_deinit() respectively. The reason they weren't moved to _new/_free() is that the worker thread depends on stuff that is set up by dd_init(), so it can't start at dd_new() time. I might be missing something though, in which case please bonk me on the head. (One option would be to have a wakeup cond for the thread: I would set up the cond vars and mutexes from _new(), but only issue the wakeup from _init() - problem solved. At the moment I'm having a few issues grasping the process of module starting & shutdown. I'll have another go at it tomorrow.)
* is there a reason mongodb_dd_connect() is protected by a mutex? it is only protected from one code path, but it is only called from the writer thread, so it should be single threaded anyway. [RC]
Good catch, will fix, thanks!
Fixed in git.
* worker_insert(): * it queries the time for each invocation. time(NULL) is dreadfully slow and we're trying to phase it out, especially in slow paths. cached_g_current_time_sec() should be used instead. [RC]
Partially-fixed: The OID generation was moved to the mongo client library. Though, that still calls time() for now. I'll add a way to supply our own time, and then the mongodb destination can pass the cached time.
Fixed in git. I added a self->last_msg_stamp, which is updated under a lock in dd_queue(), and passed to the OID generator in the writer thread.
* [oid generation] srand() should not be called here, if you
absolutely must rely on srand/rand, then srand() should be called once at startup. maybe it would be better to use RAND_bytes(), although that would pull in a libcrypto as a requirement for mongodb as well (non-RC if well hidden behind a function)
This is partially fixed too, as OID generation was moved to the mongo client (I just haven't pushed those changes yet, wanted to keep my HEAD stable for a few days).
The rand() is used to generate the machine ID... I think I have an acceptable way to solve that.
Fixed in git. The client library now has a mongo_util_oid_init() method which will set up the machine id and the pid. This is called from dd_new(), neither srand, nor rand, nor getpid, nor time gets called from afmongodb anymore. And getpid/srand/rand only gets indirectly called one time during dd_new().
* [formatting collection name] g_string_prepend() is slow, add the literal to the beginning, and then use log_template_append_format() to append to the end (no need to move strings around). [RC]
D'oh. Suits me for not checking the headers: didn't know about log_template_append_format().
Fixed in git.
* mongo-client.c:
* perhaps a mongo state struct wouldn't hurt even if currently it would be an fd there.
Partially fixed: it's on my TODO list for the client lib for today.
Fixed in git.
* mongo_packet_send(): perhaps you could use writev() to combine the
header/data parts, instead of sending them in two chunks and two syscall roundtrips
Was on my TODO list, it climbed a bit higher.
Fixed in git.
* mongo-wire.c:
* struct _mongo_packet, please don't use #pragma there, just add a comment that new fields should be added with care. You perfectly filled all padding bytes with fields anyway.
Noted, will fix.
Fixed in git. Though, the client library grew another struct (which is not used by syslog-ng yet) which still uses #pragma, to avoid padding. I'll see if I can get around that easily.
* perhaps using GByteArray here is an overkill, you size the array by
calculating the required space anyway, this way you could hang the data array right the end of struct mongodb_packet. [non-RC]
Noted, will fix.
Fixed in git.
* [av_foreach_buffer] please don't define a struct for this as its
name is just not covering what it does, I usually use an array of pointers in this case. [non-RC]
Noted, will fix.
Fixed in git.
* [value, coll variables] if at all possible, please reuse GString based buffers in @self, especially if you don't need to protect them via a mutex. allocation heavy code can really affect performance (and although with SQL the numbers are quite low, the numbers you mentioned with mongodb, this may not be the case). This may apply to bson_new_sized() too (e.g. reuse the bson object header after a call similar to g_string_truncate(0)) (non-RC, but preferable)
Noted, will look into it.
value & coll moved to self (as self->current_value and self->current_namespace), they're allocated by the worker thread upon startup, and freed when shutting down. And since the "db." prefix doesn't change, ever, I also store the namespace prefix length in self, and truncate to that point during insert. The bson objects will be moved there too later tonight - I will have to update the client lib for that, and am knee-deep in syslog-ng at the moment, and don't want to switch out.
* when naming/defining mutexes, please try to name it according to the data it protects, and not code (writer_thread_mutex should probably be named something else. In SQL this is using a similar name, but that has historic roots). [applies to SQL, not RC]
Ok.
(Partly) fixed in git. I removed the writer_thread_mutex, and added suspend_mutex (held when checking/setting suspend state) and queue_mutex (held when checking queue length). Also got rid of explicit mutex locking around log_queue_push & _pop calls. Once I figure out how to use log_queue_check_items() and ivykis threads, queue_mutex will be gone too.
* in syslog-ng 3.3, LogQueue is taking care of its own locking, so no need to separately lock it, at least for _push/pop operations. I see that you are using SQL-like wakeup mechanism, which requires you to know the length of the queue atomicaly, which still requires this external lock. A better solution would probably be to use log_queue_check_items() API and standard ivykis threads for the writer [applies to SQL, not RC]
Noted, I'll have a closer look at ivykis.
I think I have a rough idea how this should work. If all goes well, I'll push this out tonight aswell.
* afmongodb_dd_free() doesn't free keys/values [RC]
Off the top of my head, I don't see which key/values should be free'd there: self->fields is cleaned up as far as I see.
Haven't looked further, I'll have a look at it tomorrow. -- |8]