Hi, I just wanted to let you know that I can pull your patches any time, but I'd like an explicit "pull" request with a proper repo/branch information to tell me that it is a state which can be pulled. Also, I before pulling I'd like to ask you to * fold related changes into a single patch in its final form (e.g. I wouldn't want to add the old mongo-c-client based implementation if possible), * please split off the patch on layers (e.g. one for mongo client lib, and one for the mongodb destination, it probably doesn't make sense to separate the syslog-ng plugin glue code, but that could come 3rd). * please remember to sign off each individual patch And now some review comments (I did the review on your HEAD, e.g. not doing a patch-by-patch review). * some of my comments would apply to the SQL driver as well, it is somewhat in a need of cleanup, as my focus during 3.3 development was more the traditional input/output plugins. These are not release critical (e.g. the code could still be pulled) I'm marking the comments below if this is the case and in case a better solution is found for MongoDB, a similar approch should be taken in SQL as well. * in syslog-ng 3.3, LogQueue is taking care of its own locking, so no need to separately lock it, at least for _push/pop operations. I see that you are using SQL-like wakeup mechanism, which requires you to know the length of the queue atomicaly, which still requires this external lock. A better solution would probably be to use log_queue_check_items() API and standard ivykis threads for the writer [applies to SQL, not RC] * when naming/defining mutexes, please try to name it according to the data it protects, and not code (writer_thread_mutex should probably be named something else. In SQL this is using a similar name, but that has historic roots). [applies to SQL, not RC] * afmongodb_dd_free() doesn't free keys/values [RC] * start/stop_thread() creates/frees mutexes/condvars, is there a reason not to do this once in dd_new() and free them in free() ? [RC] * is there a reason mongodb_dd_connect() is protected by a mutex? it is only protected from one code path, but it is only called from the writer thread, so it should be single threaded anyway. [RC] * worker_insert(): * it queries the time for each invocation. time(NULL) is dreadfully slow and we're trying to phase it out, especially in slow paths. cached_g_current_time_sec() should be used instead. [RC] * [oid generation] srand() should not be called here, if you absolutely must rely on srand/rand, then srand() should be called once at startup. maybe it would be better to use RAND_bytes(), although that would pull in a libcrypto as a requirement for mongodb as well (non-RC if well hidden behind a function) * [value, coll variables] if at all possible, please reuse GString based buffers in @self, especially if you don't need to protect them via a mutex. allocation heavy code can really affect performance (and although with SQL the numbers are quite low, the numbers you mentioned with mongodb, this may not be the case). This may apply to bson_new_sized() too (e.g. reuse the bson object header after a call similar to g_string_truncate(0)) (non-RC, but preferable) * [formatting collection name] g_string_prepend() is slow, add the literal to the beginning, and then use log_template_append_format() to append to the end (no need to move strings around). [RC] * [av_foreach_buffer] please don't define a struct for this as its name is just not covering what it does, I usually use an array of pointers in this case. [non-RC] * [dynamic values] the discussion about globbing on the name-value pairs should probably be closed before the autovalues stuff can go in. [RC] * bson stuff: looks nice. you mentioned you had some unit tests to cover this, can you add that to modules/afmongodb/tests ? * mongo-client.c: * perhaps a mongo state struct wouldn't hurt even if currently it would be an fd there. * mongo_packet_send(): perhaps you could use writev() to combine the header/data parts, instead of sending them in two chunks and two syscall roundtrips * mongo-wire.c: * struct _mongo_packet, please don't use #pragma there, just add a comment that new fields should be added with care. You perfectly filled all padding bytes with fields anyway. * perhaps using GByteArray here is an overkill, you size the array by calculating the required space anyway, this way you could hang the data array right the end of struct mongodb_packet. [non-RC] * wouldn't there be a way to avoid copying bson structs around? e.g. you currently [non-RC, just food for thought): * build 3 BSON objects (selector, update, set), possibly containing further nested because of subdocuments * then build a mongodb_packet which again copies the whole structure, essentially building everything and then copying everything again. * perhaps it'd be possible to reference the bson objects directly from the mongo_packet where they should be embedded? and then mongo_packet_send() could use writev() to send out the whole thing, without having to copy them to a linear buffer? Even though this message is lengthy, I really like the code here, and would put that into my tree immediately. But I let you do the stuff still open and to take the fame for your efforts :) On Sat, 2011-01-15 at 00:34 +0100, Gergely Nagy wrote:
It's been a while I spam^Wnotified the list with mongodb updates, so the recent news from the past few days are as follows:
* The syslog-ng 3.2 version is abandoned, obsolete. * The driver has been ported to 3.3, and the database writer was split out into a separate thread. * The underlying mongodb connector was replaced: threw out the former, and implemented my own (for various reasons, which are too numerous to list here). The new mongodb connector will be maintained as a separate project, but a version of it will be embedded into syslog-ng for convenience's sake. The canonical repo for it is over at github: https://github.com/algernon/libmongo-client
While this might not seem much, the above changes brought a few good side effects:
* If we loose connection to MongoDB, for one reason or the other, the driver does not drop messages immediately anymore: we have a queue (configurable via log_fifo_size()). * Database writes are in a separate thread, thus, network latency does not affect the speed at which the driver can accept messages. * Due to the new underlying connector, we handle error cases a lot better. As in, we don't throw up and abort(). Though, error handling still has to improve, but the driver's in a much better state. * There's support for $SEQNUM in templates now. Though I can't really imagine a situation where I could use it, but I needed them for ObjectID generation anyway, so why not expose them to the templates aswell?
However, there's one downside too: we lost authentication support (temporarily, until I re-implement that in the connector).
The speed is about the same as 3.2's, threading didn't improve our performance (well, it did, if we consider network latency, but my tests are local) - I didn't expect that to improve to begin with.
I'm actually surprised that despite changing the connector library, and porting over to threads, the driver still maintained its speed.
My project page should be (mostly) up to date, but for the adventurous:
git clone -b modules/afmongodb git://git.madhouse-project.org/syslog-ng/syslog-ng-3.3.git
The modules/afmongodb branch will hopefully be quiet during the weekend.
-- Bazsi