[RFC]: MongoDB destination plans (for 3.4 and beyond)
Hi!

I've been working on a few mongodb destination related features recently, and I thought I'd ask for comments here before I proceed further, to see if my ideas can be improved, and if there's anyone actually interested in the stuff I play with (mostly out of sheer curiosity; I'll scratch my own itches even if no one else has similar needs :P).

Before I go further, let me introduce the current mongodb destination features, available in the syslog-ng 3.3 branch:

* We can connect to a single MongoDB server per destination, which we expect to be the master (or a standalone server).
* We can send logs to it, structured in various interesting ways via value-pairs().

...aaand that's about it. Pretty bare bones, but it's enough for a lot of stuff.

Now, I plan to extend this in two ways. First, it will be possible to connect to a ReplicaSet, which is basically a set of MongoDB servers that replicate from a master. The advantage of this is that if the master goes down, one of the secondaries will automatically take over, and the mongodb driver will automatically reconnect to it as well. In case that fails too, the destination driver will fall back to queuing within syslog-ng, and retry after a configured interval.

The second extension is "safe mode", which, when turned on, will verify that the message could be inserted into a MongoDB collection; the message will not be ACKed on the syslog-ng side until it is. After a number of retries, the message can be dropped, along with an appropriate log message, so that such messages won't fill up the queue. (This, of course, would be optional; by default, messages would never be dropped unless the internal queue is full.)

So we could have a destination configured like this:

    destination d_mongo {
        mongodb(
            servers("10.0.10.1:27017" "10.0.10.2:27017" "10.0.10.3:27017")
            safe-mode(on)
        );
    };

This would connect to 10.0.10.1 on port 27017 by default, and if that becomes inaccessible, it'd retry with .2 and .3, in that order. If any of them listed other servers as part of their replica set, the driver would retry with those as well. It would also turn on safe-mode, which is a set of extra checks to ensure that data arrived in MongoDB safe and sound.

We'd get more reliability this way, and with safe-mode, better data safety as well.

Now, the question is: is there anything else that may be worth adding? If anyone here has used MongoDB, is there something else you'd like to see added to the mongodb destination?

(I could add GridFS support as well, but so far, I haven't found an acceptable use case for it.)

-- |8]
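To make the proposed delivery semantics concrete, here is a minimal sketch in Python rather than in the driver's own code. It is not the syslog-ng implementation: the `Destination` class, its `max_retries` parameter, and the idea of modelling each server as a callable that raises `ConnectionError` are all assumptions made for illustration. It only shows the behaviour described above: try the servers in the configured order, ACK (dequeue) a message only once an insert is confirmed, keep it queued otherwise, and optionally drop it after a number of failed retry passes.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# Hypothetical stand-in for a MongoDB connection: a callable that stores
# one message and raises ConnectionError when the server is unreachable.
Inserter = Callable[[str], None]


class Destination:
    def __init__(self, servers: List[Inserter],
                 max_retries: Optional[int] = None) -> None:
        self.servers = servers            # tried in the configured order
        self.max_retries = max_retries    # None means "never drop"
        self.queue: Deque[Tuple[str, int]] = deque()  # (message, failed passes)
        self.dropped: List[str] = []

    def _try_insert(self, message: str) -> bool:
        """Return True once any server has confirmed the insert."""
        for insert in self.servers:
            try:
                insert(message)
                return True
            except ConnectionError:
                continue
        return False

    def send(self, message: str) -> None:
        self.queue.append((message, 0))
        self.flush()

    def flush(self) -> None:
        """One delivery pass; a timer would call this after each retry interval."""
        while self.queue:
            message, attempts = self.queue[0]
            if self._try_insert(message):
                self.queue.popleft()      # confirmed, so the message is ACKed
                continue
            attempts += 1
            if self.max_retries is not None and attempts >= self.max_retries:
                self.queue.popleft()      # give up on this message
                self.dropped.append(message)
                continue
            self.queue[0] = (message, attempts)
            break                         # keep ordering; retry on the next pass
```

With `max_retries` left at `None`, a message stays at the head of the queue until some server accepts it, which mirrors the "never drop by default" behaviour proposed above.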
Good work! That's a solid start. For the future, I guess I would ask this: are you taking advantage of Mongo's ability to have arrays for values? That's one of the biggest features of NoSQL--you can have multiple values in an array under one key.

The best example is a poor-man's full text search in which you do a split() on whitespace in your app (syslog-ng in this case) and put the values in an array in Mongo. If the array is indexed, then all of the values are automatically indexed, giving you the ability to find any word almost instantly. Another use would be for the current tags feature in pattern-db, which is a more natural fit.

(Sent in reply to the message above from Gergely Nagy <algernon@balabit.hu>, Tue, Jun 21, 2011 at 9:08 AM.)
Martin Holste <mcholste@gmail.com> writes:
Good work! That's a solid start. For the future, I guess I would ask this: are you taking advantage of Mongo's ability to have arrays for values? That's one of the biggest features of NoSQL--you can have multiple values in an array under one key.
Nope, not yet. But that's a good idea, and it also reminded me that I need to find a way to properly support MongoDB's Date type as well.
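For context on the Date type mentioned here: BSON stores a date as a 64-bit integer counting milliseconds since the Unix epoch in UTC. A minimal sketch of that conversion follows, in Python for illustration only. The function name `to_bson_millis` and the choice to treat a timestamp without an offset as UTC are assumptions, not anything the driver does.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to milliseconds since the Unix epoch."""
    dt = datetime.fromisoformat(iso_timestamp)
    if dt.tzinfo is None:
        # Assumption for this sketch: a timestamp without an offset is UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

Two timestamps that name the same instant with different offsets map to the same value, which is what makes the stored field sortable and range-queryable regardless of the sender's timezone.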
The best example is a poor-man's full text search in which you do a split() on whitespace in your app (syslog-ng in this case) and put the values in an array in Mongo. If the array is indexed, then all of the values are automatically indexed, giving you the ability to find any word almost instantly. Another use would be for the current tags feature in pattern-db, which is a more natural fit.
Sounds good, thank you for the suggestion!

-- |8]
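The array suggestion discussed in this thread can be sketched as follows, in Python for illustration. The function name `build_document` and the field names `MESSAGE`, `words` and `tags` are hypothetical; nothing here reflects what the mongodb destination actually emits. The sketch only shows the document shape: the message is split on whitespace and the distinct words are stored as an array next to the original text.

```python
from typing import Dict, Iterable, List, Union


def build_document(message: str,
                   tags: Iterable[str] = ()) -> Dict[str, Union[str, List[str]]]:
    """Build a document whose 'words' array holds the distinct message words."""
    # Lowercase so lookups are case-insensitive; sort for a stable order.
    words = sorted(set(message.lower().split()))
    return {
        "MESSAGE": message,   # original text, untouched
        "words": words,       # one array element per distinct word
        "tags": list(tags),   # e.g. tags assigned by pattern-db
    }
```

MongoDB indexes each element of an indexed array field separately, and a query such as `{"words": "failed"}` matches any document whose array contains that element, which is what makes the word lookup fast. The trade-off is a larger document and index per message.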