[syslog-ng] mojology: syslog-ng and mongodb meet the web
Gergely Nagy
algernon at balabit.hu
Sun Jan 9 11:22:39 CET 2011
On Sat, 2011-01-08 at 20:41 -0600, Martin Holste wrote:
> Ahahaha that is awesome! Search will actually be really easy since
> you can index on anything in there. I think what would work best for
> full-text search in mojology (doesn't roll off my tongue, but whatever
> fuels your passion...)
Think of it as a "monology", with a j instead of an n.
> is to have an optional second process that goes
> through newly inserted logs and does an in-place update. So if a log
> entry starts with:
>
> { _id: ...
> "timestamp": ...
> "dyn": { "classifier": {
> "class": "some class"
> } },
> "msg": "hello, world, this is a test",
> }
>
> Then do something like this to update it:
> db.getCollection("logs").update({"timestamp": { $gt: <date last
> fulltext indexed>, $lt: <now> }}, { $set: { "fulltext":
> msg.split(/\s+/) }}, true);
>
> Which adds the fulltext column to yield:
> { _id: ...
> "timestamp": ...
> "dyn": { "classifier": {
> "class": "some class"
> } },
> "msg": "hello, world, this is a test",
> "fulltext": [ "hello", "world", "this", "is", "a", "test" ]
> }
>
> I'm a little shaky on the Mongo update code there, but you get the
> idea. The point is that since it would be an optional second-pass, it
> would be easy to tune or eliminate for performance. If you do
> ensureIndex("dyn") and ensureIndex("fulltext") then you have pretty
> much all of your searching-bases covered. You could of course add
> this as an option to your Mongo Syslog-NG driver to do the split when
> the original insert occurs for better overall performance and less
> database fragmentation, but there would be a significantly higher
> insert time.
I was considering something like that (and a few other things that would
involve updating the db), and my current idea is to use a separate
collection instead, so that if the original collection is, say, a capped
collection, we don't add unnecessary burden to it. Besides, updating has
a reasonable chance of fragmenting the document on disk...
So instead, I'll see if I can use a DBRef (a reference from the new
collection back to the original document). That would make mojology a
little slower, but it wouldn't need to touch the source collection at all.
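A rough sketch of the separate-collection approach, under stated assumptions: derived data goes into documents in another collection, each carrying a DBRef-style { $ref, $id } pointer back to the original log entry, so a capped source collection is only ever read. sideDocument() and resolve() are hypothetical helpers, and in-memory Maps stand in for the collections; against a real database, resolve() would be a find by _id on the referenced collection.

```javascript
// Sketch only: sideDocument() and resolve() are hypothetical helpers,
// not part of mojology or the syslog-ng MongoDB driver.

// Build a side document that points back at the original log entry.
// { $ref, $id } follows MongoDB's DBRef convention; "derived" holds
// whatever a second pass computes (assumed, for illustration).
function sideDocument(sourceCollection, logDoc, derived) {
  return Object.assign(
    { source: { $ref: sourceCollection, $id: logDoc._id } },
    derived
  );
}

// Follow the reference: one extra lookup per result. Each "collection"
// here is a Map keyed by _id, standing in for a find by _id.
function resolve(sideDoc, collections) {
  const coll = collections[sideDoc.source.$ref];
  return coll === undefined ? undefined : coll.get(sideDoc.source.$id);
}

// Usage with an in-memory stand-in for the "logs" collection.
const logs = new Map([
  [1, { _id: 1, msg: "hello, world, this is a test" }],
]);
const side = sideDocument("logs", logs.get(1), {
  fulltext: ["hello", "world", "this", "is", "a", "test"],
});
console.log(JSON.stringify(side.source), resolve(side, { logs: logs }).msg);
```

The extra lookup in resolve() is where the slowdown comes from; in exchange, nothing in the source collection is ever rewritten.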
I didn't think about fulltext search though, so thanks for the
suggestion!
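For reference, a rough sketch of the fulltext second pass Martin describes, written as plain JavaScript so it runs outside the mongo shell; the database calls appear only in comments. Two things in the quoted one-liner need adjusting: msg.split(/\s+/) is evaluated in the shell before anything is sent to the server, so it cannot see each matched document's msg, and the third argument to update() is the upsert flag, not multi. Iterating over the matching documents and updating each one by _id avoids both. Also, split(/\s+/) would keep the commas ("hello,"), so this sketch splits on \W+ instead. That choice is an assumption, and it only treats ASCII letters, digits and _ as word characters. The names tokenize, fulltextUpdate, lastIndexed and now are hypothetical.

```javascript
// Sketch of the optional fulltext second pass. tokenize() and
// fulltextUpdate() are hypothetical helpers, not part of mojology.

// Split on runs of non-word characters (rather than /\s+/) so that
// punctuation is dropped: "hello," becomes "hello". Assumption: \W treats
// anything outside [A-Za-z0-9_] as a separator, including non-ASCII letters.
function tokenize(msg) {
  return msg.split(/\W+/).filter(function (word) {
    return word.length > 0;
  });
}

// Build the [query, update] pair for one document: the fulltext array is
// computed from that document's own msg and applied by _id.
function fulltextUpdate(doc) {
  return [{ _id: doc._id }, { $set: { fulltext: tokenize(doc.msg) } }];
}

// Stand-in for the documents the shell would return from something like
// db.getCollection("logs").find({ timestamp: { $gt: lastIndexed, $lt: now } })
const docs = [
  { _id: 1, timestamp: new Date(0), msg: "hello, world, this is a test" },
];

docs.forEach(function (doc) {
  const pair = fulltextUpdate(doc);
  // In the mongo shell: db.getCollection("logs").update(pair[0], pair[1]);
  console.log(JSON.stringify(pair[0]), JSON.stringify(pair[1]));
});
```

Once the arrays are in place, an index such as db.getCollection("logs").ensureIndex({ fulltext: 1 }) is a multikey index over the array elements, so a query like find({ fulltext: "world" }) can use it.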
--
|8]