[syslog-ng] mojology: syslog-ng and mongodb meet the web

Gergely Nagy algernon at balabit.hu
Sun Jan 9 11:22:39 CET 2011


On Sat, 2011-01-08 at 20:41 -0600, Martin Holste wrote: 
> Ahahaha that is awesome!  Search will actually be really easy since
> you can index on anything in there.  I think what would work best for
> full-text search in mojology (doesn't roll off my tongue, but whatever
> fuels your passion...)

Think of it as a "monology", with a j instead of an n.

> is to have an optional second process that goes
> through newly inserted logs and does an in-place update.  So if a log
> entry starts with:
> 
> { _id: ...
>   "timestamp": ...
>   "dyn": { "classifier": {
>     "class": "some class"
>   } },
>   "msg": "hello, world, this is a test",
> }
> 
> Then do something like this to update it:
> db.getCollection("logs").update({"timestamp": { $gt: <date last
> fulltext indexed>, $lt: <now> }}, { $set: { "fulltext":
> msg.split(/\s+/) }}, true);
> 
> Which adds the fulltext column to yield:
> { _id: ...
>   "timestamp": ...
>   "dyn": { "classifier": {
>     "class": "some class"
>   } },
>   "msg": "hello, world, this is a test",
>   "fulltext": [ "hello", "world", "this", "is", "a", "test" ]
> }
> 
> I'm a little shaky on the Mongo update code there, but you get the
> idea.  The point is that since it would be an optional second-pass, it
> would be easy to tune or eliminate for performance.  If you do
> ensureIndex({"dyn": 1}) and ensureIndex({"fulltext": 1}) then you have pretty
> much all of your searching-bases covered.  You could of course add
> this as an option to your Mongo Syslog-NG driver to do the split when
> the original insert occurs for better overall performance and less
> database fragmentation, but there would be a significantly higher
> insert time.
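As quoted, the update one-liner would not run as intended: `msg` is not a variable in the shell, a plain `$set` document cannot compute a value from another field of the same document, and the third argument `true` of the classic `update()` means upsert, not multi. Also, `split(/\s+/)` keeps the commas, giving "hello," and "world," rather than the tokens shown. Below is a minimal sketch of the same second pass in JavaScript, assuming a Node.js MongoDB driver collection object (`find()` returns an async-iterable cursor, `updateOne()` applies an update). The helper names `fulltextTokens` and `fulltextSecondPass` are hypothetical, and `lastIndexed`/`now` stand in for the placeholders in the quoted command.

```javascript
// Hypothetical tokenizer: lowercase the message and split on runs of
// non-word characters, so "hello, world" gives ["hello", "world"]
// (splitting on /\s+/ alone would keep the commas: "hello,", "world,").
function fulltextTokens(msg) {
  return msg.toLowerCase().split(/\W+/).filter(Boolean);
}

// Hypothetical second pass over entries inserted since the last run.
// `logs` is assumed to behave like a Node.js MongoDB driver collection:
// find() returns an async-iterable cursor and updateOne() applies an update.
// Each document is updated individually because a plain $set document
// cannot compute a value from another field of the same document.
async function fulltextSecondPass(logs, lastIndexed, now) {
  let updated = 0;
  const cursor = logs.find({ timestamp: { $gt: lastIndexed, $lt: now } });
  for await (const doc of cursor) {
    if (typeof doc.msg !== "string") continue; // nothing to tokenize
    await logs.updateOne(
      { _id: doc._id },
      { $set: { fulltext: fulltextTokens(doc.msg) } }
    );
    updated += 1;
  }
  return updated; // number of entries that got a fulltext field
}
```

An index created afterwards with `createIndex({ fulltext: 1 })` would be a multikey index over the token arrays, so a query such as `{ fulltext: "world" }` can use it. Note that `\W` treats non-ASCII letters as separators, which may or may not be acceptable for a given log source.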

I was considering something like that (and a few other things that would
involve updating the db), and my current idea is to use a separate
collection instead, so that if the original collection is, say, a capped
collection, we don't add unnecessary burden to it. Besides, updating has a
reasonable chance of fragmenting the document on disk...

So instead, I'll see if I can use a DBRef pointing back at the original
documents. That would make mojology a little slower, but it wouldn't need
to touch the source collection at all.
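A minimal sketch of the separate-collection idea in JavaScript, under stated assumptions: the helper names and the field names `log` and `fulltext` are hypothetical, not part of mojology. Each index entry points back at its source log entry using the DBRef convention (`$ref` for the collection name, `$id` for the `_id`), so the source collection is only ever read. With the real Node.js driver one would probably build the reference with the driver's `DBRef` class rather than a plain object.

```javascript
// Hypothetical tokenizer: lowercase word tokens, punctuation dropped.
function fulltextTokens(msg) {
  return msg.toLowerCase().split(/\W+/).filter(Boolean);
}

// Build one entry for a separate index collection. The source log entry is
// referenced with the DBRef convention { $ref: <collection>, $id: <_id> },
// so the source collection (possibly capped) is never updated.
function fulltextEntry(sourceCollection, doc) {
  return {
    log: { $ref: sourceCollection, $id: doc._id },
    fulltext: fulltextTokens(doc.msg),
  };
}

// Search step 1: match index entries whose fulltext array contains the
// word (an index on `fulltext` in this collection would be multikey).
function indexQueryFor(word) {
  return { fulltext: word.toLowerCase() };
}

// Search step 2: fetch the referenced entries from the source collection.
// This second query is presumably where the extra cost comes from.
function sourceQueryFor(matchingEntries) {
  return { _id: { $in: matchingEntries.map((entry) => entry.log.$id) } };
}
```

The design trade-off is the one described above: every search costs two queries instead of one, in exchange for leaving the source collection's insert-only workload untouched.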

I didn't think about fulltext search though, so thanks for the
suggestion!

-- 
|8]