[syslog-ng] MongoDB destination driver

Martin Holste mcholste at gmail.com
Sat Jan 1 21:24:10 CET 2011


Super cool!  At those rates, I think few will benefit from bulk
inserts, so I'd put that low on the feature priority list, especially
since the added complexity creates opportunities for bugs.
My main feature to add (aside from the two you already mentioned on
the roadmap) would be a way to use the keys from the patterndb, so
that the db and collection in Mongo stay the same but the key names
change with every patterndb rule.  That's really the big payoff
with Mongo--you don't have to define a rigid schema, so you don't have
to know the column names ahead of time.  That's a big deal considering
that the patterndb can change on the fly.  Being confined to
predefined templates in the config limits the potential.  Bazsi, any
idea how to do this?
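
To make that concrete, here's a rough sketch in Python with pymongo
(the db/collection names and the parsed fields are made up purely for
illustration) of what I mean: the db and collection stay fixed, and
the document keys are just whatever the matching patterndb rule
extracted.

    import pymongo

    client = pymongo.MongoClient("localhost", 27017)
    collection = client["syslog"]["messages"]  # db and collection never change

    # Name/value pairs as a matching patterndb rule might have parsed them;
    # a different rule would simply produce a different set of keys.
    parsed = {
        "HOST": "web01",
        "PROGRAM": "sshd",
        "username": "alice",
        "authmethod": "publickey",
    }

    # No schema declared anywhere: the document's keys are whatever the
    # rule happened to name.
    collection.insert_one(parsed)

Two different rules can write documents with completely different keys
into the same collection, and Mongo doesn't care.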

On Sat, Jan 1, 2011 at 2:18 PM, Martin Holste <mcholste at gmail.com> wrote:
>> We should also point out that taking these kinds of locks and making
>> these kinds of manipulations should be done as part of careful
>> planning, since they can render the table inaccessible to normal
>> queries for long-ish periods and may require some potentially
>> time-intensive index rebuilding, since indexing is turned off during
>> some of these manipulations.  (Not sure what percentage of this
>> applies to MongoDB, since it's a bit unique.)
>>
>
> For instance, using "LOAD DATA CONCURRENT INFILE" will allow reads to
> occur while doing the bulk imports in MySQL.  The manual says there is
> a slight performance hit, but it is unnoticeable in my experience.  I
> haven't tested to see what actual locking occurs during mongoimport.
>
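
(To be concrete, the kind of statement I mean looks roughly like the
sketch below -- the file path, table name, and column list are just
placeholders, and the Python/pymysql wrapper is only for illustration.)

    import pymysql

    conn = pymysql.connect(host="localhost", user="syslog",
                           password="secret", database="syslog")
    try:
        with conn.cursor() as cur:
            # CONCURRENT lets other sessions keep reading from the
            # (MyISAM) table while the bulk load runs.
            cur.execute(r"""
                LOAD DATA CONCURRENT INFILE '/var/lib/mysql-files/batch.tsv'
                INTO TABLE logs
                FIELDS TERMINATED BY '\t'
                (ts, host, program, msg)
            """)
        conn.commit()
    finally:
        conn.close()
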
>> Perhaps it would be good if we could work together (several of us have
>> been experimenting with optimum buffering, database and index setups,
>> etc.) to figure out what the best practices are in terms of initial
>> storage, indexing, retention, archiving, etc.
>>
>
> Absolutely.  The biggest challenge I've come across is how to properly
> do archiving.  I've been using the ARCHIVE storage engine in MySQL
> because the compact row format actually compresses blocks of rows, not
> columnar data, giving you a 10:1 (or more) compression ratio on log
> data while still maintaining all of the metadata.  The main drawback
> is that the ARCHIVE storage engine is poorly documented: specifically,
> if MySQL crashes while an archive table is open, it will mark that
> table as crashed and rebuild the entire table on startup.  It will
> usually have to do this for all archive tables under normal operation,
> which means that time to recover is on the order of many hours for
> even a modest number of tables.  There is no (documented) way to
> configure this or to change the table status, since the table isn't
> actually "marked" crashed anywhere.
>
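
(For anyone who wants to try it, an ARCHIVE log table is just an
ordinary table definition with ENGINE=ARCHIVE; the column layout below
is only an example, driven through pymysql purely for illustration.
Note the engine only allows an index on an AUTO_INCREMENT column, so
this sketch leaves indexes off entirely.)

    import pymysql

    ddl = """
        CREATE TABLE IF NOT EXISTS syslog_archive (
            ts      DATETIME     NOT NULL,
            host    VARCHAR(255) NOT NULL,
            program VARCHAR(64)  NOT NULL,
            msg     TEXT         NOT NULL
        ) ENGINE=ARCHIVE
    """

    conn = pymysql.connect(host="localhost", user="syslog",
                           password="secret", database="syslog")
    try:
        with conn.cursor() as cur:
            cur.execute(ddl)  # rows are compressed transparently on insert
    finally:
        conn.close()
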
> Then there's the challenge of converting a normal log table into a
> compressed one.  I found that it takes so long to compress large
> tables that it's better just to record everything twice: once to the
> short-term, uncompressed tables, and once to the compressed tables.
> Obviously, that situation is suboptimal, and I am
> all for suggestions as to how bulk data should be handled and welcome
> discussions on the topic.
>
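
(The conversion I mean is basically a bulk INSERT ... SELECT from the
live table into the ARCHIVE table, along the lines of the sketch below
with placeholder table and column names; it's exactly this step that
takes too long on big tables, hence the double writes.)

    import pymysql

    conn = pymysql.connect(host="localhost", user="syslog",
                           password="secret", database="syslog")
    try:
        with conn.cursor() as cur:
            # Copy older rows from the live (uncompressed) table into the
            # compressed ARCHIVE table -- the slow step on large tables.
            cur.execute("""
                INSERT INTO syslog_archive (ts, host, program, msg)
                SELECT ts, host, program, msg
                FROM   syslog_live
                WHERE  ts < CURDATE()
            """)
        conn.commit()
    finally:
        conn.close()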

