[syslog-ng] MongoDB destination driver

Martin Holste mcholste at gmail.com
Sat Jan 1 21:18:01 CET 2011


> We should also point out that grabbing these kinds of locks and making
> these kinds of manipulations should be done as part of careful planning
> since it can render the table inaccessible for long-ish periods through
> normal means such as queries and could require some potentially time
> intensive index rebuilding since indexing is turned off during some of
> these manipulations. (Not sure what percentage of this applies to
> MongoDB since it's a bit unique).
>

For instance, using "LOAD DATA CONCURRENT INFILE" in MySQL allows
reads to continue while a bulk import is running.  The manual says
there is a slight performance hit, but it has been unnoticeable in my
experience.  I haven't tested to see what locking actually occurs
during mongoimport.
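
For reference, a minimal sketch of the kind of statement I mean (the
table name, file path, and field layout are just examples):

    LOAD DATA CONCURRENT INFILE '/tmp/syslog_batch.tsv'
        INTO TABLE logs
        FIELDS TERMINATED BY '\t'
        LINES TERMINATED BY '\n';

Note that CONCURRENT only has an effect on MyISAM tables that qualify
for concurrent inserts; other storage engines ignore it.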

> Perhaps it would be good if we could work together (several of us have
> been experimenting with optimum buffering, database and index setups,
> etc.) to figure out what the best practices are in terms of initial
> storage, indexing, retention, archiving, etc.
>

Absolutely.  The biggest challenge I've come across is how to properly
do archiving.  I've been using the ARCHIVE storage engine in MySQL
because its compact row format compresses blocks of rows, not columnar
data, giving you a 10:1 (or more) compression ratio on log data while
still maintaining all of the metadata.  The main drawback is that the
ARCHIVE engine is poorly documented: specifically, if MySQL crashes
while an archive table is open, it will mark that table as crashed and
rebuild the entire table on startup.  Since most archive tables are
open under normal operation, it usually has to do this for all of
them, which means that time to recover is on the order of many hours
for even a modest number of tables.  There is no (documented) way to
configure this or to change the table status, since the table isn't
actually "marked" crashed.
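
For anyone who wants to experiment, creating one is trivial; something
like the following (the column layout is just an example):

    -- ARCHIVE tables only support INSERT and SELECT, and the only
    -- index allowed is on an AUTO_INCREMENT column, so log tables
    -- are typically created without keys:
    CREATE TABLE logs_archive (
        ts      DATETIME NOT NULL,
        host    VARCHAR(255) NOT NULL,
        program VARCHAR(64),
        msg     TEXT
    ) ENGINE=ARCHIVE;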

Then there's the challenge of performing the conversion from a normal
log table to a compressed log table.  I found that it takes so long to
compress large tables that it's better just to record everything
twice: once to the short-term, uncompressed tables, and once to the
compressed tables.  Obviously, that situation is suboptimal, so I'm
all for suggestions as to how bulk data should be handled and welcome
discussion on the topic.
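
For concreteness, the conversion I'm describing is just a table
rebuild, along these lines (table names are hypothetical):

    -- Convert in place: locks the table and rewrites every row,
    -- which is what takes so long on large tables.  Any secondary
    -- indexes must be dropped first, since ARCHIVE won't accept
    -- them:
    ALTER TABLE logs_2010_12 ENGINE=ARCHIVE;

    -- Or copy into a pre-created ARCHIVE table, leaving the
    -- original readable until the copy completes:
    INSERT INTO logs_2010_12_arch SELECT * FROM logs_2010_12;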

