I just put up an entry on my blog (http://ossectools.blogspot.com) describing the Enterprise Log Search and Archive Project (http://code.google.com/p/enterprise-log-search-and-archive) I've been working on, which uses Syslog-NG >= 3.1 and pattern-db at its core. There are a lot of other open-source log collection frameworks out there that are easier to install, such as Logzilla (php-syslog-ng), but if you're trying to log > 1k messages/sec (common in large orgs) and need something GPL-licensed, installing ELSA will probably be worth your while. We're using it to index 15k messages/sec on basic hardware. It's currently storing tens of billions of logs, and full-text, ad-hoc queries complete in about half a second to 2 seconds, including group-by queries on arbitrary fields for reporting. I put a few screenshots and a feature list in the post.

The documentation is pretty basic right now, but I'm happy to assist if you run into issues.

ELSA is also open to plugin creation, so if you find ELSA useful and create plugins, please let me know and I can add them to the project.

Also, patterns for the pattern-db are more than welcome! I've included patterns for Cisco FWSM connections and denies, Snort logs, Windows logs from Eventlog-to-Syslog as well as Snare, and URLs from my httpry wrapper, which is available on the project site as well as in the tarball/source code.

Comments and feedback are welcome!

Thanks,
Martin
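P.S. To give a flavor of what a pattern-db entry looks like, here is a rough sketch of a rule in the spirit of the FWSM deny patterns. The ruleset/rule ids, program pattern, and field names are made up for illustration; this is not one of the patterns actually shipped with ELSA:

  <patterndb version='3' pub_date='2011-04-01'>
    <ruleset name='cisco_fwsm_example' id='1'>
      <!-- matched against ${PROGRAM}; hypothetical value -->
      <pattern>%FWSM-4-106023</pattern>
      <rules>
        <rule provider='example' id='2' class='violation'>
          <patterns>
            <!-- parses e.g. "Deny tcp src outside:10.0.0.1/1234 dst inside:10.0.0.2/80" -->
            <pattern>Deny @ESTRING:proto: @src @ESTRING:src_zone::@@IPv4:src_ip@/@NUMBER:src_port@ dst @ESTRING:dst_zone::@@IPv4:dst_ip@/@NUMBER:dst_port@</pattern>
          </patterns>
        </rule>
      </rules>
    </ruleset>
  </patterndb>

Each named parser (proto, src_ip, and so on) becomes a field you can query and group by in ELSA.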
Hello,

On 03/28/2011 10:26 PM, Martin Holste wrote:
Comments and feedback are welcome!
First of all, thank you for the news, it looks like a very interesting project for the syslog-ng community! I'll install it as soon as I have a bit more time and share my experiences.

Bye,
--
Peter Czanik (CzP) <czanik@balabit.hu>
BalaBit IT Security / syslog-ng upstream
http://czanik.blogs.balabit.com/
Hi Martin,

Just a quick tip: include a link to your blog post on the project page. The post gives a nice overview of ELSA with some screenshots, which is what most people will be looking for, but it is missing from the project page.

Regards,
Robert
Good call, done.

On Tue, Mar 29, 2011 at 2:27 AM, Fekete Robert <frobert@balabit.hu> wrote:
Hi Martin,
Just a quick tip: include a link to your blog post on the project page. The post gives a nice overview of ELSA with some screenshots, which is what most people will be looking for, but it is missing from the project page.
Regards,
Robert
Looks very cool, thanks!

On Tue, Mar 29, 2011 at 9:32 AM, Martin Holste <mcholste@gmail.com> wrote:
Good call, done.
On Mon, Mar 28, 2011 at 10:26 PM, Martin Holste <mcholste@gmail.com> wrote:
I just put up an entry on my blog (http://ossectools.blogspot.com) describing the Enterprise Log Search and Archive Project (http://code.google.com/p/enterprise-log-search-and-archive) I've been working on which uses Syslog-NG >= 3.1 and pattern-db at its core.
How easily could we make it DB-independent? Specifically, I need PostgreSQL instead of MySQL (in-house support/knowledge).
Good question. It would theoretically work with Postgres with a fair amount of code modification, using COPY instead of LOAD DATA for the imports and an ODBC connection for Sphinx. However, it would take a lot less time to install MySQL and gain a cursory knowledge of it than to modify all of the code to allow for Postgres.

The other issue is that Postgres would be much slower in this case, because the MySQL native connections in Sphinx are much faster than the ODBC connections (something like 50k rows/sec versus 6k rows/sec for indexing), and MyISAM tables are simpler than transaction-oriented tables, so they load data faster. Lastly, you'd lose the ability to archive logs, as ELSA uses the MySQL ARCHIVE storage engine, which provides an incredible compression ratio because it compresses an entire 32k chunk of table at a time, not string by string.

I would strongly urge you to try MySQL instead of modifying the code. It should install very quickly on almost any platform, and there won't be much tuning or administration required.
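To make that concrete, the archive side boils down to something like the following SQL sketch (table layout and file path are made up for illustration, not ELSA's actual schema):

  -- Hypothetical archive table: the ARCHIVE engine compresses rows in
  -- large chunks as they are inserted, which is where the high
  -- compression ratio comes from. It supports only INSERT and SELECT.
  CREATE TABLE logs_archive (
    id        BIGINT UNSIGNED NOT NULL,
    timestamp INT UNSIGNED NOT NULL,
    host      VARCHAR(255) NOT NULL,
    msg       TEXT NOT NULL
  ) ENGINE=ARCHIVE;

  -- Batch import from a delimited file; this is the MySQL analogue of
  -- the Postgres COPY statement mentioned above.
  LOAD DATA LOCAL INFILE '/tmp/log_batch.tsv'
    INTO TABLE logs_archive
    FIELDS TERMINATED BY '\t';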
Given that postgresql outperforms mysql in high-write and high-concurrency situations, and that more time is spent computing indexes than retrieving data from the db, sphinx on pgsql should actually be quite fast. Also, on multi-core systems postgresql is significantly faster. The sphinx folks confirm that db communication isn't the bottleneck.

Another factor to consider is regulations or company policy regarding log storage. MyISAM is terrible when it comes to this; that's the tradeoff. For data integrity you'd be wiser to choose InnoDB or postgresql. Also of note is that postgresql can store the data compressed to begin with, thus obviating the need for an archive.

I've tested both directly, and for my original needs neither was fast enough, though postgresql was much faster than mysql in every case I tested. I would be wary of tossing out assumed numbers about performance. I've seen postgresql COPY, and the even faster of_upload, handle well over a gig of data per minute when the input file is much larger than system memory.

Given this, I would say try it and see.
This sounds like MySQL-vs-PgSQL flame war bait, but I do want to address some important ELSA architecture decisions. I will readily admit that I do not use PgSQL much at all, so I will not try to address those issues. I will also note that I almost decided to use straight flat files instead of any RDBMS, but the XML pipe performance was slower for Sphinx, and an RDBMS does make a lot of other administrative things handier, so I decided to go with one.

I don't think you're fully grasping just how Sphinx-dependent the whole setup is and how very, very little the DB has to do with performance, since it is doing no indexing other than a simple primary key for the row. Sphinx is doing all queries as well, except for the final row retrievals based on primary key (like I said, I could've used flat files).

PgSQL beats MySQL on many, many features, especially on writes (http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL), which you would think would make it a shoo-in for ELSA. But since all writes have to be batched anyway (as you said, neither MySQL nor PgSQL can handle the load with ordinary inserts), none of those extra write-performance improvements help. The reason Facebook, Craigslist, and ELSA are using MySQL is that those features aren't needed. We need things like HandlerSocket because we're not doing transaction processing.

If I were to switch to anything, it would be MongoDB, because a schema-less implementation is helpful when you don't want a snowflake schema and you don't know all of your column names ahead of time. The only reason I didn't use it to begin with was the poor Sphinx XML pipe performance and the overhead of having to marshal everything through XML.
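To make the division of labor concrete, the Sphinx side of a setup like this looks roughly as follows (source/index names, credentials, and the query are illustrative, not ELSA's actual configuration):

  # sphinx.conf sketch: MySQL is just a row store that Sphinx pulls
  # from; all full-text indexing and searching happens in Sphinx.
  source logs_src
  {
      type          = mysql
      sql_host      = localhost
      sql_user      = elsa
      sql_pass      = secret
      sql_db        = elsa
      # The DB's only jobs: hand rows to the indexer here, and later
      # return full rows by primary key for whatever ids Sphinx matches.
      sql_query     = SELECT id, timestamp, host, msg FROM logs
      sql_attr_uint = timestamp
  }

  index logs_idx
  {
      source = logs_src
      path   = /var/lib/sphinx/logs_idx
  }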
Given that postgresql outperforms mysql in high-write and high-concurrency situations, and that more time is spent computing indexes than retrieving data from the db, sphinx on pgsql should actually be quite fast.
The database has nothing to do with "computing indexes"; as far as the DB is concerned, all of the time is spent "retrieving data from the db."
Also on multi-core systems postgresql is significantly faster. The sphinx folks confirm that db communication isn't the bottleneck.
I have used Sphinx a lot with ODBC on Sybase, and the numbers were very, very close to the numbers for using XML pipe with simple data, which is why I am assuming that ODBC was the bottleneck. If you have a reference regarding the bottleneck, please forward it along. My recommendation is to use MySQL 5.5, which has been generally available for a while now. There is something like 30% better performance all around according to most estimates on the MySQL Performance Blog, and major rewrites in the threading model make it scale to multiple CPUs much better. For those reading this thread and running MySQL, I highly recommend upgrading to 5.5.
Another factor to consider is regulations or company policy regarding log storage. MyISAM is terrible when it comes to this; that's the tradeoff.
If you need encryption, do it at the file system or partition level, preferably through the RAID card or LVM. Regulations would only apply to encryption. In most orgs I've been around, few know PgSQL, so it's actually a much tougher sell than MySQL, which most people have at least heard of. I'm not saying that makes MySQL better, but I find it to be more accepted as a solution by upper management. Now that Oracle is behind it (which I am personally not a fan of), it's even easier for management to approve it.
For data integrity you'd be wiser to choose InnoDB or postgresql. Also of note is that postgresql can store the data compressed to begin with, thus obviating the need for an archive.
Very true! Unfortunately, there are huge performance hits for both of these. InnoDB also takes up much, much more disk space than MyISAM (not sure about the PG disk space). ELSA is about performance and fast search. Logs you can't find might as well have been lost or never recorded to begin with.
I've tested both directly, and for my original needs neither was fast enough, though postgresql was much faster than mysql in every case I tested.

If you weren't using MySQL 5.5, then I am sure this is true. Even if you were running 5.5, I'd still believe it. Also, MySQL does need a fair amount of ini-file changes, as the out-of-the-box setup is not optimized for large data sets. They include templates for various configurations in the RPM or tarball, and I recommend using the "/usr/share/mysql/my-huge.cnf" config as a template.
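For reference, the kind of settings that template adjusts looks something like this (values in the spirit of my-huge.cnf, from memory; check the template shipped with your version and treat these as a starting point, not tuned recommendations):

  [mysqld]
  # MyISAM index cache: the most important buffer for fast
  # primary-key lookups on large MyISAM tables.
  key_buffer_size         = 384M
  table_open_cache        = 512
  sort_buffer_size        = 2M
  read_buffer_size        = 2M
  myisam_sort_buffer_size = 64M
  thread_cache_size       = 8
  query_cache_size        = 32M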
I would be wary of tossing out assumed numbers about performance. I've seen postgresql COPY, and the even faster of_upload, handle well over a gig of data per minute when the input file is much larger than system memory.
Assuming is bad! Do you have some numbers? My logs average about 300 bytes in message length, so 1 gig of data per minute translates to roughly 60k rows/second (1073741824 / 300 = 3,579,139 logs per minute, which is 59,652 logs per second).

As I'm typing, here's what ELSA is currently logging for its rates for the to-be-indexed log table:

  Loaded 371906 records in 2.89521217346191 seconds (128455.525093796 per second)

And here's the log for the archive table (compression is done as the inserts occur):

  Loaded 371906 records in 5.32420706748962 seconds (69851.903820742 per second)

And here's the Sphinx indexing number from the log:

  Indexed 371906 rows in 6.03515 seconds (61623.23033 rows/sec)

So, that's about 3 seconds for the load and 6 seconds for the index, for a total of 9 seconds for ELSA to full-text index 371906 * 300 = 111,571,800 bytes of log data. Scaled to a minute to compare with your number, that's (60 / 9) * 111,571,800 = 743,812,000 bytes per minute. That's about 70% as fast as the PgSQL bulk load, but it means the data is _full-text indexed_ and still only about 30% slower!
Given this, I would say try it and see.
As I said, the database is barely needed at all, so the only features that matter are primary-key lookup speed (for Sphinx index creation and query-result finalizing) and bulk import speed. I would wager that with proper key buffers configured in MySQL, the difference for primary-key lookup would be negligible, because it's almost entirely disk-bound, but it would be interesting to see if the Sphinx indexing numbers were significantly better on PgSQL, because twice as much time is spent indexing as loading.
participants (6)
- Bill Anderson
- Fekete Robert
- Hendrik Visage
- Lance Laursen
- Martin Holste
- Peter Czanik