Bill, I'm finding that the indexing process after LOAD DATA is still extremely intensive, though I'm getting about 12 million rows per hour inserted successfully, (albeit with an extremely high load on the box). If you do LOAD DATA into non-indexed tables, MySQL barely sweats and I think the upper bound on insert rates would be more like 50-100 million rows per hour. What does your schema look like? Are you splitting the messages into separate tables? I'm finding that the big challenge with getting tables into database third normal form is that you need to pre-generate your message indexes so that you can do a LOAD DATA. An auto-increment value on a column means you won't be able to split messages into more than one table and map them together easily, but if you only use the normal syslog header fields, you can get them all into one table without too much of a problem. However, I'm trying to get extracted field information inserted as well, which requires more than just one standard table and many more indexes. Thanks, Martin On Fri, Jul 24, 2009 at 12:20 PM, Bill Nash<billn@billn.net> wrote:
On Jul 22, 2009, at 3:17 PM, Clayton Dukes wrote:
Hi folks, I'm thinking about completely re-writing my app (php-syslog-ng) in order to provide high end scalability and a better (ajax) front end.
At the base of it, my biggest concern is scalability. I want to write the backend to be able to handle many thousands of messages per second.
In your opinion, what is the best way to accomplish this? Should I: Log everything to MySQL directly? If so, can MySQL handle that insert rate?
It can. I have perl based analyzers that eat a feed from syslog-ng and insert into mysql.
Log everything to disk and use flat files to do analysis and reporting? Log everything to disk and use MySQL's load_data_infile?
Other suggestions?
Also, is anyone out there using MySQL for this level of insert rates? Do you have any recommendations on table structures, indexes, etc.?
I'm at 16 million rows for today already, and it's only 10 am here.
Look into bulk insert formats. Keep in mind that mysql updates indexes after every insert completes, so row by row inserts will choke. Also, insert times increase slightly as the table grows, so you'll see a performance hit late in the day. You want to keep indexes as light as possible for this.
- billn
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html