Thanks for describing your setup a bit--I appreciate it. Do you do any graphing or reporting with your setup, or is mainly for grepping and alerting? I'm primarily interested in graphing, so indexing becomes a priority. I'm also interested in being able to make field-specific queries like "source port=80" and to be able to graph such results on the fly. On Friday, July 24, 2009, Bill Nash <billn@billn.net> wrote:
I can't release my schema, but I can talk about some of this. I sort into daily tables currently, so each day is separate from the previous. I log data like analyzer rule matches, source host, timestamps, with indexes on those fields only. It's pretty light, overall, with about 20 analyzers around the world inserting at once. I'm not sure what your setup is, but mine is for live logging as close to real time as I can get, given buffer fill cycles and the like.
An auto-increment value is only a problem if you really truly care about it, which, I don't. It's there for expiration purposes in my case, so I can delete by unique id within each day's table. Even if you start combining days into monthly tables, just give them a new unique id and index against it, because you're just using it as a convenient handle for reference from the application anyway.
One slight advantage I have by logging the analyzer rule id is that I don't need to store extracted information, I'd just reference the rule and re-extract it later. On-demand CPU usage vs storage/index costs. I think you'll find unless you have a pressing need to constantly be rooting through logs (I don't, since I have analyzers kicking off trouble tickets), you might want to reassess this.
- billn
On Jul 24, 2009, at 2:28 PM, Martin Holste wrote:
Bill,
I'm finding that the indexing process after LOAD DATA is still extremely intensive, though I'm getting about 12 million rows per hour inserted successfully, (albeit with an extremely high load on the box). If you do LOAD DATA into non-indexed tables, MySQL barely sweats and I think the upper bound on insert rates would be more like 50-100 million rows per hour. What does your schema look like? Are you splitting the messages into separate tables? I'm finding that the big challenge with getting tables into database third normal form is that you need to pre-generate your message indexes so that you can do a LOAD DATA. An auto-increment value on a column means you won't be able to split messages into more than one table and map them together easily, but if you only use the normal syslog header fields, you can get them all into one table without too much of a problem. However, I'm trying to get extracted field information inserted as well, which requires more than just one standard table and many more indexes.
Thanks,
Martin
On Fri, Jul 24, 2009 at 12:20 PM, Bill Nash<billn@billn.net> wrote:
On Jul 22, 2009, at 3:17 PM, Clayton Dukes wrote:
Hi folks, I'm thinking about completely re-writing my app (php-syslog-ng) in order to provide high end scalability and a better (ajax) front end.
At the base of it, my biggest concern is scalability. I want to write the backend to be able to handle many thousands of messages per second.
In your opinion, what is the best way to accomplish this? Should I: Log everything to MySQL directly? If so, can MySQL handle that insert rate?
It can. I have perl based analyzers that eat a feed from syslog-ng and insert into mysql.
Log everything to disk and use flat files to do analysis and reporting? Log everything to disk and use MySQL's load_data_infile?
Other suggestions?
Also, is anyone out there using MySQL for this level of insert rates? Do you have any recommendations on table structures, indexes, etc.?
I'm at 16 million rows for today already, and it's only 10 am here.
Look into bulk insert formats. Keep in mind that mysql updates indexes after every insert completes, so row by row inserts will choke. Also, insert times increase slightly as the table grows, so you'll see a performance hit late in the day. You want to keep indexes as light as possible for this.
- billn
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.campin.net/syslog-ng/faq.html
- billn
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng