[syslog-ng] Fastest, most scaleable solution

Bill Nash billn at billn.net
Sat Jul 25 00:07:19 CEST 2009


I can't release my schema, but I can talk about some of this. I sort  
into daily tables currently, so each day is separate from the  
previous. I log data like analyzer rule matches, source host,  
timestamps, with indexes on those fields only. It's pretty light,  
overall, with about 20 analyzers around the world inserting at once.  
I'm not sure what your setup is, but mine is for live logging as close  
to real time as I can get, given buffer fill cycles and the like.

An auto-increment value is only a problem if you really truly care  
about it, which, I don't. It's there for expiration purposes in my  
case, so I can delete by unique id within each day's table. Even if  
you start combining days into monthly tables, just give them a new  
unique id and index against it, because you're just using it as a  
convenient handle for reference from the application anyway.

One slight advantage I have by logging the analyzer rule id is that I  
don't need to store extracted information, I'd just reference the rule  
and re-extract it later. On-demand CPU usage vs storage/index  costs.  
I think you'll find unless you have a pressing need to constantly be  
rooting through logs (I don't, since I have analyzers kicking off  
trouble tickets), you might want to reassess this.

- billn

On Jul 24, 2009, at 2:28 PM, Martin Holste wrote:

> Bill,
>
> I'm finding that the indexing process after LOAD DATA is still
> extremely intensive, though I'm getting about 12 million rows per hour
> inserted successfully, (albeit with an extremely high load on the
> box).  If you do LOAD DATA into non-indexed tables, MySQL barely
> sweats and I think the upper bound on insert rates would be more like
> 50-100 million rows per hour.  What does your schema look like?  Are
> you splitting the messages into separate tables?  I'm finding that the
> big challenge with getting tables into database third normal form is
> that you need to pre-generate your message indexes so that you can do
> a LOAD DATA.  An auto-increment value on a column means you won't be
> able to split messages into more than one table and map them together
> easily, but if you only use the normal syslog header fields, you can
> get them all into one table without too much of a problem.  However,
> I'm trying to get extracted field information inserted as well, which
> requires more than just one standard table and many more indexes.
>
> Thanks,
>
> Martin
>
> On Fri, Jul 24, 2009 at 12:20 PM, Bill Nash<billn at billn.net> wrote:
>>
>>
>>
>> On Jul 22, 2009, at 3:17 PM, Clayton Dukes wrote:
>>
>>> Hi folks,
>>> I'm thinking about completely re-writing my app (php-syslog-ng) in
>>> order to provide high end scalability and a better (ajax) front end.
>>>
>>> At the base of it, my biggest concern is scalability.
>>> I want to write the backend to be able to handle many thousands of
>>> messages per second.
>>>
>>> In your opinion, what is the best way to accomplish this?
>>> Should I:
>>> Log everything to MySQL directly? If so, can MySQL handle that
>>> insert rate?
>>
>> It can. I have perl based analyzers that eat a feed from syslog-ng  
>> and
>> insert into mysql.
>>
>>> Log everything to disk and use flat files to do analysis and
>>> reporting?
>>> Log everything to disk and use MySQL's load_data_infile?
>>>
>>> Other suggestions?
>>>
>>> Also, is anyone out there using MySQL for this level of insert  
>>> rates?
>>> Do you have any recommendations on table structures, indexes, etc.?
>>>
>>
>> I'm at 16 million rows for today already, and it's only 10 am here.
>>
>> Look into bulk insert formats. Keep in mind that mysql updates  
>> indexes
>> after every insert completes, so row by row inserts will choke. Also,
>> insert times increase slightly as the table grows, so you'll see a
>> performance hit late in the day. You want to keep indexes as light as
>> possible for this.
>>
>> - billn
>>
>>
>>
>> ______________________________________________________________________________
>> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
>> FAQ: http://www.campin.net/syslog-ng/faq.html
>>
>>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.campin.net/syslog-ng/faq.html
>

- billn





More information about the syslog-ng mailing list