[syslog-ng] Fastest, most scalable solution

Bill Nash billn at billn.net
Sat Jul 25 19:29:05 CEST 2009


I'm predominantly into alerting. I don't graph my traffic, but I do
keep an eye on inbound rates, because occasionally someone puts a box
or app into debug mode and fires line-rate messages at an analyzer.
Some of the bodies have never been found.

- billn

On Jul 25, 2009, at 10:23 AM, Martin Holste wrote:

> Thanks for describing your setup a bit--I appreciate it.  Do you do
> any graphing or reporting with your setup, or is it mainly for grepping
> and alerting?  I'm primarily interested in graphing, so indexing
> becomes a priority.  I'm also interested in being able to make
> field-specific queries like "source port=80" and to be able to graph
> such results on the fly.
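>
> (For instance, a per-minute count for a field-specific query like
> that might look something like this; the table and column names are
> just an illustration, not an existing schema:)
>
>     -- Illustrative query: bucket matches on source port 80 into
>     -- one-minute counts, which a front end could graph on the fly.
>     SELECT DATE_FORMAT(logged_at, '%Y-%m-%d %H:%i:00') AS minute,
>            COUNT(*) AS hits
>       FROM syslog_msgs
>      WHERE src_port = 80
>      GROUP BY minute
>      ORDER BY minute;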
>
> On Friday, July 24, 2009, Bill Nash <billn at billn.net> wrote:
>>
>> I can't release my schema, but I can talk about some of this. I sort
>> into daily tables currently, so each day is separate from the
>> previous. I log data like analyzer rule matches, source host,
>> timestamps, with indexes on those fields only. It's pretty light,
>> overall, with about 20 analyzers around the world inserting at once.
>> I'm not sure what your setup is, but mine is for live logging as
>> close to real time as I can get, given buffer fill cycles and the
>> like.
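>>
>> A rough sketch of what one of those daily tables could look like
>> (the column names and types here are only an illustration, not the
>> actual schema):
>>
>>     -- Illustrative daily table, e.g. log_20090725. Only the fields
>>     -- that get queried carry indexes, to keep insert overhead light.
>>     CREATE TABLE log_20090725 (
>>         id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
>>         logged_at  DATETIME        NOT NULL,
>>         src_host   VARCHAR(64)     NOT NULL,
>>         rule_id    INT UNSIGNED    NOT NULL,
>>         raw_msg    TEXT            NOT NULL,
>>         PRIMARY KEY (id),
>>         KEY idx_logged_at (logged_at),
>>         KEY idx_src_host  (src_host),
>>         KEY idx_rule_id   (rule_id)
>>     ) ENGINE=MyISAM;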
>>
>> An auto-increment value is only a problem if you really truly care
>> about it, which I don't. It's there for expiration purposes in my
>> case, so I can delete by unique id within each day's table. Even if
>> you start combining days into monthly tables, just give them a new
>> unique id and index against it, because you're just using it as a
>> convenient handle for reference from the application anyway.
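>>
>> Expiring by that handle is then a single statement per table
>> (hypothetical table name and id again):
>>
>>     -- Illustrative: expire everything at or below the id we last
>>     -- aged out of this day's table.
>>     DELETE FROM log_20090725 WHERE id <= 1500000;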
>>
>> One slight advantage I have by logging the analyzer rule id is that I
>> don't need to store extracted information; I can just reference the
>> rule and re-extract it later. On-demand CPU usage vs. storage/index
>> costs. I think that unless you have a pressing need to constantly be
>> rooting through logs (I don't, since I have analyzers kicking off
>> trouble tickets), you might want to reassess this.
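>>
>> One way to picture that trade-off (again purely illustrative names,
>> not my real schema): keep the extraction pattern with the rule,
>> store only the rule id and raw line per match, and re-parse on
>> demand.
>>
>>     -- Illustrative rules table: each rule carries the pattern the
>>     -- analyzer matched with...
>>     CREATE TABLE analyzer_rules (
>>         rule_id  INT UNSIGNED NOT NULL,
>>         pattern  VARCHAR(255) NOT NULL,
>>         PRIMARY KEY (rule_id)
>>     ) ENGINE=MyISAM;
>>
>>     -- ...so a hit only stores rule_id plus raw_msg, and the
>>     -- application re-runs the pattern later if it needs the fields.
>>     SELECT l.raw_msg, r.pattern
>>       FROM log_20090725 l
>>       JOIN analyzer_rules r USING (rule_id)
>>      WHERE l.rule_id = 42;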
>>
>> - billn
>>
>> On Jul 24, 2009, at 2:28 PM, Martin Holste wrote:
>>
>>> Bill,
>>>
>>> I'm finding that the indexing process after LOAD DATA is still
>>> extremely intensive, though I'm getting about 12 million rows per
>>> hour inserted successfully (albeit with an extremely high load on
>>> the box).  If you do LOAD DATA into non-indexed tables, MySQL
>>> barely sweats, and I think the upper bound on insert rates would be
>>> more like 50-100 million rows per hour.  What does your schema look
>>> like?  Are you splitting the messages into separate tables?  I'm
>>> finding that the big challenge with getting tables into third
>>> normal form is that you need to pre-generate your message ids so
>>> that you can do a LOAD DATA.  An auto-increment value on a column
>>> means you won't be able to split messages into more than one table
>>> and map them together easily, but if you only use the normal syslog
>>> header fields, you can get them all into one table without too much
>>> of a problem.  However, I'm trying to get extracted field
>>> information inserted as well, which requires more than just one
>>> standard table and many more indexes.
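>>>
>>> To illustrate what I mean by pre-generating the ids (file and table
>>> names here are made up): the collector assigns msg_id itself while
>>> writing the dump files, so a header table and a field table can
>>> share the key without relying on AUTO_INCREMENT.
>>>
>>>     -- Illustrative load: both files carry the same collector-
>>>     -- assigned msg_id in the first column, so the two tables join
>>>     -- cleanly after loading.
>>>     LOAD DATA INFILE '/var/spool/syslog/headers.tsv'
>>>     INTO TABLE msg_header
>>>     FIELDS TERMINATED BY '\t'
>>>     (msg_id, logged_at, host, facility, priority, program, msg);
>>>
>>>     LOAD DATA INFILE '/var/spool/syslog/fields.tsv'
>>>     INTO TABLE msg_field
>>>     FIELDS TERMINATED BY '\t'
>>>     (msg_id, field_name, field_value);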
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>> On Fri, Jul 24, 2009 at 12:20 PM, Bill Nash<billn at billn.net> wrote:
>>>>
>>>>
>>>>
>>>> On Jul 22, 2009, at 3:17 PM, Clayton Dukes wrote:
>>>>
>>>>> Hi folks,
>>>>> I'm thinking about completely re-writing my app (php-syslog-ng)
>>>>> in order to provide high-end scalability and a better (AJAX)
>>>>> front end.
>>>>>
>>>>> At the base of it, my biggest concern is scalability.
>>>>> I want to write the backend to be able to handle many thousands of
>>>>> messages per second.
>>>>>
>>>>> In your opinion, what is the best way to accomplish this?
>>>>> Should I:
>>>>> Log everything to MySQL directly? If so, can MySQL handle that
>>>>> insert rate?
>>>>
>>>> It can. I have Perl-based analyzers that eat a feed from
>>>> syslog-ng and insert into MySQL.
>>>>
>>>>> Log everything to disk and use flat files to do analysis and
>>>>> reporting?
>>>>> Log everything to disk and use MySQL's LOAD DATA INFILE?
>>>>>
>>>>> Other suggestions?
>>>>>
>>>>> Also, is anyone out there using MySQL for this level of insert
>>>>> rates?
>>>>> Do you have any recommendations on table structures, indexes,  
>>>>> etc.?
>>>>>
>>>>
>>>> I'm at 16 million rows for today already, and it's only 10 am here.
>>>>
>>>> Look into bulk insert formats. Keep in mind that MySQL updates
>>>> indexes after every insert completes, so row-by-row inserts will
>>>> choke. Also, insert times increase slightly as the table grows, so
>>>> you'll see a performance hit late in the day. You want to keep
>>>> indexes as light as possible for this.
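>>>>
>>>> For example (table and column names are only an illustration), a
>>>> single extended INSERT is the simplest bulk format:
>>>>
>>>>     -- Illustrative: one multi-row statement instead of one INSERT
>>>>     -- per row means far less per-statement overhead, and MySQL
>>>>     -- can batch the index maintenance for the whole statement.
>>>>     INSERT INTO log_20090724
>>>>         (logged_at, src_host, rule_id, raw_msg)
>>>>     VALUES
>>>>         ('2009-07-24 10:00:01', '10.0.0.5', 12, 'auth failure'),
>>>>         ('2009-07-24 10:00:01', '10.0.0.9',  7, 'link flap'),
>>>>         ('2009-07-24 10:00:02', '10.0.0.5', 12, 'auth failure');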
>>>>
>>>> - billn
>>>
>>
>
