[syslog-ng] Fastest, most scalable solution

Martin Holste mcholste at gmail.com
Sat Jul 25 19:23:40 CEST 2009


Thanks for describing your setup a bit--I appreciate it.  Do you do
any graphing or reporting with your setup, or is it mainly for grepping
and alerting?  I'm primarily interested in graphing, so indexing
becomes a priority.  I'm also interested in being able to make
field-specific queries like "source port=80" and to be able to graph
such results on the fly.
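
For example, the kind of query I have in mind looks like this (table
and column names are hypothetical, just to show the shape):

    -- count hits to source port 80, bucketed per minute for graphing
    SELECT DATE_FORMAT(ts, '%Y-%m-%d %H:%i') AS minute,
           COUNT(*) AS hits
      FROM logs
     WHERE srcport = 80
     GROUP BY minute
     ORDER BY minute;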

On Friday, July 24, 2009, Bill Nash <billn at billn.net> wrote:
>
> I can't release my schema, but I can talk about some of this. I sort
> into daily tables currently, so each day is separate from the
> previous. I log data like analyzer rule matches, source host,
> timestamps, with indexes on those fields only. It's pretty light,
> overall, with about 20 analyzers around the world inserting at once.
> I'm not sure what your setup is, but mine is for live logging as close
> to real time as I can get, given buffer fill cycles and the like.
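>
> To give a feel for the shape of it (table and column names here are
> illustrative, not my actual schema):
>
>     CREATE TABLE log_20090724 (
>         id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
>         rule_id INT UNSIGNED NOT NULL,  -- analyzer rule that matched
>         src     INT UNSIGNED NOT NULL,  -- source host, INET_ATON() form
>         ts      DATETIME NOT NULL,
>         msg     TEXT,                   -- raw message, not indexed
>         PRIMARY KEY (id),
>         KEY (rule_id),
>         KEY (src),
>         KEY (ts)
>     ) ENGINE=MyISAM;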
>
> An auto-increment value is only a problem if you really truly care
> about it, which I don't. It's there for expiration purposes in my
> case, so I can delete by unique id within each day's table. Even if
> you start combining days into monthly tables, just give them a new
> unique id and index against it, because you're just using it as a
> convenient handle for reference from the application anyway.
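>
> Concretely, expiration is just something like this (the id threshold
> is made up for illustration):
>
>     -- expire the oldest rows from a day's table by unique id
>     DELETE FROM log_20090724 WHERE id < 5000000;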
>
> One slight advantage I have by logging the analyzer rule id is that I
> don't need to store extracted information; I can just reference the
> rule and re-extract later, trading on-demand CPU usage for lower
> storage and index costs. Unless you have a pressing need to be
> constantly rooting through logs (I don't, since I have analyzers
> kicking off trouble tickets), I think you'll want to reassess this.
>
> - billn
>
> On Jul 24, 2009, at 2:28 PM, Martin Holste wrote:
>
>> Bill,
>>
>> I'm finding that the indexing process after LOAD DATA is still
>> extremely intensive, though I'm getting about 12 million rows per hour
>> inserted successfully (albeit with an extremely high load on the
>> box).  If you do LOAD DATA into non-indexed tables, MySQL barely
>> sweats and I think the upper bound on insert rates would be more like
>> 50-100 million rows per hour.  What does your schema look like?  Are
>> you splitting the messages into separate tables?  I'm finding that the
>> big challenge with getting tables into database third normal form is
>> that you need to pre-generate your message IDs so that you can do
>> a LOAD DATA.  An auto-increment value on a column means you won't be
>> able to split messages into more than one table and map them together
>> easily, but if you only use the normal syslog header fields, you can
>> get them all into one table without too much of a problem.  However,
>> I'm trying to get extracted field information inserted as well, which
>> requires more than just one standard table and many more indexes.
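>>
>> For reference, the load path I'm testing looks roughly like this
>> (file and table names are made up, and note that DISABLE KEYS only
>> defers non-unique MyISAM indexes):
>>
>>     ALTER TABLE syslog DISABLE KEYS;
>>     LOAD DATA INFILE '/tmp/batch.tsv'
>>         INTO TABLE syslog
>>         FIELDS TERMINATED BY '\t'
>>         LINES TERMINATED BY '\n';
>>     ALTER TABLE syslog ENABLE KEYS;  -- rebuild indexes in one bulk pass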
>>
>> Thanks,
>>
>> Martin
>>
>> On Fri, Jul 24, 2009 at 12:20 PM, Bill Nash <billn at billn.net> wrote:
>>>
>>> On Jul 22, 2009, at 3:17 PM, Clayton Dukes wrote:
>>>
>>>> Hi folks,
>>>> I'm thinking about completely re-writing my app (php-syslog-ng) in
>>>> order to provide high end scalability and a better (ajax) front end.
>>>>
>>>> At the base of it, my biggest concern is scalability.
>>>> I want to write the backend to be able to handle many thousands of
>>>> messages per second.
>>>>
>>>> In your opinion, what is the best way to accomplish this?
>>>> Should I:
>>>> Log everything to MySQL directly? If so, can MySQL handle that
>>>> insert rate?
>>>
>>> It can. I have Perl-based analyzers that eat a feed from syslog-ng
>>> and insert into MySQL.
>>>
>>>> Log everything to disk and use flat files to do analysis and
>>>> reporting?
>>>> Log everything to disk and use MySQL's load_data_infile?
>>>>
>>>> Other suggestions?
>>>>
>>>> Also, is anyone out there using MySQL for this level of insert
>>>> rates?
>>>> Do you have any recommendations on table structures, indexes, etc.?
>>>>
>>>
>>> I'm at 16 million rows for today already, and it's only 10 am here.
>>>
>>> Look into bulk insert formats. Keep in mind that MySQL updates
>>> indexes after every insert completes, so row-by-row inserts will
>>> choke. Also, insert times increase slightly as the table grows, so
>>> you'll see a performance hit late in the day. You want to keep
>>> indexes as light as possible for this.
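>>>
>>> A multi-row INSERT is one such bulk format; one statement carries
>>> many rows, so you pay the per-statement overhead once (values made
>>> up for illustration):
>>>
>>>     INSERT INTO log (rule_id, src, ts, msg) VALUES
>>>         (12, INET_ATON('10.1.1.1'), NOW(), 'first message'),
>>>         (45, INET_ATON('10.1.1.2'), NOW(), 'second message'),
>>>         (12, INET_ATON('10.1.1.3'), NOW(), 'third message');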
>>>
>>> - billn
>>
>
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
>

