[syslog-ng] syslog-ng + database performance

Alexander Clouter ac56 at soas.ac.uk
Sun May 14 23:08:56 CEST 2006


Hi,

disclaimer: this advice is all comes from the 'abyss' that is my brain and 
	never has gone into practice, but it does come from issues I have had 
	to deal with myself in the past

Dev Anand <deva.security at gmail.com> [20060514 17:19:27 +0530]:
>
> Just the same as Arya, we too have a performance issue in this syslog-ng .
> 
A lot of problesm pile up as people seem to love SQL a little too much, 
however as an alternative to flat text files it actually is pretty neat 
thing.

> Particularly we got only 20 servers which are giving in the logs to a
> server and the queries are taking more time to display the results.
> 
Thats probably the result of 'sub-optimal' SQL queries, table layout and 
indexing.  SQL can never be counted on for realtime, or even 'semi' realtime,
applications regardless of what people say.  Databases should always be 
thought of as a slow harddisk and that there is always going to be a 
bottleneck due to it.

The solution is to devise to a method that removes SQL latency from the 
equation.  This might seem difficult but in practice is rather straight 
forward, there are two methods, for the OP I would recommend the latter.

> Is there any other way of optimising the dbs or any special tweakings
> that can be done to the mysql server as such .
> 
Writing to the DB:
------------------
Have a second buffer which chains up the SQL queries to create a single 
multi-rule-drop INSERT.  You probably have seen the difference in speed when 
you mysqldump the database into single INSERT's and then into multi-INSERT's 
when it comes to restoration.  I would say its at least ten times faster, if 
not more.  This will reduce the SQL latency issue but it will not remove it.

The second approach removes SQL latency altogether.  Data is dumped to local 
log files with something like a five to sixty minute log rotation.  A cron 
job runs that picks up these files and then pushes them into the database.  
This part can be placed at slightly lower priority (if on the same 
machine as well as the SQL server), nice 19 or something, so that the rest of 
the machine is always available to receive the events.  Use a maildir 
approach to processing the logs where you have:

logs---+-recorded
       +-processing
       `-finished

Log files start in 'recorded', your script picks the oldest (if there is only 
one log file it skips this turn) one in 'recorded' and moves it into 
'processing'.  Then it works on the file.  When finished it drops it in 
'finished'.  The cron script runs once a minute.  If there is a file in 
'processing' it does nothing, if the file in processing is older than 'x 
minutes' then an alert could be sent to the administrator.  This will detect 
when updates are talking too long or a script has stalled; this points to a 
problem that has occured'

Its important that only one script rus at a time as the DB is going to be in 
a locked state anyway pushing the recorded logs to it, running a second 
instance is only going to slow down the thing further or not go anywhere 
until the first script has unlocked the database.

Now SQL latency is no longer an issue.

Reading from the DB:
--------------------
Its worth checking your SQL statements to make sure they do not clobber the 
entire database pointlessly and do not soak up huge amounts of RAM in the 
process of execution.  Indexing can help, but do not get carried away.  Only 
index things like the date or the machine the data came from, I doubt there 
is much else thats useful to index.

Whilst INSERTing the rules you might want to consider some pre-processing.  
If the log entry comes from a mailserver and its the SMTP daemon, flag that 
entry in an ENUM column as being part of a SMTP daemon.  You are then 
effectly creating an index based on the data contained in the log messages, 
this column you can then index on.

You probably will find that if you are executing your queries on the live 
recording server thats going to make things abismal.  This is bad practice, 
especially on a heavily used logging server as everytime a query is needed to 
be made it has to probably wait for a stack of INSERT's to complete, then 
when you do execute your query you overflow the buffer on syslog-ng (if you 
are not using my second writing method) and loose a lot of data.

You probably should consider replicating the data off at intervals to a 
second 'read-only' mysql setup.  I would consider 'mysqldump' as its damn 
fast but you probably will find you might have to introduce even table 
rotation to make that effective.

Of course if you go with my second DB writing method you can make the queries 
directly on the DB without loosing any data.

> Thanks in advance for any kind suggestions
>
I just hope my advice does not cause your pants to explode....thats my final 
disclaimer :)

Cheers

Alex

> Deva
> 
> On 5/14/06, Arya, Manish Kumar <m.arya at yahoo.com> wrote:
> >Hi Guys,
> >
> >   Thanks for your valuable suggestions for syslog-ng
> >UI.
> >
> >   I have seen that most of the UIs are avialable
> >with databases.
> >   I have syslog-ng+oracle setup too. but I am not
> >happy with performance.
> >   we have a central log server with 3000G SAN and 15
> >GB RAM. and 20,000 devices are suppose to pump logs
> >24x7 :)
> >   with oracle we faced two serious issues, thats why
> >i also started pumping logs in files along with db.
> >
> >-inserts, i have using named pipe to insert logs in
> >db, but oracle somehow drops inserts, becuase "rate of
> >arival of events" is much larger than "rate of insert
> >operations". I have noticed that there is about 80-90%
> >event drops in db.
> >
> >-select, when we search logs, it was really really bad
> >performance it took too long to give results. but then
> >we did indexing on hostname and partitioned table on
> >time (new range partition is created after every 6
> >hrs)
> >This improved system performance to some extent.
> >
> >can you guys suggest me if mysql or postgre will be
> >better to overcome above to problems (but remember our
> >db is huge :), so I am not sure if mysql or postgre is
> >able to handle such big db)
> >
> >Regards,
> >-Manish
> >
> >--- Jon Stearley <jrstear at sandia.gov> wrote:
> >
> >>
> >> On May 11, 2006, at 12:09 PM, Ken Garland wrote:
> >>
> >> >>
> >> file("/logs/log01/indexlog/$YEAR/$MONTH/$DAY/$HOST"
> >> >>   ...
> >> >> -should be able to to parallel search to improve
> >> >> search response time.
> >>
> >> If you decide to go with SQL and have $$,
> >> netezza.com will almost
> >> certainly overcome your speed issues (parallel
> >> harware sql!).  Having
> >> gotten utterly bogged down with Mysql on Linux
> >> (stripes, chunks, huge
> >> indexes), I just went back to files because they are
> >> simple and
> >> sufficient for my purposes.
> >>
> >> > if you are splitting all logs up into subdirs like
> >> that you will
> >> > have quite a fun time doing any parsing.
> >>
> >> If dirs/logs are arranged according to the factors
> >> used for subset
> >> selection (year/month/day/host) and the dirs/logs
> >> are listed in a
> >> (periodically updated) file (eg "corpus.docs" in
> >> sisyphus), subset
> >> selection can be done by simply grepping the file
> >> and concatenating
> >> the resulting dirs/logs.  This is one implementation
> >> option
> >> underlying the clog.man page I sent earlier.
> >> Further subset
> >> selection by facility and priority could then be
> >> done by grepping the
> >> resulting log content (further dirs/logs splitting
> >> by facility/
> >> priority presents multiple bad side effects).  $0.02
> >>
> >> -jon
> >>
> >>
> >> _______________________________________________
> >> syslog-ng maillist  -  syslog-ng at lists.balabit.hu
> >> https://lists.balabit.hu/mailman/listinfo/syslog-ng
> >> Frequently asked questions at
> >> http://www.campin.net/syslog-ng/faq.html
> >>
> >>
> >
> >
> >__________________________________________________
> >Do You Yahoo!?
> >Tired of spam?  Yahoo! Mail has the best spam protection around
> >http://mail.yahoo.com
> >_______________________________________________
> >syslog-ng maillist  -  syslog-ng at lists.balabit.hu
> >https://lists.balabit.hu/mailman/listinfo/syslog-ng
> >Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
> >
> >
> _______________________________________________
> syslog-ng maillist  -  syslog-ng at lists.balabit.hu
> https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
> 
> 


More information about the syslog-ng mailing list