Hi, disclaimer: this advice all comes from the 'abyss' that is my brain and has never been put into practice, but it does stem from issues I have had to deal with myself in the past.

Dev Anand <deva.security@gmail.com> [20060514 17:19:27 +0530]:
Just like Arya, we too have a performance issue with syslog-ng.
A lot of problems pile up because people seem to love SQL a little too much; however, as an alternative to flat text files it actually is a pretty neat thing.
In particular, we have only 20 servers sending logs to a central server, and the queries are taking too much time to display the results.
That's probably the result of sub-optimal SQL queries, table layout and indexing. SQL can never be counted on for realtime, or even 'semi' realtime, applications, regardless of what people say. A database should always be thought of as a slow hard disk: there is always going to be a bottleneck because of it. The solution is to devise a method that removes SQL latency from the equation. This might seem difficult but in practice is rather straightforward. There are two methods; for the OP I would recommend the latter.
Is there any other way of optimising the DBs, or any special tweaking that can be done to the mysql server itself?
Writing to the DB:
------------------

Have a second buffer which chains up the SQL queries to create a single multi-row INSERT. You have probably seen the difference in speed when you mysqldump the database as single INSERTs versus multi-row INSERTs and then restore it; I would say the latter is at least ten times faster, if not more. This will reduce the SQL latency issue, but it will not remove it.

The second approach removes SQL latency altogether. Data is dumped to local log files with something like a five- to sixty-minute log rotation. A cron job runs that picks up these files and pushes them into the database. This part can run at slightly lower priority (if it is on the same machine as the SQL server), nice 19 or something, so that the rest of the machine is always available to receive events.

Use a maildir approach to processing the logs, where you have:

logs---+-recorded
       +-processing
       `-finished

Log files start in 'recorded'. Your script picks the oldest file in 'recorded' (if there is only one log file it skips this turn) and moves it into 'processing'. It then works on the file, and when finished drops it in 'finished'. The cron script runs once a minute. If there is a file in 'processing' it does nothing; if the file in 'processing' is older than 'x minutes', an alert could be sent to the administrator. This will detect when updates are taking too long or a script has stalled, which points to a problem that has occurred.

It is important that only one script runs at a time, as the DB is going to be locked anyway while the recorded logs are pushed into it; running a second instance is only going to slow things down further, or go nowhere until the first script has unlocked the database. Now SQL latency is no longer an issue.
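The maildir-style importer above can be sketched as a short script run once a minute from cron. This is a minimal illustration under stated assumptions, not a tested implementation: the spool layout, the stall threshold, and the push_to_db() helper (which would issue the multi-row INSERT) are all hypothetical placeholders. A temp directory is used here so the sketch is self-contained; in practice BASE would be a fixed path like /var/spool/logs.

```python
import os
import shutil
import sys
import tempfile
import time

# Spool layout (hypothetical; a real setup would use a fixed path).
BASE = tempfile.mkdtemp(prefix="logspool-")
RECORDED = os.path.join(BASE, "recorded")
PROCESSING = os.path.join(BASE, "processing")
FINISHED = os.path.join(BASE, "finished")
for d in (RECORDED, PROCESSING, FINISHED):
    os.makedirs(d, exist_ok=True)

STALL_SECONDS = 15 * 60  # the 'x minutes' alert threshold (assumed)


def push_to_db(path):
    """Placeholder: in a real importer this would build one multi-row
    INSERT from the whole file and execute it against the database."""
    with open(path) as fh:
        return sum(1 for _ in fh)  # just count the rows here


def run_once():
    # A file already in 'processing' means another run is active or
    # has stalled -- do nothing except check its age.
    stuck = os.listdir(PROCESSING)
    if stuck:
        oldest = min(os.path.getmtime(os.path.join(PROCESSING, f))
                     for f in stuck)
        if time.time() - oldest > STALL_SECONDS:
            print("ALERT: log import appears stalled", file=sys.stderr)
        return None

    files = sorted(os.listdir(RECORDED),
                   key=lambda f: os.path.getmtime(os.path.join(RECORDED, f)))
    # Only one file means syslog may still be writing it: skip this turn.
    if len(files) < 2:
        return None

    name = files[0]  # oldest rotated file
    work = os.path.join(PROCESSING, name)
    shutil.move(os.path.join(RECORDED, name), work)
    push_to_db(work)
    shutil.move(work, os.path.join(FINISHED, name))
    return name
```

Because the importer only ever touches the single oldest file and refuses to start while 'processing' is non-empty, at most one import runs at a time, matching the single-script requirement above.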
Reading from the DB:
--------------------

It is worth checking your SQL statements to make sure they do not pointlessly scan the entire database and do not soak up huge amounts of RAM while executing. Indexing can help, but do not get carried away: only index things like the date or the machine the data came from; I doubt there is much else that is useful to index.

While INSERTing the rows you might want to consider some pre-processing. If the log entry comes from a mail server and it is the SMTP daemon, flag that entry in an ENUM column as being part of an SMTP daemon. You are then effectively creating an index based on the data contained in the log messages, and that column you can then index on.

You will probably find that executing your queries on the live recording server makes things abysmal. This is bad practice, especially on a heavily used logging server: every query probably has to wait for a stack of INSERTs to complete, and then when you do execute your query you overflow the buffer on syslog-ng (if you are not using my second writing method) and lose a lot of data. You should probably consider replicating the data off at intervals to a second, read-only MySQL setup. I would consider 'mysqldump' as it is damn fast, but you will probably find you have to introduce table rotation as well to make that effective. Of course, if you go with my second DB writing method, you can make the queries directly on the DB without losing any data.
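The pre-processing idea above amounts to classifying each log line into a coarse category before the INSERT, so the category column (an ENUM in MySQL) can be indexed instead of the free-text message. A minimal sketch, assuming the standard syslog line layout; the program names and category set here are illustrative assumptions, not a fixed scheme:

```python
# Coarse categories for the ENUM column (illustrative assumption).
CATEGORIES = ("smtpd", "sshd", "kernel", "other")


def classify(line):
    """Return the ENUM value to store alongside one raw syslog line.

    syslog lines typically look like:
        "May 14 17:19:27 host program[pid]: message"
    so the program name is the fifth whitespace-separated field,
    stripped of its "[pid]" suffix and trailing colon.
    """
    try:
        program = line.split()[4].split("[")[0].rstrip(":")
    except IndexError:
        return "other"  # malformed line: fall back to the catch-all
    if program in ("sendmail", "postfix/smtpd", "smtpd"):
        return "smtpd"
    if program == "sshd":
        return "sshd"
    if program == "kernel":
        return "kernel"
    return "other"
```

The INSERT path would then store classify(line) in the ENUM column next to the raw message, and queries like "all SMTP activity last night" hit the small indexed column rather than grinding through the message text.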
Thanks in advance for any kind suggestions
I just hope my advice does not cause your pants to explode... that's my final disclaimer :)

Cheers,
Alex
Deva
On 5/14/06, Arya, Manish Kumar <m.arya@yahoo.com> wrote:
Hi Guys,
Thanks for your valuable suggestions for syslog-ng UI.
I have seen that most of the UIs are available with databases. I have a syslog-ng+Oracle setup too, but I am not happy with the performance. We have a central log server with 3000 GB of SAN and 15 GB RAM, and 20,000 devices are supposed to pump logs 24x7 :) With Oracle we faced two serious issues; that's why I also started pumping logs into files along with the DB.
-inserts: I am using a named pipe to insert logs into the DB, but Oracle somehow drops inserts, because the "rate of arrival of events" is much larger than the "rate of insert operations". I have noticed about 80-90% of events are dropped in the DB.
-select: when we search logs, performance was really, really bad; it took too long to give results. But then we did indexing on hostname and partitioned the table on time (a new range partition is created every 6 hrs). This improved system performance to some extent.
Can you guys suggest whether MySQL or Postgres would be better to overcome the above two problems? (But remember our DB is huge :), so I am not sure if MySQL or Postgres can handle such a big DB.)
Regards, -Manish
--- Jon Stearley <jrstear@sandia.gov> wrote:
On May 11, 2006, at 12:09 PM, Ken Garland wrote:
file("/logs/log01/indexlog/$YEAR/$MONTH/$DAY/$HOST"
... - should be able to do parallel searches to improve search response time.
If you decide to go with SQL and have $$, netezza.com will almost certainly overcome your speed issues (parallel hardware SQL!). Having gotten utterly bogged down with MySQL on Linux (stripes, chunks, huge indexes), I just went back to files, because they are simple and sufficient for my purposes.
If you are splitting all the logs up into subdirs like that, you will have quite a fun time doing any parsing.
If dirs/logs are arranged according to the factors used for subset selection (year/month/day/host), and the dirs/logs are listed in a (periodically updated) file (eg "corpus.docs" in sisyphus), subset selection can be done by simply grepping that file and concatenating the resulting dirs/logs. This is one implementation option underlying the clog.man page I sent earlier. Further subset selection by facility and priority could then be done by grepping the resulting log content (further splitting of dirs/logs by facility/priority presents multiple bad side effects). $0.02
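The subset-selection idea above can be sketched in a few lines: keep a corpus file listing one log path per line (paths encoding year/month/day/host), grep it for the wanted factors, then concatenate the matching logs. A minimal sketch; the path layout and example hosts are assumptions for illustration:

```python
import re


def select_logs(corpus_lines, pattern):
    """Return the log paths whose year/month/day/host path matches.

    This is the 'grep the corpus file' step; the caller would then
    concatenate (cat) the returned files for further grepping.
    """
    rx = re.compile(pattern)
    return [p for p in corpus_lines if rx.search(p)]


# Hypothetical corpus file contents (one dir/log path per line).
corpus = [
    "/logs/2006/05/11/web01",
    "/logs/2006/05/11/db01",
    "/logs/2006/05/12/web01",
]

# All logs for host web01, any day:
web01 = select_logs(corpus, r"/web01$")
# All logs for 2006/05/11, any host:
day = select_logs(corpus, r"/2006/05/11/")
```

Because the selection factors are encoded in the path, no log content is read until after the subset has been narrowed down, which is what keeps this approach cheap.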
-jon
_______________________________________________ syslog-ng maillist - syslog-ng@lists.balabit.hu https://lists.balabit.hu/mailman/listinfo/syslog-ng Frequently asked questions at http://www.campin.net/syslog-ng/faq.html