On May 11, 2006, at 12:09 PM, Ken Garland wrote:
> file("/logs/log01/indexlog/$YEAR/$MONTH/$DAY/$HOST" ... -- should be able to do parallel search to improve search response time.
If you decide to go with SQL and have $$, netezza.com will almost certainly overcome your speed issues (parallel hardware SQL!). Having gotten utterly bogged down with MySQL on Linux (stripes, chunks, huge indexes), I just went back to files because they are simple and sufficient for my purposes.
> if you are splitting all logs up into subdirs like that you will have quite a fun time doing any parsing.
If dirs/logs are arranged according to the factors used for subset selection (year/month/day/host), and the dirs/logs are listed in a (periodically updated) file (e.g. "corpus.docs" in sisyphus), subset selection can be done by simply grepping the listing file and concatenating the resulting dirs/logs. This is one implementation option underlying the clog.man page I sent earlier. Further subset selection by facility and priority could then be done by grepping the resulting log content (splitting dirs/logs further by facility/priority presents multiple bad side effects). $0.02 -jon
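A minimal sketch of the grep-and-concatenate approach described above, assuming a "corpus.docs"-style listing with one log path per line in year/month/day/host order. The paths, hostnames, and sample log lines here are invented for illustration; only the technique (grep the listing for the subset factors, cat the matching files, then grep the content for facility/priority) comes from the mail:

```shell
#!/bin/sh
# Fabricated demo tree: year/month/day/host layout, syslog-ish content.
mkdir -p /tmp/clogdemo/2006/05/11
cat > /tmp/clogdemo/2006/05/11/hostA <<'EOF'
May 11 12:00:01 hostA daemon.info sshd: session opened
May 11 12:00:02 hostA mail.err postfix: bounce
EOF
cat > /tmp/clogdemo/2006/05/11/hostB <<'EOF'
May 11 12:00:03 hostB daemon.info crond: job run
EOF

# Periodically rebuilt listing of all logs (the "corpus.docs" idea):
find /tmp/clogdemo -type f ! -name corpus.docs | sort \
    > /tmp/clogdemo/corpus.docs

# Subset selection: grep the listing for day/host, concatenate the
# matching logs, then grep the content for facility.priority.
result=$(grep '2006/05/11/hostA' /tmp/clogdemo/corpus.docs \
    | xargs cat \
    | grep 'mail\.err')
echo "$result"
```

Because selection by facility/priority happens on log content rather than on the directory layout, the tree stays simple and the same listing file serves any combination of factors.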