[syslog-ng] Fail safe test for syslog-ng

dstuxo dstuxo at gmail.com
Wed Jan 16 15:18:09 CET 2008


Thank you for your suggestions.

However, my mistake... I forgot to mention that this is an
embedded board with a 4GB CompactFlash card for storage
(CF2IDE adapter).
This is a development platform and no changes can be made to it
at this time.

Anyway, there is still one missing piece:
why can sysklogd handle the same messages very well while syslog-ng cannot?

After a few more tests:
I changed the block I/O scheduler from CFQ to NOOP (more suitable in this case).
Result: the same delay when starting to log.
This suggests that the problem is not with flushing buffers.
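
To confirm that the scheduler change actually took effect, the active scheduler can be read back from sysfs, where the kernel marks it with square brackets. This is only a minimal sketch: the device name "hda" is an assumption (the CF card may show up under a different name), and the helper names are made up for illustration.

```python
from pathlib import Path


def parse_active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler name that the kernel marks with [brackets]."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {sysfs_text!r}")


def read_active_scheduler(device: str = "hda") -> str:
    """Read the active scheduler for one block device.

    "hda" is a placeholder device name, not taken from the original post.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_active_scheduler(path.read_text())
```

For example, parse_active_scheduler("[noop] anticipatory deadline cfq") returns "noop".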

I am using syslog-ng-2.0.3 (subjective choice)

Is there any major improvement (in terms of speed) between this version and
the latest one?

Regards,

Evan Rempel wrote:
> Valdis.Kletnieks at vt.edu wrote:
>   
>> On Tue, 15 Jan 2008 13:50:20 +0200, dstuxo said:
>>
>>
>>     
>>> I am using syslog-ng in a system which is supposed to be fail safe to avoid
>>> data loss. However I encountered an unexpected delay and messages are lost.
>>>       
>> This turns out to be a *lot* harder to achieve than you would expect...
>>
>>
>>     
>>> I know about the fsync(yes) option and I tried it, but under a high load of
>>> messages it doesn't work properly (syslog-ng.config becomes corrupted and
>>> messages are also lost)
>>>       
>> On any given hardware, there is an *absolute* upper limit to how many fsync()
>> operations it can support per second - and it's often quite low, on the order
>> of 50-75 per second.  This is mainly due to the fact that an fsync will almost
>> always cause a disk seek, which can take up to 10 milliseconds (restricting you
>> to 100 per second).  There are only a few ways around this:
>>
>> 0) Many/most disk drives have a rather small (8M to 64M) cache in front of them
>> - if you care about performance, you enable it, if you care about "fail safe",
>> you turn it off, because if you do a "wait 2 seconds then I power down",
>> anything that had made it to the disk's cache but wasn't written out *will go
>> away*.  If you need to run it with cache on, you need to make sure that (a) you
>> have a UPS so it never loses power before you can (b) make sure that you
>> properly shut down the disks, including a "flush cache" and waiting for it to
>> complete, before pulling the power...
>>
>> 1) Tell syslog-ng to only fsync every <insert time unit here>, and be prepared
>> to lose the last few seconds' worth if there's a crash.
>>
>> 2) Be prepared to spend US$400,000 and up for a high-end EMC or similar disk
>> array that can handle insanely high I/O loads.  Part of this cost is the
>> battery backup for the disks, so that even if you "wait 2 seconds and power
>> down", the disks don't actually spin down.
>>
>> 3) Learn to accept that unless you spend lots of money, there are lots of ways
>> that you can lose messages.
>>     
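
The ceiling described in the quoted reply above (one seek per fsync, up to 10 ms per seek) is simple arithmetic, and it can be compared with what a given storage device actually delivers. The following is a rough sketch rather than a proper benchmark; the function names and the default count of 200 writes are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def fsync_ceiling(seek_ms: float) -> float:
    """Upper bound on fsyncs per second if every fsync costs one seek."""
    return 1000.0 / seek_ms


def measure_fsync_rate(directory: str, count: int = 200) -> float:
    """Write `count` short lines with an fsync after each one.

    Returns the observed number of fsyncs per second on the filesystem
    that holds `directory`. The temporary file is removed afterwards.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for i in range(count):
            os.write(fd, f"test message {i}\n".encode())
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")
```

fsync_ceiling(10) gives 100.0, which is the "100 per second" figure quoted above. Running measure_fsync_rate() against a directory on the CF card would show how far the real device is from that bound; a drive with its write cache enabled may report a much higher number without the data being safe.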
>
> There are much cheaper solutions to this problem.
> You can purchase a RAID controller card with its own battery-backed cache, often
> referred to as a fast write cache. This cache will immediately respond to an fsync
> since the data is "permanent" with regard to the storage system capturing it. In reality
> it may only live for 1 week in the event of a power loss.
>
> You can use standard disks behind the controller, but you should ensure that the disk
> drives themselves do NOT have the write cache enabled. If they do, then a power loss
> will lose the data in the disk cache.
>
> When the disk drives spin up again, the caching RAID card will commit the writes to
> the disk from its battery-backed cache.
>
> These cards are on the order of $1,000-2,000.
>
> This means that you can enable the fsync feature of syslog-ng without the huge
> performance penalty. The caching RAID card will coalesce the individual fsync writes
> into a large write to the actual media.
>
> We have seen write speeds twice as fast as read speeds even with fsync applications.
>
> You will still have some upper limit which is often around 100,000 IO/sec, so that
> means that you have a limit of 100,000 syslog messages per second.
>
> As an example, see the LSI MegaRAID SAS 8480E controller which has 256MB of battery backed cache
> that lasts 72 hours without power and supports up to 140,000 I/O per second. You don't have to
> boot the OS to finish the writes, just get power to the box/drives for 30 seconds and you have
> it committed to disk. A small portable UPS can do this.
>
> I am in no way affiliated with LSI, and have never used the product mentioned.
>
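
The coalescing described above happens in the controller hardware. A rough software analogue, closer to the "fsync every so often" option in the earlier reply, is to append every message immediately but call fsync at most once per interval. This is a sketch only: the class name, the one-second default and the injectable clock are assumptions made for illustration, and it is not how syslog-ng itself is implemented. Anything written since the last sync can still be lost in a crash.

```python
import os
import time


class BatchedSyncLog:
    """Append-only log that fsyncs at most once per `sync_interval` seconds."""

    def __init__(self, path, sync_interval=1.0, clock=time.monotonic):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._interval = sync_interval
        self._clock = clock          # injectable so the timing can be tested
        self._last_sync = clock()
        self._dirty = False          # True when data was written since last fsync
        self.sync_count = 0          # number of fsync calls actually issued

    def write(self, message: str) -> None:
        """Append one message; fsync only if the interval has elapsed."""
        os.write(self._fd, (message + "\n").encode())
        self._dirty = True
        if self._clock() - self._last_sync >= self._interval:
            self.sync()

    def sync(self) -> None:
        """Force unsynced data to storage and restart the interval."""
        if self._dirty:
            os.fsync(self._fd)
            self.sync_count += 1
            self._dirty = False
        self._last_sync = self._clock()

    def close(self) -> None:
        """Sync any remaining data, then close the file."""
        self.sync()
        os.close(self._fd)
```

With a one-second interval, a burst of several thousand messages costs a handful of fsync calls instead of one per message, at the price of losing up to one second of messages on power failure.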
>
> Evan Rempel.
> _______________________________________________
> syslog-ng maillist  -  syslog-ng at lists.balabit.hu
> https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Frequently asked questions at http://www.campin.net/syslog-ng/faq.html
>
>
>   


