Hello,

I am running syslog-ng (1.4.17) on Linux (2.4.21) and received an error
when the disk partition became full. Looking at the code, the error does
not appear to be handled very well. After the error, syslog-ng started
using 26% of my CPU time and stopped writing new messages to the file it
had been writing to when the error occurred (even after room was made in
the partition).

The specifics are:

I received this error:

Jun 25 13:16:57 port1-1 syslog-ng[147]: io.c: do_write: write() failed (errno 28), No space left on device

So, do_write() correctly identifies the error and logs a clear message
that the disk is full.

    int res = write(self->fd, data, length);
    if(res < 0)
    {
        switch(errno)
        {
        case EINTR:
        case EWOULDBLOCK:
            return 0;
        default:
            werror(
                "io.c: do_write: write() failed (errno %i), %z\n");
        }
        ...
        return(res);
    }
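
As an aside, this error path can be exercised without actually filling a
partition: on Linux, writes to /dev/full fail with ENOSPC. The following
is only a minimal sketch under that assumption; the helper name
sketch_write_to_full_device() is hypothetical and is not part of
syslog-ng.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not syslog-ng code.
 * Returns the errno produced by a failed write to /dev/full,
 * 0 if the write unexpectedly succeeded, or -1 if the device
 * could not be opened (for example, on a system without /dev/full). */
static int sketch_write_to_full_device(void)
{
    int fd = open("/dev/full", O_WRONLY);
    if (fd < 0)
        return -1;

    errno = 0;
    ssize_t res = write(fd, "x", 1);
    int saved = (res < 0) ? errno : 0;  /* save errno before close() can change it */

    close(fd);
    return saved;
}
```

Pointing a syslog-ng file destination at such a device should, I would
expect, produce the same "errno 28" message shown above.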

The only place I can see where do_write() is called is write_callback(),
where I would have liked to see an "if (w < 0) do_something", but w is
just passed as an argument to BUF_FLUSH().

    static void write_callback(struct nonblocking_fd *fd)
    {
        CAST(io_fd, self, fd);
        int res;
        struct fd_write w =
            { { STACK_HEADER, do_write }, fd->fd, self->fsync };

        assert(self->buffer);
        res = BUF_FLUSH(self->buffer, &w.super);

So, if w is an error (-1), what is &w.super equal to? Nothing good, I
suspect.

Maybe I have this wrong. Maybe there is another place where do_write()
is called. Regardless, the value of w is not being checked, so what is
the proper course of action if w is -1?
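
To make the question concrete, here is a minimal sketch of the kind of
check I would have expected. It is not syslog-ng code: sketch_buffer,
sketch_do_write() and sketch_flush() are hypothetical stand-ins for the
real buffer and write-callback types, and keeping the pending data in
place on a hard error is just one possible policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the real output buffer. */
struct sketch_buffer {
    const char *data;   /* bytes still waiting to be written */
    size_t      len;    /* number of pending bytes */
    int         failed; /* errno of the last hard write error, else 0 */
};

/* Returns the number of bytes written, 0 if the write should simply be
 * retried later, or -1 on a hard error (errno is left as set by write()). */
static int sketch_do_write(int fd, const char *data, size_t length)
{
    ssize_t res = write(fd, data, length);
    if (res < 0) {
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;
        return -1;      /* e.g. ENOSPC: report it instead of hiding it */
    }
    return (int)res;
}

/* Returns 1 when everything was flushed, 0 when the caller should try
 * again once the descriptor is writable, and -1 on a hard error. On a
 * hard error the pending data is kept and buf->failed records errno,
 * so the caller can back off instead of retrying in a tight loop. */
static int sketch_flush(struct sketch_buffer *buf, int fd)
{
    assert(buf != NULL);
    buf->failed = 0;
    while (buf->len > 0) {
        int n = sketch_do_write(fd, buf->data, buf->len);
        if (n < 0) {
            buf->failed = errno;
            return -1;
        }
        if (n == 0)
            return 0;
        buf->data += n;
        buf->len  -= (size_t)n;
    }
    return 1;
}
```

With something along these lines, the caller could stop polling the
descriptor for writability while buf->failed is set and retry once
space has been freed, rather than spinning on the same failed write.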

Note: my syslog-ng is running on an embedded system that does not have
much flash memory to store these logs. The system has a process that
truncates the log files when they exceed specific limits, but the
granularity of the file size checking means that one or more processes
can exceed their limit(s) and fill the disk before the checker process
wakes up and does the cleanup, as happened in this case.

Thanks.

John Feeney