[PATCH 0/N]: Various fixes from the syslog-ng team
Hi! A few selected patches from the syslog-ng team at BalaBit will follow this e-mail. This here mostly serves as an umberalla mail, under which I can shovel the rest, so they end up like a neat little thread. The patches I am about to send are available on the team/4.1-sprint5/fixes branch of my git tree. Special thanks to Viktor Juhasz <jviktor@balabit.hu> of the syslog-ng team for doing the bulk of the work, and assembling the list of patches that can be safely sent to the list. Another batch will be hitting the list a bit later, but unlike this first, which I expect to be safely appliable, the second batch will need a more through review. I'll comment on the whys when those patches are posted. I expect more patches from the team to follow, once the features and fixes are in a good enough shape. (In the meantime, I'll try to browse through their tree to see if any other bugfix can be isolated) -- |8]
From: Peter Gyorko <gyorkop@balabit.hu> If no encoding is specified, we read directly into our buffer, therefore need to verify the buffer sizes. If we do use encoding, the raw buffer size cannot be directly compared to our own buffer size. Therefore, only validate the buffer size if no encoding is specified, because otherwise it doesn't make sense. Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> Signed-off-by: Peter Gyorko <gyorkop@balabit.hu> --- lib/logproto.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/lib/logproto.c b/lib/logproto.c index d526357..6e81892 100644 --- a/lib/logproto.c +++ b/lib/logproto.c @@ -651,16 +651,16 @@ log_proto_buffered_server_apply_state(LogProtoBufferedServer *self, PersistEntry gssize rc; guchar *raw_buffer; - if (state->raw_buffer_size > state->buffer_size) - { - msg_notice("Invalid raw_buffer_size member in LogProtoBufferedServer state, restarting from the beginning", - evt_tag_str("state", persist_name), - NULL); - goto error; - } if (!self->super.encoding) { /* no conversion, we read directly into our buffer */ + if (state->raw_buffer_size > state->buffer_size) + { + msg_notice("Invalid raw_buffer_size member in LogProtoBufferedServer state, restarting from the beginning", + evt_tag_str("state", persist_name), + NULL); + goto error; + } raw_buffer = self->buffer; } else -- 1.7.2.5
Hi, Can you please explain when this causes problems and what the problem is? Does it perhaps need to be backported to earlier versions? Also, some sanity check on raw_buffer_size would be needed, as we try to allocate that size from the stack, which is limited in size. On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Peter Gyorko <gyorkop@balabit.hu>
If no encoding is specified, we read directly into our buffer, therefore need to verify the buffer sizes. If we do use encoding, the raw buffer size cannot be directly compared to our own buffer size.
Therefore, only validate the buffer size if no encoding is specified, because otherwise it doesn't make sense.
Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> Signed-off-by: Peter Gyorko <gyorkop@balabit.hu> --- lib/logproto.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/lib/logproto.c b/lib/logproto.c index d526357..6e81892 100644 --- a/lib/logproto.c +++ b/lib/logproto.c @@ -651,16 +651,16 @@ log_proto_buffered_server_apply_state(LogProtoBufferedServer *self, PersistEntry gssize rc; guchar *raw_buffer;
- if (state->raw_buffer_size > state->buffer_size) - { - msg_notice("Invalid raw_buffer_size member in LogProtoBufferedServer state, restarting from the beginning", - evt_tag_str("state", persist_name), - NULL); - goto error; - } if (!self->super.encoding) { /* no conversion, we read directly into our buffer */ + if (state->raw_buffer_size > state->buffer_size) + { + msg_notice("Invalid raw_buffer_size member in LogProtoBufferedServer state, restarting from the beginning", + evt_tag_str("state", persist_name), + NULL); + goto error; + } raw_buffer = self->buffer; } else
-- Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
Hi,
Can you please explain when this causes problems and what the problem is? Does it perhaps need to be backported to earlier versions?
As far as I remember, this caused problems when encoding() was specified for a source (a file source, in this case). In particular, a source file with UCS4 encoding triggered a crash. As I understand it, if encoding is in place, then the size of self->buffer does not matter all that much, since we'll be using a different buffer. However, due to encoding issues, the raw buffer size might easily be larger than what we have allocated for self->buffer - but that's no big deal in this case, because we'll allocate a large enough buffer in the else branch. (Note: this is my understanding of the code, this area is a bit hazy for me at the moment) -- |8]
On Sat, 2011-05-28 at 14:30 +0200, Gergely Nagy wrote:
Balazs Scheidler <bazsi@balabit.hu> writes:
Hi,
Can you please explain when this causes problems and what the problem is? Does it perhaps need to be backported to earlier versions?
As far as I remember, this caused problems when encoding() was specified for a source (a file source, in this case). In particular, a source file with UCS4 encoding triggered a crash.
As I understand it, if encoding is in place, then the size of self->buffer does not matter all that much, since we'll be using a different buffer. However, due to encoding issues, the raw buffer size might easily be larger than what we have allocated for self->buffer - but that's no big deal in this case, because we'll allocate a large enough buffer in the else branch.
(Note: this is my understanding of the code, this area is a bit hazy for me at the moment)
I don't see how this fixes the crash, but I'll see to it if I can reproduce the original problem. -- Bazsi
On Sat, 2011-05-28 at 20:48 +0200, Balazs Scheidler wrote:
On Sat, 2011-05-28 at 14:30 +0200, Gergely Nagy wrote:
Balazs Scheidler <bazsi@balabit.hu> writes:
Hi,
Can you please explain when this causes problems and what the problem is? Does it perhaps need to be backported to earlier versions?
As far as I remember, this caused problems when encoding() was specified for a source (a file source, in this case). In particular, a source file with UCS4 encoding triggered a crash.
As I understand it, if encoding is in place, then the size of self->buffer does not matter all that much, since we'll be using a different buffer. However, due to encoding issues, the raw buffer size might easily be larger than what we have allocated for self->buffer - but that's no big deal in this case, because we'll allocate a large enough buffer in the else branch.
(Note: this is my understanding of the code, this area is a bit hazy for me at the moment)
I don't see how this fixes the crash, but I'll see to it if I can reproduce the original problem.
I've successfully reproduced the problem. The root cause was that the error handling path didn't reset some of the state into proper values, and the crash could happen if any of the error paths were taken. I've fixed the error handling here: commit 7a88fb3df322e4b7037fb02475679a52612e2d83 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sun May 29 19:47:19 2011 +0200 LogProtoBufferedServer: fixed error handling in case the persist-state cannot be restored In case an error happens while restoring the persist data, be sure to reset various state variables to indicate that the Protocol is to be restarted. And changed the patch slightly to limit raw_buffer_size, since in normal operation it can't be larger than init_buffer_size (which equals to log_msg_size()). This is the integrated patch: commit 0d8d2d6060c2a99bc5dd8ae399cab703a251259c Author: Peter Gyorko <gyorkop@balabit.hu> Date: Mon May 23 11:56:09 2011 +0200 [persist]: only check the buffer size if no encoding is specified If no encoding is specified, we read directly into our buffer, thus we need to verify the buffer sizes. If we do use encoding however, the raw buffer size cannot be directly compared to our own buffer size, we use init_buffer_size instead (which equals to the maximum message size). Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> Signed-off-by: Peter Gyorko <gyorkop@balabit.hu> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> -- Bazsi
From: Peter Gyorko <gyorkop@balabit.hu> Always allocate memory for self->filename, so that it can always be freed properly. The former method of trying to figure out when to free it and when to ignore it since it's owned by something else didn't quite work, and led to memory leaks. Signed-off-by: Peter Gyorko <gyorkop@balabit.hu> Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- modules/affile/affile.c | 8 ++------ 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/modules/affile/affile.c b/modules/affile/affile.c index 6e4f5f0..c5e89f5 100644 --- a/modules/affile/affile.c +++ b/modules/affile/affile.c @@ -690,7 +690,7 @@ affile_dw_new(AFFileDestDriver *owner, GString *filename) /* we have to take care about freeing filename later. This avoids a move of the filename. */ - self->filename = filename; + self->filename = g_string_new(filename->str); g_static_mutex_init(&self->lock); return self; } @@ -1037,7 +1037,6 @@ affile_dd_open_writer(gpointer args[]) if (!next) { next = affile_dw_new(self, filename); - args[1] = NULL; if (!log_pipe_init(&next->super, cfg)) { log_pipe_unref(&next->super); @@ -1051,8 +1050,6 @@ affile_dd_open_writer(gpointer args[]) g_static_mutex_unlock(&self->lock); } } - else - g_string_free(filename, TRUE); } next->queue_pending = TRUE; /* we're returning a reference */ @@ -1115,8 +1112,7 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, const LogPathOptions *path_options, next = main_loop_call((void *(*)(void *)) affile_dd_open_writer, args, TRUE); } /* args[1] is set to NULL whenever the affile_dd_open_writer uses it for the newly constructed AFFileDestWriter instance */ - if (args[1]) - g_string_free(filename, TRUE); + g_string_free(filename, TRUE); } if (next) { -- 1.7.2.5
Hi, I generally agree with this patch, though I'd like to make one further step. AFFileDestWriter doesn't use the fact that filename is a GString, thus use a simple character pointer. I've also removed a stale comment, which was left in by the patch. NOTE: there's a possibility for optimization, as a new GString instance is allocated in the fast path for all macro expansion. This could probably be spared (using a per-thread variable, or using the value under the protection of the already acquired lock). It'd be interesting to see a profile with a macro in the filename, this allocation is almost certainly makes a difference CPU-wise (although will quite possibly perform better than a single destination if several CPUs/cores are available). Here's the updated patch: commit b9251c2603d3ffba557425387de86aa9093e5429 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sat May 28 14:14:25 2011 +0200 fixed memory leak using macros in template Always allocate memory for self->filename, so that it can always be freed properly. The former method of trying to figure out when to free it and when to ignore it since it's owned by something else didn't quite work, and led to memory leaks. This patch also changes AFFileDestWriter->filename to be a simple char pointer instead of a GString. Signed-off-by: Peter Gyorko <gyorkop@balabit.hu> Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Peter Gyorko <gyorkop@balabit.hu>
Always allocate memory for self->filename, so that it can always be freed properly. The former method of trying to figure out when to free it and when to ignore it since it's owned by something else didn't quite work, and led to memory leaks.
Signed-off-by: Peter Gyorko <gyorkop@balabit.hu> Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- modules/affile/affile.c | 8 ++------ 1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/modules/affile/affile.c b/modules/affile/affile.c index 6e4f5f0..c5e89f5 100644 --- a/modules/affile/affile.c +++ b/modules/affile/affile.c @@ -690,7 +690,7 @@ affile_dw_new(AFFileDestDriver *owner, GString *filename)
/* we have to take care about freeing filename later. This avoids a move of the filename. */ - self->filename = filename; + self->filename = g_string_new(filename->str); g_static_mutex_init(&self->lock); return self; } @@ -1037,7 +1037,6 @@ affile_dd_open_writer(gpointer args[]) if (!next) { next = affile_dw_new(self, filename); - args[1] = NULL; if (!log_pipe_init(&next->super, cfg)) { log_pipe_unref(&next->super); @@ -1051,8 +1050,6 @@ affile_dd_open_writer(gpointer args[]) g_static_mutex_unlock(&self->lock); } } - else - g_string_free(filename, TRUE); } next->queue_pending = TRUE; /* we're returning a reference */ @@ -1115,8 +1112,7 @@ affile_dd_queue(LogPipe *s, LogMessage *msg, const LogPathOptions *path_options, next = main_loop_call((void *(*)(void *)) affile_dd_open_writer, args, TRUE); } /* args[1] is set to NULL whenever the affile_dd_open_writer uses it for the newly constructed AFFileDestWriter instance */ - if (args[1]) - g_string_free(filename, TRUE); + g_string_free(filename, TRUE); } if (next) {
-- Bazsi
From: NagyAttila <naat@balabit.hu> Do not decrease the stored counter in log_queue_fifo_pop_head(), and initialize the counters properly in afsql_dd_init(): the stored/dropped counters were swapped. Signed-off-by: Attila Nagy <naat@balabit.hu> --- lib/logqueue-fifo.c | 3 --- modules/afsql/afsql.c | 2 +- 2 files changed, 1 insertions(+), 4 deletions(-) diff --git a/lib/logqueue-fifo.c b/lib/logqueue-fifo.c index 1a34776..f78f00c 100644 --- a/lib/logqueue-fifo.c +++ b/lib/logqueue-fifo.c @@ -326,9 +326,6 @@ log_queue_fifo_pop_head(LogQueue *s, LogMessage **msg, LogPathOptions *path_opti */ return FALSE; } - g_static_mutex_lock(&self->super.lock); - stats_counter_dec(self->super.stored_messages); - g_static_mutex_unlock(&self->super.lock); if (push_to_backlog) { diff --git a/modules/afsql/afsql.c b/modules/afsql/afsql.c index d7ac36e..a9f0d95 100644 --- a/modules/afsql/afsql.c +++ b/modules/afsql/afsql.c @@ -951,7 +951,7 @@ afsql_dd_init(LogPipe *s) stats_register_counter(0, SCS_SQL | SCS_DESTINATION, self->super.super.id, afsql_dd_format_stats_instance(self), SC_TYPE_DROPPED, &self->dropped_messages); self->queue = log_dest_driver_acquire_queue(&self->super, afsql_dd_format_persist_name(self)); - log_queue_set_counters(self->queue, self->dropped_messages, self->stored_messages); + log_queue_set_counters(self->queue, self->stored_messages, self->dropped_messages); if (!self->fields) { GList *col, *value; -- 1.7.2.5
Hi, On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: NagyAttila <naat@balabit.hu>
Do not decrease the stored counter in log_queue_fifo_pop_head(), and initialize the counters properly in afsql_dd_init(): the stored/dropped counters were swapped.
Signed-off-by: Attila Nagy <naat@balabit.hu> --- lib/logqueue-fifo.c | 3 --- modules/afsql/afsql.c | 2 +- 2 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/lib/logqueue-fifo.c b/lib/logqueue-fifo.c index 1a34776..f78f00c 100644 --- a/lib/logqueue-fifo.c +++ b/lib/logqueue-fifo.c @@ -326,9 +326,6 @@ log_queue_fifo_pop_head(LogQueue *s, LogMessage **msg, LogPathOptions *path_opti */ return FALSE; } - g_static_mutex_lock(&self->super.lock); - stats_counter_dec(self->super.stored_messages); - g_static_mutex_unlock(&self->super.lock);
if (push_to_backlog) {
I'm somewhat puzzled here. The queue manages the "stored" counter itself, the increments remained there in all paths (e.g. push_tail, rewind, etc). What is going to perform the decrease operation then?
diff --git a/modules/afsql/afsql.c b/modules/afsql/afsql.c index d7ac36e..a9f0d95 100644 --- a/modules/afsql/afsql.c +++ b/modules/afsql/afsql.c @@ -951,7 +951,7 @@ afsql_dd_init(LogPipe *s) stats_register_counter(0, SCS_SQL | SCS_DESTINATION, self->super.super.id, afsql_dd_format_stats_instance(self), SC_TYPE_DROPPED, &self->dropped_messages);
self->queue = log_dest_driver_acquire_queue(&self->super, afsql_dd_format_persist_name(self)); - log_queue_set_counters(self->queue, self->dropped_messages, self->stored_messages); + log_queue_set_counters(self->queue, self->stored_messages, self->dropped_messages); if (!self->fields) { GList *col, *value;
This part is OK, and since it is completely independent of the other hunk, I've applied it separately with this patch: commit 966b39289785677d0c4354623f1f83780b952ee5 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sat May 28 14:17:58 2011 +0200 fix db stored/dropped messages counter handling Initialize the counters properly in afsql_dd_init(): the stored/dropped counters were swapped. Signed-off-by: Attila Nagy <naat@balabit.hu> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> -- Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: NagyAttila <naat@balabit.hu>
Do not decrease the stored counter in log_queue_fifo_pop_head(), and initialize the counters properly in afsql_dd_init(): the stored/dropped counters were swapped.
Signed-off-by: Attila Nagy <naat@balabit.hu> --- lib/logqueue-fifo.c | 3 --- modules/afsql/afsql.c | 2 +- 2 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/lib/logqueue-fifo.c b/lib/logqueue-fifo.c index 1a34776..f78f00c 100644 --- a/lib/logqueue-fifo.c +++ b/lib/logqueue-fifo.c @@ -326,9 +326,6 @@ log_queue_fifo_pop_head(LogQueue *s, LogMessage **msg, LogPathOptions *path_opti */ return FALSE; } - g_static_mutex_lock(&self->super.lock); - stats_counter_dec(self->super.stored_messages); - g_static_mutex_unlock(&self->super.lock);
if (push_to_backlog) {
I'm somewhat puzzled here. The queue manages the "stored" counter itself, the increments remained there in all paths (e.g. push_tail, rewind, etc). What is going to perform the decrease operation then?
Looks like I either missed a patch when preparing these, or I horribly misunderstood the team's intent. I'll look into it, and see if I can figure out what's this hunk is about. -- |8]
From: Juhasz Viktor <jviktor@balabit.hu> Duplicate the filename we insert into self->writer_hash, because otherwise, we're likely to end up trying to free it twice. Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- modules/affile/affile.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/modules/affile/affile.c b/modules/affile/affile.c index c5e89f5..f6c70e9 100644 --- a/modules/affile/affile.c +++ b/modules/affile/affile.c @@ -1046,7 +1046,7 @@ affile_dd_open_writer(gpointer args[]) { log_pipe_ref(&next->super); g_static_mutex_lock(&self->lock); - g_hash_table_insert(self->writer_hash, filename->str, next); + g_hash_table_insert(self->writer_hash, strdup(filename->str), next); g_static_mutex_unlock(&self->lock); } } -- 1.7.2.5
Hi, On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Juhasz Viktor <jviktor@balabit.hu>
Duplicate the filename we insert into self->writer_hash, because otherwise, we're likely to end up trying to free it twice.
Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- modules/affile/affile.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/modules/affile/affile.c b/modules/affile/affile.c index c5e89f5..f6c70e9 100644 --- a/modules/affile/affile.c +++ b/modules/affile/affile.c @@ -1046,7 +1046,7 @@ affile_dd_open_writer(gpointer args[]) { log_pipe_ref(&next->super); g_static_mutex_lock(&self->lock); - g_hash_table_insert(self->writer_hash, filename->str, next); + g_hash_table_insert(self->writer_hash, strdup(filename->str), next); g_static_mutex_unlock(&self->lock); } }
I don't understand. The writer_hash hashtable contains a borrowed key, nothing frees it there. Isn't it possible that the last patch (which changed AFFileDestWriter->filename allocation) fixed this issue too? And even if the strdup was necessary, g_strdup() should have been used, as the g_malloc() style allocations may not map directly to malloc() on some platforms. -- Bazsi
From: Juhasz Viktor <jviktor@balabit.hu>
Duplicate the filename we insert into self->writer_hash, because otherwise, we're likely to end up trying to free it twice.
Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- modules/affile/affile.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/modules/affile/affile.c b/modules/affile/affile.c index c5e89f5..f6c70e9 100644 --- a/modules/affile/affile.c +++ b/modules/affile/affile.c @@ -1046,7 +1046,7 @@ affile_dd_open_writer(gpointer args[]) { log_pipe_ref(&next->super); g_static_mutex_lock(&self->lock); - g_hash_table_insert(self->writer_hash, filename->str, next); + g_hash_table_insert(self->writer_hash, strdup(filename->str), next); g_static_mutex_unlock(&self->lock); } }
I don't understand. The writer_hash hashtable contains a borrowed key, nothing frees it there. Isn't it possible that the last patch (which changed AFFileDestWriter->filename allocation) fixed this issue too?
That's very possible, indeed. After having a closer look, I believe that the other patch fixed this issue aswell.
And even if the strdup was necessary, g_strdup() should have been used, as the g_malloc() style allocations may not map directly to malloc() on some platforms.
Yikes, my bad! I didn't spot it's not g_strdup(). -- |8]
From: Juhasz Viktor <jviktor@balabit.hu> Since nv_registry_alloc_handle() can be indirectly called from multiple threads, when it modifies the registry, that needs to be protected with a mutex. Same goes for nv_registry_add_alias(). Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- lib/nvtable.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/lib/nvtable.c b/lib/nvtable.c index aa9d8a1..d0a35d3 100644 --- a/lib/nvtable.c +++ b/lib/nvtable.c @@ -27,6 +27,8 @@ #include <string.h> #include <stdlib.h> +GStaticMutex nv_registry_lock = G_STATIC_MUTEX_INIT; + const gchar *null_string = ""; NVHandle @@ -78,8 +80,10 @@ nv_registry_alloc_handle(NVRegistry *self, const gchar *name) stored.flags = 0; stored.name_len = len; stored.name = g_strdup(name); + g_static_mutex_lock(&nv_registry_lock); g_array_append_val(self->names, stored); g_hash_table_insert(self->name_map, stored.name, GUINT_TO_POINTER(self->names->len)); + g_static_mutex_unlock(&nv_registry_lock); return self->names->len; } @@ -91,7 +95,9 @@ nv_registry_alloc_handle(NVRegistry *self, const gchar *name) void nv_registry_add_alias(NVRegistry *self, NVHandle handle, const gchar *alias) { + g_static_mutex_lock(&nv_registry_lock); g_hash_table_insert(self->name_map, (gchar *) alias, GUINT_TO_POINTER((glong) handle)); + g_static_mutex_unlock(&nv_registry_lock); } void -- 1.7.2.5
Hi, This locking is certainly necessary, however the scope is not enough in nv_registry_alloc_handle(). The lookup must also be locked, since the same name can be associated to a different value otherwise. It should also be checked if this would happen on a fastpath though. On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Juhasz Viktor <jviktor@balabit.hu>
Since nv_registry_alloc_handle() can be indirectly called from multiple threads, when it modifies the registry, that needs to be protected with a mutex.
Same goes for nv_registry_add_alias().
Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- lib/nvtable.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/lib/nvtable.c b/lib/nvtable.c index aa9d8a1..d0a35d3 100644 --- a/lib/nvtable.c +++ b/lib/nvtable.c @@ -27,6 +27,8 @@ #include <string.h> #include <stdlib.h>
+GStaticMutex nv_registry_lock = G_STATIC_MUTEX_INIT; + const gchar *null_string = "";
NVHandle @@ -78,8 +80,10 @@ nv_registry_alloc_handle(NVRegistry *self, const gchar *name) stored.flags = 0; stored.name_len = len; stored.name = g_strdup(name); + g_static_mutex_lock(&nv_registry_lock); g_array_append_val(self->names, stored); g_hash_table_insert(self->name_map, stored.name, GUINT_TO_POINTER(self->names->len)); + g_static_mutex_unlock(&nv_registry_lock); return self->names->len; }
@@ -91,7 +95,9 @@ nv_registry_alloc_handle(NVRegistry *self, const gchar *name) void nv_registry_add_alias(NVRegistry *self, NVHandle handle, const gchar *alias) { + g_static_mutex_lock(&nv_registry_lock); g_hash_table_insert(self->name_map, (gchar *) alias, GUINT_TO_POINTER((glong) handle)); + g_static_mutex_unlock(&nv_registry_lock); }
void
-- Bazsi
Hi, The follow-up patch to fix this was pushed to master today: commit abe191ed272c85f7bceb121aefeb8d8f4c7a9ed4 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Thu Jul 14 12:28:12 2011 +0200 nvtable: fixed locking of the nv_register hash table. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> On Sat, 2011-05-28 at 15:56 +0200, Balazs Scheidler wrote:
Hi,
This locking is certainly necessary, however the scope is not enough in nv_registry_alloc_handle(). The lookup must also be locked, since the same name can be associated to a different value otherwise.
It should also be checked if this would happen on a fastpath though.
On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Juhasz Viktor <jviktor@balabit.hu>
Since nv_registry_alloc_handle() can be indirectly called from multiple threads, when it modifies the registry, that needs to be protected with a mutex.
Same goes for nv_registry_add_alias().
Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- lib/nvtable.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/lib/nvtable.c b/lib/nvtable.c index aa9d8a1..d0a35d3 100644 --- a/lib/nvtable.c +++ b/lib/nvtable.c @@ -27,6 +27,8 @@ #include <string.h> #include <stdlib.h>
+GStaticMutex nv_registry_lock = G_STATIC_MUTEX_INIT; + const gchar *null_string = "";
NVHandle @@ -78,8 +80,10 @@ nv_registry_alloc_handle(NVRegistry *self, const gchar *name) stored.flags = 0; stored.name_len = len; stored.name = g_strdup(name); + g_static_mutex_lock(&nv_registry_lock); g_array_append_val(self->names, stored); g_hash_table_insert(self->name_map, stored.name, GUINT_TO_POINTER(self->names->len)); + g_static_mutex_unlock(&nv_registry_lock); return self->names->len; }
@@ -91,7 +95,9 @@ nv_registry_alloc_handle(NVRegistry *self, const gchar *name) void nv_registry_add_alias(NVRegistry *self, NVHandle handle, const gchar *alias) { + g_static_mutex_lock(&nv_registry_lock); g_hash_table_insert(self->name_map, (gchar *) alias, GUINT_TO_POINTER((glong) handle)); + g_static_mutex_unlock(&nv_registry_lock); }
void
-- Bazsi
From: Tamas Pal <folti@balabit.hu> Force using OS's own uname implementation, over whatewer found in PATH. Die with error on an unsupported system or uname error. Signed-off-by: Tamas Pal <folti@balabit.hu> --- scl/system/generate-system-source.sh | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/scl/system/generate-system-source.sh b/scl/system/generate-system-source.sh index d89fd33..6f5af70 100755 --- a/scl/system/generate-system-source.sh +++ b/scl/system/generate-system-source.sh @@ -23,6 +23,14 @@ # ############################################################################# +# DO NOT REMOVE!!! +# We have to force the script to use the OS's own utilities, instead of some +# random stuff found in path. This is needed when PATH points to a uname binary +# with some missing dependencies, due to LD_LIBRARY_PATH/LIBPATH settings. In those case, it's possible, that uname doesn't work... +PATH=/bin:/usr/bin:$PATH +LIBPATH= +LD_LIBRARY_PATH= +export PATH LIBPATH LD_LIBRARY_PATH os=${UNAME_S:-`uname -s`} osversion=${UNAME_R:-`uname -r`} @@ -69,4 +77,9 @@ EOF unix-dgram("/dev/log"); EOF ;; + *) + # need to notify the user that something gone terribly wrong... + echo "$0: FATAL: unsupported OS ($os) or uname(1) error. Please call BalaBit's support" >&2 + exit 1 + ;; esac -- 1.7.2.5
On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Tamas Pal <folti@balabit.hu>
Force using OS's own uname implementation, over whatewer found in PATH. Die with error on an unsupported system or uname error.
Signed-off-by: Tamas Pal <folti@balabit.hu> --- scl/system/generate-system-source.sh | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/scl/system/generate-system-source.sh b/scl/system/generate-system-source.sh index d89fd33..6f5af70 100755 --- a/scl/system/generate-system-source.sh +++ b/scl/system/generate-system-source.sh @@ -23,6 +23,14 @@ # #############################################################################
+# DO NOT REMOVE!!! +# We have to force the script to use the OS's own utilities, instead of some +# random stuff found in path. This is needed when PATH points to a uname binary +# with some missing dependencies, due to LD_LIBRARY_PATH/LIBPATH settings. In those case, it's possible, that uname doesn't work... +PATH=/bin:/usr/bin:$PATH +LIBPATH= +LD_LIBRARY_PATH= +export PATH LIBPATH LD_LIBRARY_PATH
os=${UNAME_S:-`uname -s`} osversion=${UNAME_R:-`uname -r`}
Hmm... I really don't get this, what if the admin really want to change the uname binary? Can you explain when this is needed?
+ *) + # need to notify the user that something gone terribly wrong... + echo "$0: FATAL: unsupported OS ($os) or uname(1) error. Please call BalaBit's support" >&2 + exit 1 + ;;
hmm... most syslog-ng users would probably not have a support policy in place, so the error message should not advise that. If you have information on the first question, I'll integrate the patch with a slight change in the message, so no need for an updated patch. -- Bazsi
Balazs Scheidler <bazsi@balabit.hu> writes:
+# DO NOT REMOVE!!! +# We have to force the script to use the OS's own utilities, instead of some +# random stuff found in path. This is needed when PATH points to a uname binary +# with some missing dependencies, due to LD_LIBRARY_PATH/LIBPATH settings. In those case, it's possible, that uname doesn't work... +PATH=/bin:/usr/bin:$PATH +LIBPATH= +LD_LIBRARY_PATH= +export PATH LIBPATH LD_LIBRARY_PATH
os=${UNAME_S:-`uname -s`} osversion=${UNAME_R:-`uname -r`}
Hmm... I really don't get this, what if the admin really want to change the uname binary? Can you explain when this is needed?
It was changed so that uname would be hard to override, and thus break syslog-ng's system(). All systems I could think of quickly, had a suitable uname in /bin or /usr/bin. I do not really think there would be any reason to change the uname binary, but one can still replace it in /usr/bin - or change the SCL-called script to not override the PATH. Basically, this is for the safety of users. -- |8]
On Thu, Jun 02, 2011 at 08:49:05PM +0200, Balazs Scheidler wrote:
On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Tamas Pal <folti@balabit.hu>
Force using OS's own uname implementation, over whatewer found in PATH. Die with error on an unsupported system or uname error.
Signed-off-by: Tamas Pal <folti@balabit.hu> --- scl/system/generate-system-source.sh | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/scl/system/generate-system-source.sh b/scl/system/generate-system-source.sh index d89fd33..6f5af70 100755 --- a/scl/system/generate-system-source.sh +++ b/scl/system/generate-system-source.sh @@ -23,6 +23,14 @@ # #############################################################################
+# DO NOT REMOVE!!! +# We have to force the script to use the OS's own utilities, instead of some +# random stuff found in path. This is needed when PATH points to a uname binary +# with some missing dependencies, due to LD_LIBRARY_PATH/LIBPATH settings. In those case, it's possible, that uname doesn't work... +PATH=/bin:/usr/bin:$PATH +LIBPATH= +LD_LIBRARY_PATH= +export PATH LIBPATH LD_LIBRARY_PATH
os=${UNAME_S:-`uname -s`} osversion=${UNAME_R:-`uname -r`}
Hmm... I really don't get this, what if the admin really want to change the uname binary? Can you explain when this is needed? It was an AIX specific bug. On the test system, there are another uname from GNU binutils under /opt/freeware/bin which depends on their own libintl library located in /opt/freeware/lib. Due to AIX's own shared library handling(see below: [1]), this conflicted with our libintl implementation, rendering the uname binary inoperable.
Because the script didn't handle the case when uname returns nothing or unsupported OS info, it returned neither system log sources nor warning to the user(other than catching the dynamic linker's error message on console). Other systems have more sane dynamic linkers and shared library handling policies, so it wouldn't cause the same problem, unless the system in question really messed up, but we can't do much about that. If your system's uname is broken (located in /bin or in case of UNIXes, /usr/bin - where /bin is a symlink to /usr/bin), then you have bigger problems than syslog-ng not working. Other possible future problems, if the system have some AppArmor or SELinux like security suite installed, which would prevent executing binaries in non-standard paths, unless configured otherwise. This can be a problem for SCL's own scripts too. [1]: AIX shared library handling: On AIX the .so files themselves are packaged into .a archives and the dynamic linker's default behaviour is to look for them in there. In this case, the GNU uname depended on libintl.so.1 in a file libintl.a. Due to the limitations of the dynamic linker, it only looks into the first <libname>.a file found on path, if that one doesn't contain the needed .so it fails. In that case, the first libintl.a it found (due to inheriting LIBPATH from syslog-ng) was our own in /opt/syslog-ng/lib containing only a libintl.so.8.
+ *) + # need to notify the user that something gone terribly wrong... + echo "$0: FATAL: unsupported OS ($os) or uname(1) error. Please call BalaBit's support" >&2 + exit 1 + ;;
hmm... most syslog-ng users would probably not have a support policy in place, so the error message should not advise that.
Umm yes, I only the PE version in mind when I write that message.
If you have information on the first question, I'll integrate the patch with a slight change in the message, so no need for an updated patch.
-- Pal Tamas/Folti folti@balabit.hu
On Fri, 2011-06-03 at 10:14 +0200, Pal Tamas wrote:
On Thu, Jun 02, 2011 at 08:49:05PM +0200, Balazs Scheidler wrote:
On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Tamas Pal <folti@balabit.hu>
Force using OS's own uname implementation, over whatewer found in PATH. Die with error on an unsupported system or uname error.
Signed-off-by: Tamas Pal <folti@balabit.hu> --- scl/system/generate-system-source.sh | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/scl/system/generate-system-source.sh b/scl/system/generate-system-source.sh index d89fd33..6f5af70 100755 --- a/scl/system/generate-system-source.sh +++ b/scl/system/generate-system-source.sh @@ -23,6 +23,14 @@ # #############################################################################
+# DO NOT REMOVE!!! +# We have to force the script to use the OS's own utilities, instead of some +# random stuff found in path. This is needed when PATH points to a uname binary +# with some missing dependencies, due to LD_LIBRARY_PATH/LIBPATH settings. In those case, it's possible, that uname doesn't work... +PATH=/bin:/usr/bin:$PATH +LIBPATH= +LD_LIBRARY_PATH= +export PATH LIBPATH LD_LIBRARY_PATH
os=${UNAME_S:-`uname -s`} osversion=${UNAME_R:-`uname -r`}
Hmm... I really don't get this, what if the admin really want to change the uname binary? Can you explain when this is needed? It was an AIX specific bug. On the test system, there are another uname from GNU binutils under /opt/freeware/bin which depends on their own libintl library located in /opt/freeware/lib. Due to AIX's own shared library handling(see below: [1]), this conflicted with our libintl implementation, rendering the uname binary inoperable.
Because the script didn't handle the case when uname returns nothing or unsupported OS info, it returned neither system log sources nor warning to the user(other than catching the dynamic linker's error message on console).
Other systems have more sane dynamic linkers and shared library handling policies, so it wouldn't cause the same problem, unless the system in question really messed up, but we can't do much about that. If your system's uname is broken (located in /bin or in case of UNIXes, /usr/bin - where /bin is a symlink to /usr/bin), then you have bigger problems than syslog-ng not working.
Other possible future problems, if the system have some AppArmor or SELinux like security suite installed, which would prevent executing binaries in non-standard paths, unless configured otherwise. This can be a problem for SCL's own scripts too.
[1]: AIX shared library handling:
On AIX the .so files themselves are packaged into .a archives and the dynamic linker's default behaviour is to look for them in there. In this case, the GNU uname depended on libintl.so.1 in a file libintl.a. Due to the limitations of the dynamic linker, it only looks into the first <libname>.a file found on path, if that one doesn't contain the needed .so it fails. In that case, the first libintl.a it found (due to inheriting LIBPATH from syslog-ng) was our own in /opt/syslog-ng/lib containing only a libintl.so.8.
+ *) + # need to notify the user that something gone terribly wrong... + echo "$0: FATAL: unsupported OS ($os) or uname(1) error. Please call BalaBit's support" >&2 + exit 1 + ;;
hmm... most syslog-ng users would probably not have a support policy in place, so the error message should not advise that.
Umm yes, I only the PE version in mind when I write that message.
If you have information on the first question, I'll integrate the patch with a slight change in the message, so no need for an updated patch.
A better solution would be to remove the syslog-ng specific portion of LD_LIBRARY_PATH env variable before executing user-supplied programs (confgen, program source & destination). The issue is that it's not easy to identify these points, and sometimes the exec() call itself is buried under several layers (popen for instance). This issue only happens if the env wrapper is enabled (--enable-env-wrapper), which is needed to run syslog-ng in /opt/syslog-ng and is used by both the PE and OSE binaries that we build. Hmm... I think I'll just apply the patch, fixing the error message. Thanks guys. Here's the patch I've committed: commit b7d46bc094215fa80f93e0dc46a3caa83e8a24b8 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Sat Jun 4 12:54:07 2011 +0200 Bail out on unknown systems, and use a clean environment. Force using OS's own uname implementation, over whatewer found in PATH. Die with error on an unsupported system or uname error. Signed-off-by: Tamas Pal <folti@balabit.hu> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> -- Bazsi
From: Juhasz Viktor <jviktor@balabit.hu> Rearrange how self->super.wakeup is handled, in order to stop a race condition where it might be called earlier than it is set, or after it should have been cleared. If the reader is uninitialized, but the writer tries to flush the queue, and call log_msg_refcache_stop(), that in turn will call the reader's wakeup function. Since the reader is uninitialized, this can easily corrupt the ivykis event list. To avoid this scenario, the wakeup callback should be set differently, and cleared early. Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- lib/logreader.c | 16 ++++++++++++---- 1 files changed, 12 insertions(+), 4 deletions(-) diff --git a/lib/logreader.c b/lib/logreader.c index 268be6e..4e56d05 100644 --- a/lib/logreader.c +++ b/lib/logreader.c @@ -577,7 +577,8 @@ static gboolean log_reader_init(LogPipe *s) { LogReader *self = (LogReader *) s; - + /* If reader is initialized than log_msg_ref_cache_stop can call the wakeup function */ + self->super.wakeup = log_reader_wakeup; if (!log_source_init(s)) return FALSE; /* check for new data */ @@ -612,7 +613,6 @@ log_reader_init(LogPipe *s) if (!log_reader_start_watches(self)) return FALSE; iv_event_register(&self->schedule_wakeup); - return TRUE; } @@ -622,7 +622,14 @@ log_reader_deinit(LogPipe *s) LogReader *self = (LogReader *) s; main_loop_assert_main_thread(); - + /* + the wakeup function must be set to NULL, because it can be occured than + the reader is uninitialized but the writer flush the queue and call the log_msg_refcache_stop and + log_msg_refcache_stop call the wakeup function if it is not null, but in this case the reader + is uninitialized and the call of iv_event_post for the unregistered event can cause an ivykis list corruption. + IF logreader is uninitialized than the wakeup function is NULL so the iv_event_post isn't called. + */ + self->super.wakeup = NULL; iv_event_unregister(&self->schedule_wakeup); log_reader_stop_watches(self); if (!log_source_deinit(s)) @@ -684,7 +691,8 @@ log_reader_new(LogProto *proto) self->super.super.init = log_reader_init; self->super.super.deinit = log_reader_deinit; self->super.super.free_fn = log_reader_free; - self->super.wakeup = log_reader_wakeup; + /* Set wakeup function to NULL because this is set in the log_reader_init and unset in the log_reader_deinit */ + self->super.wakeup = NULL; self->proto = proto; self->immediate_check = FALSE; self->pollable_state = -1; -- 1.7.2.5
Hi, I've changed this patch almost completely, but the diagnosis of the possible bug was right, I just think I've found a simpler implementation. Here's the patch: commit 694164b1ea6f575424413b3b1f359af09457db16 Author: Balazs Scheidler <bazsi@balabit.hu> Date: Thu Jun 9 14:54:07 2011 +0200 LogReader: Proper wakeup function handling for deinitialized LogReaders The wakeup() callback can be called even if a LogReader is deinitialized (when the Writer is deinitialized _after_ the reader, and some messages are flushed out in the deinit path), in which case we shouldn't trigger the schedule_wakeup event. The original patch by Viktor Juhasz, but was completely reworked by Balazs Scheidler. Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> On Mon, 2011-05-23 at 11:56 +0200, Gergely Nagy wrote:
From: Juhasz Viktor <jviktor@balabit.hu>
Rearrange how self->super.wakeup is handled, in order to stop a race condition where it might be called earlier than it is set, or after it should have been cleared.
If the reader is uninitialized, but the writer tries to flush the queue, and call log_msg_refcache_stop(), that in turn will call the reader's wakeup function. Since the reader is uninitialized, this can easily corrupt the ivykis event list.
To avoid this scenario, the wakeup callback should be set differently, and cleared early.
Signed-off-by: Viktor Juhasz <jviktor@balabit.hu> --- lib/logreader.c | 16 ++++++++++++---- 1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/lib/logreader.c b/lib/logreader.c index 268be6e..4e56d05 100644 --- a/lib/logreader.c +++ b/lib/logreader.c @@ -577,7 +577,8 @@ static gboolean log_reader_init(LogPipe *s) { LogReader *self = (LogReader *) s; - + /* If reader is initialized than log_msg_ref_cache_stop can call the wakeup function */ + self->super.wakeup = log_reader_wakeup; if (!log_source_init(s)) return FALSE; /* check for new data */ @@ -612,7 +613,6 @@ log_reader_init(LogPipe *s) if (!log_reader_start_watches(self)) return FALSE; iv_event_register(&self->schedule_wakeup); - return TRUE; }
@@ -622,7 +622,14 @@ log_reader_deinit(LogPipe *s) LogReader *self = (LogReader *) s;
main_loop_assert_main_thread(); - + /* + the wakeup function must be set to NULL, because it can be occured than + the reader is uninitialized but the writer flush the queue and call the log_msg_refcache_stop and + log_msg_refcache_stop call the wakeup function if it is not null, but in this case the reader + is uninitialized and the call of iv_event_post for the unregistered event can cause an ivykis list corruption. + IF logreader is uninitialized than the wakeup function is NULL so the iv_event_post isn't called. + */ + self->super.wakeup = NULL; iv_event_unregister(&self->schedule_wakeup); log_reader_stop_watches(self); if (!log_source_deinit(s)) @@ -684,7 +691,8 @@ log_reader_new(LogProto *proto) self->super.super.init = log_reader_init; self->super.super.deinit = log_reader_deinit; self->super.super.free_fn = log_reader_free; - self->super.wakeup = log_reader_wakeup; + /* Set wakeup function to NULL because this is set in the log_reader_init and unset in the log_reader_deinit */ + self->super.wakeup = NULL; self->proto = proto; self->immediate_check = FALSE; self->pollable_state = -1;
-- Bazsi
participants (3)
-
Balazs Scheidler
-
Gergely Nagy
-
Pal Tamas