GeoIP template function
Hi,

I have a system where I have a lot of logs with IP addresses (typically firewall logs), and the requirement was to also include the country name in the logs. I found that a geoip based template function would be an easy way for this, so I created a PoC for that. It works pretty well, however some improvements and performance measurement would be necessary.

The created patch is attached (based on 3.3.5), some may find it useful :) (It requires libgeoip, and a geoip database, of course)

Regards,
Csaba

Hi!
Csaba Major <csaba.major@balabit.com> writes:
I have a system where I have a lot of logs with IP addresses (typically firewall logs), and the requirement was to also include the country name in the logs. I found that a geoip based template function would be an easy way for this, so I created a PoC for that.
This sounds very interesting, thanks for the PoC!
The created patch is attached (based on 3.3.5), some may find it useful :) (It requires libgeoip, and a geoip database, of course)
See my comments below. The idea is wonderful, and I'll prep up an improved version for 3.4. Since you want it for 3.3, I'll also create a patch for that branch too, but it's 3.4 where it can be merged, if Bazsi likes the idea too.
--- /dev/null
+++ b/modules/tfgeoip/Makefile.am
@@ -0,0 +1,11 @@
+moduledir = @moduledir@
+export top_srcdir
+
+#if ENABLE_UUID
+AM_CPPFLAGS = -I$(top_srcdir)/lib -I../../lib $(GEOIP_CFLAGS)
+module_LTLIBRARIES = libtfgeoip.la
+
+libtfgeoip_la_SOURCES = tfgeoip.c
+libtfgeoip_la_LIBADD = $(MODULE_DEPS_LIBS) $(GEOIP_LIBS)
+libtfgeoip_la_LDFLAGS = $(MODULE_LDFLAGS)
+#endif
I assume s/UUID/GEOIP/ :)
diff --git a/modules/tfgeoip/tfgeoip.c b/modules/tfgeoip/tfgeoip.c
new file mode 100644
index 0000000..eeea330
--- /dev/null
+++ b/modules/tfgeoip/tfgeoip.c
[...]
+static void
+tf_geoip(LogMessage *msg, gint argc, GString *argv[], GString *result)
+{
+  GeoIP * gi;
+  const char * returnedCountry;
+
+  if (argc != 1)
+    return;
+  gi = GeoIP_open("/usr/share/GeoIP/GeoIP.dat", GEOIP_STANDARD);
+
+  returnedCountry = GeoIP_country_code_by_addr(gi, argv[0]->str);
+  g_string_append_printf (result, "%s", returnedCountry);
+}
As you mentioned in person, it would be nice if the template function could be configured, preferably globally, so that one wouldn't have to repeat the same parameters over and over again in every template where it would be used.

Things like returning country vs country code, the location of the database and so on and so forth should be configurable.

The best would be to have a geoip() option in the global option space, but I'm not sure a plugin can hook into that.

I suppose the first thing would be to add commandline handling to tf_geoip first, so at least it's configurable. Then figure out a way to make global configuration possible. I have no idea whether this latter is possible with current syslog-ng 3.4, but this is a great opportunity to make it possible to do this.

Unless you want to work on this yourself further, I'll spice it up with error handling and basic configurability in the next few days.

Come to think of it, perhaps it would make sense to introduce a new rewrite command, something along the lines of:

rewrite r_geoip { apply(geoip("SRCIP"), "GEOIP"); };

This would apply the "geoip" function to the value of the "SRCIP" key, and place the result in ${GEOIP}. The advantage of this is that it's perhaps faster to do this than parsing a template function and using g_string_append_printf to reassemble another string.

Similarly, this same apply() thing could be used to permanently modify the name-value pairs: similar to what value-pairs() does, but instead of returning a different, entirely independent set of name-value pairs, we could apply the same transformations and filtering within a rewrite rule:

rewrite r_vp { apply(value-pairs("*" rekey(subst(".", "_")))); };

This would take all keys, and replace leading dots in key names with an underscore instead, and it would modify the stuff in-place, so that anything that works with the message from this point onward, would see the rewritten names.

In essence, apply() would have a syntax like

apply(FUNCTION(...)[, OUTPUT_VARIABLE])

The function could either modify the LogMessage object itself, or return a value. Functions would need to signal which one they do, and the config parser would not allow using a function with an output value without specifying a destination, etc.

...but I guess I'll draft up an RFC about this instead with a few more possible use cases, advantages & disadvantages and the rest.

-- |8]
I'm confused as to how a geoip function will help my Windows clients created their directories based off their hostnames. All my clients reside in the same network/environment and it's not all clients, just the Windows ones.
James McDonald <jmcdonald@LCE.com> writes:
I'm confused as to how a geoip function will help my Windows clients created their directories based off their hostnames. All my clients reside in the same network/environment and it's not all clients, just the Windows ones.
It won't. It's a separate thread, on an entirely different topic.

-- |8]
----- Original message -----
Hi!
Csaba Major <csaba.major@balabit.com> writes:
I have a system where I have a lot of logs with IP addresses (typically firewall logs), and the requirement was to also include the country name in the logs. I found that a geoip based template function would be an easy way for this, so I created a PoC for that.
This sounds very interesting, thanks for the PoC!
indeed, it is.
The created patch is attached (based on 3.3.5), some may find it useful :) (It requires libgeoip, and a geoip database, of course)
See my comments below. The idea is wonderful, and I'll prep up an improved version for 3.4. Since you want it for 3.3, I'll also create a patch for that branch too, but it's 3.4 where it can be merged, if Bazsi likes the idea too.
--- /dev/null
+++ b/modules/tfgeoip/Makefile.am
@@ -0,0 +1,11 @@
+moduledir = @moduledir@
+export top_srcdir
+
+#if ENABLE_UUID
+AM_CPPFLAGS = -I$(top_srcdir)/lib -I../../lib $(GEOIP_CFLAGS)
+module_LTLIBRARIES = libtfgeoip.la
+
+libtfgeoip_la_SOURCES = tfgeoip.c
+libtfgeoip_la_LIBADD = $(MODULE_DEPS_LIBS) $(GEOIP_LIBS)
+libtfgeoip_la_LDFLAGS = $(MODULE_LDFLAGS)
+#endif
I assume s/UUID/GEOIP/ :)
diff --git a/modules/tfgeoip/tfgeoip.c b/modules/tfgeoip/tfgeoip.c
new file mode 100644
index 0000000..eeea330
--- /dev/null
+++ b/modules/tfgeoip/tfgeoip.c
[...]
+static void
+tf_geoip(LogMessage *msg, gint argc, GString *argv[], GString *result)
+{
+  GeoIP * gi;
+  const char * returnedCountry;
+
+  if (argc != 1)
+    return;
+  gi = GeoIP_open("/usr/share/GeoIP/GeoIP.dat", GEOIP_STANDARD);
probably some caching would make sense here. perhaps even enhancing the dns cache code with the ability to store such information. template functions can be invoked for every message, opening and parsing the geoip database is probably very expensive.
+
+  returnedCountry = GeoIP_country_code_by_addr(gi, argv[0]->str);
+  g_string_append_printf (result, "%s", returnedCountry);
+}
As you mentioned in person, it would be nice if the template function could be configured, preferably globally, so that one wouldn't have to repeat the same parameters over and over again in every template where it would be used.
I'm not sure, templates can be declared as objects, referenced from multiple locations in the config. it is true though that this would probably be used in the naming of files which can't be declared in advance. what kind of settings would this be about?
Things like returning country vs country code, the location of the database and so on and so forth should be configurable.
hmmm. isn't the database location fixed? I think when naming files, the country code should be enough. what about using global @define based variables for this purpose? that would probably be simple to access from within the template function and is global. the con of this is that nothing else uses @define yet.
The best would be to have a geoip() option in the global option space, but I'm not sure a plugin can hook into that.
The syslog-ng team added some mechanisms to hook into the global options block though, but last time I looked I wasn't very happy about the implementation. let's talk about this the coming monday.
I suppose the first thing would be to add commandline handling to tf_geoip first, so at least it's configurable. Then figure out a way to make global configuration possible. I have no idea whether this latter is possible with current syslog-ng 3.4, but this is a great opportunity to make it possible to do this.
Unless you want to work on this yourself further, I'll spice it up with error handling and basic configurability in the next few days.
Come to think of it, perhaps it would make sense to introduce a new rewrite command, something along the lines of:
rewrite r_geoip { apply(geoip("SRCIP"), "GEOIP"); };
This would apply the "geoip" function to the value of the "SRCIP" key, and place the result in ${GEOIP}. The advantage of this is that it's perhaps faster to do this than parsing a template function and using g_string_append_printf to reassemble another string.
I'm not sure why this would be faster than the set() rewrite op.
Similarly, this same apply() thing could be used to permanently modify the name-value pairs: similar to what value-pairs() does, but instead of returning a different, entirely independent set of name-value pairs, we could apply the same transformations and filtering within a rewrite rule:
rewrite r_vp { apply(value-pairs("*" rekey(subst(".", "_")))); };
This would take all keys, and replace leading dots in key names with an underscore instead, and it would modify the stuff in-place, so that anything that works with the message from this point onward, would see the rewritten names.
this is a more interesting use-case.
In essence, apply() would have a syntax like apply(FUNCTION(...)[, OUTPUT_VARIABLE])
The function could either modify the LogMessage object itself, or return a value. Functions would need to signal which one they do, and the config parser would not allow using a function with an output value without specifying a destination, etc.
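The rule described in this paragraph can be sketched in C as a small validation step. This is only an illustration of the proposal, not code from syslog-ng: `ApplyKind`, `ApplyFunction` and `apply_is_valid()` are hypothetical names. The thread only states that a value-returning function must have a destination; what should happen when an in-place function is given a destination is not specified, so the sketch simply accepts it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how a function tells the parser what it does. */
typedef enum
{
  APPLY_IN_PLACE,       /* modifies the LogMessage itself, e.g. value-pairs() */
  APPLY_RETURNS_VALUE   /* produces an output value, e.g. geoip() */
} ApplyKind;

/* Hypothetical descriptor a function would register with the parser. */
typedef struct
{
  const char *name;
  ApplyKind kind;
} ApplyFunction;

/* Returns 1 if apply(fn(...)[, output_variable]) is acceptable, 0 if not.
 * A value-returning function needs a non-empty destination name. */
int
apply_is_valid(const ApplyFunction *fn, const char *output_variable)
{
  assert(fn != NULL);

  if (fn->kind == APPLY_RETURNS_VALUE)
    return output_variable != NULL && output_variable[0] != '\0';

  /* In-place functions: a destination is optional in this sketch. */
  return 1;
}
```

With this check, apply(geoip("SRCIP"), "GEOIP") and apply(value-pairs(...)) would both pass, while apply(geoip("SRCIP")) without a destination would be rejected at configuration time.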
...but I guess I'll draft up an RFC about this instead with a few more possible use cases, advantages & disadvantages and the rest.
good ideas, but I would be careful to introduce another programmable mechanism in this syntax. so the usecases are interesting, I'm not sure about the proposed solutions.
Balazs Scheidler <bazsi77@gmail.com> writes:
diff --git a/modules/tfgeoip/tfgeoip.c b/modules/tfgeoip/tfgeoip.c
new file mode 100644
index 0000000..eeea330
--- /dev/null
+++ b/modules/tfgeoip/tfgeoip.c
[...]
+static void
+tf_geoip(LogMessage *msg, gint argc, GString *argv[], GString *result)
+{
+  GeoIP * gi;
+  const char * returnedCountry;
+
+  if (argc != 1)
+    return;
+  gi = GeoIP_open("/usr/share/GeoIP/GeoIP.dat", GEOIP_STANDARD);
probably some caching would make sense here. perhaps even enhancing the dns cache code with the ability to store such information. template functions can be invoked for every message, opening and parsing the geoip database is probably very expensive.
A quick look at the GeoIP library suggests that it can do caching, keep the file open, and even figure out by itself where the file is. So if I turn the template function into one that can keep state, all of this can be taken care of.
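As a rough illustration of the stateful approach, here is a minimal C sketch that opens the database once and reuses the handle for every lookup. It deliberately does not link against libgeoip: `FakeDb`, `fake_db_open()` and `fake_db_country_code()` are hypothetical stand-ins for the `GeoIP` handle and the `GeoIP_open()` and `GeoIP_country_code_by_addr()` calls in the patch, and the single address-to-country entry is made up. Returning an empty string for unknown addresses is an assumption about how the error handling might look. Thread safety is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the GeoIP database handle. */
typedef struct
{
  int is_open;
} FakeDb;

FakeDb fake_db_storage;
int open_calls = 0;        /* how many times the "database" was opened */
FakeDb *cached_db = NULL;  /* state kept between template function calls */

/* Stand-in for GeoIP_open(): pretend to open and parse the database. */
FakeDb *
fake_db_open(void)
{
  open_calls++;
  fake_db_storage.is_open = 1;
  return &fake_db_storage;
}

/* Stand-in for GeoIP_country_code_by_addr(): NULL means "not found".
 * The one entry below is invented for the example. */
const char *
fake_db_country_code(FakeDb *db, const char *addr)
{
  assert(db != NULL && db->is_open);
  if (strcmp(addr, "192.0.2.1") == 0)
    return "HU";
  return NULL;
}

/* Look up a country code, opening the database only on the first call.
 * Never returns NULL: unknown addresses yield an empty string. */
const char *
lookup_country(const char *addr)
{
  const char *code;

  if (cached_db == NULL)
    cached_db = fake_db_open();
  if (cached_db == NULL)
    return "";

  code = fake_db_country_code(cached_db, addr);
  return code != NULL ? code : "";
}
```

However many messages pass through `lookup_country()`, the open step runs once, which is the per-message cost the caching comment above is concerned with.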
+
+  returnedCountry = GeoIP_country_code_by_addr(gi, argv[0]->str);
+  g_string_append_printf (result, "%s", returnedCountry);
+}
As you mentioned in person, it would be nice if the template function could be configured, preferably globally, so that one wouldn't have to repeat the same parameters over and over again in every template where it would be used.
I'm not sure, templates can be declared as objects, referenced from multiple locations in the config. it is true though that this would probably be used in the naming of files which can't be declared in advance. what kind of settings would this be about?
Settings could control whether we want the country code (hu) or country name (Hungary), whether the ip is ipv4 or ipv6 (though, I'm not entirely sure this is needed - the geoip api is a bit confusing in this regard).
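A minimal sketch of what such per-call settings could look like if they were passed as arguments to the template function. The `--name` and `--code` flags, the `TfGeoipArgs` structure and `parse_tf_geoip_args()` are all hypothetical and only illustrate the idea; the patch itself accepts exactly one argument, the address. Plain C strings are used instead of GString to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output selector: country code ("hu") or name ("Hungary"). */
typedef enum
{
  TF_GEOIP_COUNTRY_CODE,
  TF_GEOIP_COUNTRY_NAME
} TfGeoipOutput;

typedef struct
{
  TfGeoipOutput output;
  const char *address;   /* points into argv, NULL until one is seen */
} TfGeoipArgs;

/* Parse the arguments of a hypothetical $(geoip [--code|--name] ADDRESS).
 * Returns 1 on success, 0 on an unknown option or a missing/extra address. */
int
parse_tf_geoip_args(int argc, const char *argv[], TfGeoipArgs *out)
{
  int i;

  assert(out != NULL);
  out->output = TF_GEOIP_COUNTRY_CODE;   /* default matches the patch */
  out->address = NULL;

  for (i = 0; i < argc; i++)
    {
      if (strcmp(argv[i], "--name") == 0)
        out->output = TF_GEOIP_COUNTRY_NAME;
      else if (strcmp(argv[i], "--code") == 0)
        out->output = TF_GEOIP_COUNTRY_CODE;
      else if (argv[i][0] == '-' && argv[i][1] == '-')
        return 0;                        /* unknown option */
      else if (out->address == NULL)
        out->address = argv[i];
      else
        return 0;                        /* more than one address */
    }

  return out->address != NULL;
}
```

Defaulting to the country code keeps the single-argument form of the original patch working unchanged.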
Things like returning country vs country code, the location of the database and so on and so forth should be configurable.
hmmm. isn't the database location fixed?
There appears to be a separate database for IPv6 addresses, so no, the location is probably not fixed.
I think when naming files, the country code should be enough.
I'd love to store the country name in my database too, now that I know it's possible =)
what about using global @define based variables for this purpose? that would probably be simple to access from within the template function and is global.
the con of this is that nothing else uses @define yet.
Ick. I don't like this idea, I'm afraid.
The best would be to have a geoip() option in the global option space, but I'm not sure a plugin can hook into that.
The syslog-ng team added some mechanisms to hook into the global options block though, but last time I looked I wasn't very happy about the implementation.
That sounds like something I just might want to port and pretty-up for OSE.
I suppose the first thing would be to add commandline handling to tf_geoip first, so at least it's configurable. Then figure out a way to make global configuration possible. I have no idea whether this latter is possible with current syslog-ng 3.4, but this is a great opportunity to make it possible to do this.
Unless you want to work on this yourself further, I'll spice it up with error handling and basic configurability in the next few days.
Come to think of it, perhaps it would make sense to introduce a new rewrite command, something along the lines of:
rewrite r_geoip { apply(geoip("SRCIP"), "GEOIP"); };
This would apply the "geoip" function to the value of the "SRCIP" key, and place the result in ${GEOIP}. The advantage of this is that it's perhaps faster to do this than parsing a template function and using g_string_append_printf to reassemble another string.
I'm not sure why this would be faster than the set() rewrite op.
For GeoIP, it probably wouldn't. But it was a trivial example for something that has an output, as opposed to value-pairs() which modifies things in-place.
In essence, apply() would have a syntax like apply(FUNCTION(...)[, OUTPUT_VARIABLE])
The function could either modify the LogMessage object itself, or return a value. Functions would need to signal which one they do, and the config parser would not allow using a function with an output value without specifying a destination, etc.
...but I guess I'll draft up an RFC about this instead with a few more possible use cases, advantages & disadvantages and the rest.
good ideas, but I would be careful to introduce another programmable mechanism in this syntax. so the usecases are interesting, I'm not sure about the proposed solutions.
There's probably a better and/or easier syntax, perhaps it should not be done within the context of rewrite()... I do like the apply(function,...) idea though. I'll prepare a few more use cases for it, and we'll see what can be made out of them.

-- |8]
Gergely Nagy <algernon@balabit.hu> writes:
Unless you want to work on this yourself further, I'll spice it up with error handling and basic configurability in the next few days.
...where by days, I obviously meant months. Sorry!
As you mentioned in person, it would be nice if the template function could be configured, preferably globally, so that one wouldn't have to repeat the same parameters over and over again in every template where it would be used.
Things like returning country vs country code, the location of the database and so on and so forth should be configurable.
I ended up not caring about configurability for now. The country vs country code thing is fairly easy, that can be done with some command-line parsing. Using a different database is harder, as right now, I open the database on module load.

With a @declare, it's easy to make the database configurable, but only globally: you wouldn't be able to use multiple databases. I'm not sure whether that's a problem or not. Nevertheless, it is something I don't want to deal with right now.

The result is currently sitting on my feature/3.4/template-func/geoip branch, but I will likely merge it into merge-queue/3.4 once I have added command-line parsing.

-- |8]
participants (4)
- Balazs Scheidler
- Csaba Major
- Gergely Nagy
- James McDonald