[PATCH] anonymizing filter
Hello,
A couple of years ago this patch was submitted to the list for consideration for inclusion in syslog-ng. I am writing again to ask that it be reconsidered. The patch provides a simple replace that lets you strip IP addresses out of your logs before they are written to disk. It has been included in the Debian stable distribution, and is currently in both Debian Sid and Lenny (unstable and testing). It has had a very wide testing base and is non-intrusive; it has existed since 2004 and has been adapted to work with the newer syslog-ng. The goal of this patch is to give an organization the means to implement site logging policies by allowing easy control over exactly what data is retained in the logfiles.
When I first requested consideration for inclusion, the reactions were some suggestions for improvement (which were implemented), some side discussions about the state of data retention laws in various places, and a general agreement that this patch is non-intrusive and has a valid use case (at least in the U.S., but likely in other countries as well[0]).
The side discussions about data retention laws mostly concerned specific localities that were considering laws that would make stripping of addresses illegal, or that had already mandated such things. These were interesting discussions, since EU data retention laws would prohibit many people from making such configuration changes to their syslog-ng.conf, but they were tangential to the point, because the patch itself does not cause anyone to break such laws.
On the other side of the pond, in the U.S., the EFF[1] has made it very clear that this mechanism of anonymizing logs is (a) perfectly legal in the U.S. and (b) advisable. There are many instances where it is preferable to keep less information on users than many systems collect by default. In the United States it is not currently required to retain data on users of a server, but you may be required to provide all data on a user that you have retained. Online service providers (OSPs) can protect themselves from legal hassles and added work by choosing what data they wish to retain. The current climate in the U.S. makes this problem much more important now than it was many years ago.
Having the ability to implement a site policy lets an organization decide whether the trade-off between privacy and analysis is worthwhile. This patch gives organizations that choice if they feel it is more important to avoid retaining sensitive data than to have a full history of everything logged.
Please accept this patch[2],
Micah
[0] EPIC International Data Retention Page: http://www.epic.org/privacy/intl/data_retention.html
[1] The EFF is the major civil liberties internet watchdog in the US. Their "Best Practices for Online Service Providers" can be found at http://www.eff.org/osp; they explicitly link to our patch as a recommendation.
[2] The latest patch can be found at https://code.autistici.org/trac/privacy/browser/trunk/syslog-ng
Micah Anderson wrote:
Hello,
A couple years ago this patch was submitted to the list for consideration for inclusion into syslog-ng. I am writing this email again to request that it be considered again. The patch provides a simple replace which enables you to strip out IP addresses from your logs before they are written to disk. The patch has been included in the Debian stable distribution, and currently is included in both Debian Sid and Lenny (unstable and testing). It has had a very wide testing base and is non-intrusive, it has existed since 2004 and has been adapted to work with the newer syslog-ng. The goal of this patch is to give an organization the means to implement site logging policies, by allowing for easy control over exactly what data is retained in the logfiles.
When I first requested consideration for inclusion the reactions were some suggestions for improvement (which were done), some side discussions about the various states of data retention laws, and a general agreement that this patch is non-intrusive and had a valid use case (at least in the U.S., but also likely in other countries as well[0]).
I don't want to imply that this patch is in any way undesirable. On the contrary, I think it is very useful. However, the same result can be obtained with the general message rewrite facility that has already been proposed. I would rather have the authors work on the general message rewrite engine, so that we end up with a code base that meets general needs rather than specific ones. Perhaps your patch is a good example of how to implement message rewriting and could be a starting point for the author (I have not looked at any of the code, so I can't comment on this aspect).
Just my $0.02
-- Evan Rempel
* Evan Rempel <erempel@uvic.ca> [071130 11:45]:
I don't want to imply that this patch is in any way undesirable. On the contrary I think that it is very useful, however, the same result can be obtained by the general message rewrite facility that has already been proposed. I would rather have the authors work on the general message rewrite engine so that we can have a code base that meets more needs, rather than specific needs.
I agree that there may be better ways to do this in the future. However, as it stands now (and has since 2003, when we first wrote this patch), this is the only way to do it. I too would like to see a general message rewrite engine, but I don't think adding this simple patch to the existing system would take much away from that work. It makes sense to me to include this yesterday, and then, when that facility is available, we can finally retire this patch, which has been tracking development for years.
Speaking of which, I noticed that 2.0.6 hit the website, so I've adapted the patch to fix a few offsets that came about with the new version. It's attached.
Micah
Hi,
As someone who operates systems whose users want privacy, I have found this patch very useful. In fact, I found it so useful that I did the initial port of this patch to syslog-ng 2.
When I submitted it, I was told things like "well, all of those apps you use should strip the data instead". It is very inconvenient (and, if you use commercial software, impossible) to patch a bunch of daemons (the average server can have 30 or more daemons running!) when you can strip the information out in the log instead.
Other people told me things like "well, why do they need privacy? clearly they are doing something _wrong_ if they need privacy," and that's not the case either. Besides the rationale that Micah mentioned for this patch, consider the case where a system gets compromised by spammers (OK, this shouldn't happen, but in reality it does, usually because upstream vendors do not get patches out in time). The syslogs commonly contain e-mail traffic information, which is not something you want in the hands of spammers. Having the option to implement a policy that avoids retaining data would also help avoid a situation like the one I describe.
At a minimum, I would suggest providing a pointer to this patch. On another note, Debian has included this patch for some time, which means that it is, in theory at least, proven to be reliable.
William
On Fri, 2007-11-30 at 14:03 -0500, Micah Anderson wrote:
Hello,
A couple years ago this patch was submitted to the list for consideration for inclusion into syslog-ng. I am writing this email again to request that it be considered again. The patch provides a simple replace which enables you to strip out IP addresses from your logs before they are written to disk. The patch has been included in the Debian stable distribution, and currently is included in both Debian Sid and Lenny (unstable and testing). It has had a very wide testing base and is non-intrusive, it has existed since 2004 and has been adapted to work with the newer syslog-ng. The goal of this patch is to give an organization the means to implement site logging policies, by allowing for easy control over exactly what data is retained in the logfiles.
When I first requested consideration for inclusion the reactions were some suggestions for improvement (which were done), some side discussions about the various states of data retention laws, and a general agreement that this patch is non-intrusive and had a valid use case (at least in the U.S., but also likely in other countries as well[0]).
The side-discussions about data-retention laws were mostly around specific geographic localities that were considering laws that would make stripping of addresses illegal, or had already mandated such things. Although these were interesting discussions, as EU data retention laws would prohibit many people from making such configuration changes to their syslog-ng.conf, they were tangential to the point because this patch does not cause those to break such laws.
On the other side of the pond, in the U.S., the EFF[1] has made it very clear that this mechanism of anonymizing logs is perfectly (a) legal in the U.S., and (b) advisable. There are many instances where it is preferable to keep less information on users than is collected by default on many systems. In the United States it is not currently required to retain data on users of a server, but you may be required to provide all data on a user which you have retained. OSPs can protect themselves from legal hassles and added work by choosing what data they wish to retain. The current climate in the U.S. makes this problem so much more important now than it was many years ago.
Having the ability to implement a site-policy that enables an organization to decide if the trade-off between privacy and analysis is worthwhile. This patch allows organizations to have that choice if they feel that it is more important to avoid retaining sensitive data rather than having a full history of everything logged.
Please accept this patch[2], Micah
[0] EPIC International Data Retention Page http://www.epic.org/privacy/intl/data_retention.html
[1] The EFF is the major civil liberties internet watchdog in the US, their "Best Practices for Online Service Providers" can be found here: http://www.eff.org/osp, they explicitly link to our patch as a recommendation
[2] The latest patch can be found at https://code.autistici.org/trac/privacy/browser/trunk/syslog-ng
On Fri, 30 Nov 2007 14:04:52 -0600 William Pitcock <nenolod@sacredspiral.co.uk> wrote:
Hi,
As someone who operates systems where privacy is desired by their users, I have found this patch very useful. Infact, I found it so useful, that I did the initial port of this patch to syslog-ng 2.
I was told things when I submitted it like "well, all of those apps you use should strip the data instead". It is very inconvenient (and if you use commercial software, impossible) to patch a bunch of daemons (the average server can have 30 or more daemons running!) when instead you can strip the information out in the log instead.
Hi,
I would also very much welcome the inclusion of this patch, since it provides functionality that is required for legal reasons. Existing privacy laws in Germany (and, I think, in other EU states as well) do not allow service providers to log data that are not required for providing their service. In a recent lawsuit in Germany, the court found that customary logging of IP addresses is illegal (i.e. logging may only be enabled on a case-by-case basis, e.g. during a DDoS attack).
It is very difficult right now to run a Linux (or Unix) system while complying with the law. Basically, you need to jump through hoops and run scripts to anonymize data that should never have hit the disk in non-anonymized form. Thus I would be glad if it were possible to strip IP addresses in syslog.
rainer
On Fri, 2007-11-30 at 14:03 -0500, Micah Anderson wrote:
Hello,
A couple years ago this patch was submitted to the list for consideration for inclusion into syslog-ng. I am writing this email again to request that it be considered again. The patch provides a simple replace which enables you to strip out IP addresses from your logs before they are written to disk. The patch has been included in the Debian stable distribution, and currently is included in both Debian Sid and Lenny (unstable and testing). It has had a very wide testing base and is non-intrusive, it has existed since 2004 and has been adapted to work with the newer syslog-ng. The goal of this patch is to give an organization the means to implement site logging policies, by allowing for easy control over exactly what data is retained in the logfiles.
When I first requested consideration for inclusion the reactions were some suggestions for improvement (which were done), some side discussions about the various states of data retention laws, and a general agreement that this patch is non-intrusive and had a valid use case (at least in the U.S., but also likely in other countries as well[0]).
The side-discussions about data-retention laws were mostly around specific geographic localities that were considering laws that would make stripping of addresses illegal, or had already mandated such things. Although these were interesting discussions, as EU data retention laws would prohibit many people from making such configuration changes to their syslog-ng.conf, they were tangential to the point because this patch does not cause those to break such laws.
On the other side of the pond, in the U.S., the EFF[1] has made it very clear that this mechanism of anonymizing logs is perfectly (a) legal in the U.S., and (b) advisable. There are many instances where it is preferable to keep less information on users than is collected by default on many systems. In the United States it is not currently required to retain data on users of a server, but you may be required to provide all data on a user which you have retained. OSPs can protect themselves from legal hassles and added work by choosing what data they wish to retain. The current climate in the U.S. makes this problem so much more important now than it was many years ago.
Having the ability to implement a site-policy that enables an organization to decide if the trade-off between privacy and analysis is worthwhile. This patch allows organizations to have that choice if they feel that it is more important to avoid retaining sensitive data rather than having a full history of everything logged.
I understand that the need is genuine and the feature this patch provides is useful, but exactly as Evan wrote in his email, its implementation is way out of the syslog-ng model. It uses filters to rewrite parts of the message. -- Bazsi
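The objection here is architectural: a filter decides whether a message passes, while changing the message is a separate step. A minimal Python sketch of that separation follows. It is not syslog-ng code; every name in it (`LogMessage`, `match_program`, `subst`, `run_path`) is hypothetical and chosen only to illustrate the distinction.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogMessage:
    """Hypothetical, simplified log message."""
    host: str
    program: str
    text: str


def match_program(name):
    """Filter: returns a predicate that accepts or rejects a message.
    It never modifies the message."""
    return lambda msg: msg.program == name


def subst(pattern, replacement):
    """Rewrite step: returns a function producing a modified copy."""
    compiled = re.compile(pattern)
    return lambda msg: replace(msg, text=compiled.sub(replacement, msg.text))


def run_path(msg, filters, rewrites):
    """Apply filters first; only messages that pass are rewritten."""
    if not all(accepts(msg) for accepts in filters):
        return None  # message dropped by a filter
    for rewrite in rewrites:
        msg = rewrite(msg)
    return msg
```

Keeping the two roles apart means a filter can be reused in any log path without side effects, which is presumably why a filter that rewrites is described as being outside the model.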
* Balazs Scheidler <bazsi@balabit.hu> [071203 02:50]:
I understand that the need is genuine and the feature this patch provides is useful, but exactly as Evan wrote in his email, its implementation is way out of the syslog-ng model. It uses filters to rewrite parts of the message.
I agree that there may be better ways to do this; however, it is my understanding that the 'general message rewrite facility' is currently nothing more than a proposal. When is it expected to be available, so that we can do this the 'right' way? If it is far off, could this patch be accepted until then, and be replaced by the better way once that is available?
Micah
On Mon, 2007-12-03 at 15:17 -0500, Micah Anderson wrote:
* Balazs Scheidler <bazsi@balabit.hu> [071203 02:50]:
I understand that the need is genuine and the feature this patch provides is useful, but exactly as Evan wrote in his email, its implementation is way out of the syslog-ng model. It uses filters to rewrite parts of the message.
I agree that there may be better ways to do this, however it is my understanding that the 'general message rewrite facility' is not currently anything other than a proposal. When is this expected to be available so we can do this the 'right' way? If it is far off, could this patch be accepted until that is available and it can be replaced with the better way?
True, but removing a feature later/making an incompatible change in the configuration file is always a pain, and I'll be the one to explain it to other users who did not follow this discussion. I'm doing the design of the next major syslog-ng release right now, and I'll make sure this feature sticks somewhere. -- Bazsi
On Tue, 2007-12-04 at 09:38 +0100, Balazs Scheidler wrote:
On Mon, 2007-12-03 at 15:17 -0500, Micah Anderson wrote:
* Balazs Scheidler <bazsi@balabit.hu> [071203 02:50]:
I understand that the need is genuine and the feature this patch provides is useful, but exactly as Evan wrote in his email, its implementation is way out of the syslog-ng model. It uses filters to rewrite parts of the message.
I agree that there may be better ways to do this, however it is my understanding that the 'general message rewrite facility' is not currently anything other than a proposal. When is this expected to be available so we can do this the 'right' way? If it is far off, could this patch be accepted until that is available and it can be replaced with the better way?
True, but removing a feature later/making an incompatible change in the configuration file is always a pain, and I'll be the one to explain it to other users who did not follow this discussion.
I'm doing the design of the next major syslog-ng release right now, and I'll make sure this feature sticks somewhere.
Just to let you know the current syslog-ng 3.0 git tree has functionality that should address this issue. Please let me know if I missed something during development. Here's the documentation: http://www.balabit.com/dl/html/syslog-ng-v3.0-guide-admin-en.html/ch03s09.ht... -- Bazsi
* Micah Anderson <micah@riseup.net> [2007-11-30 20:04]: [...]
+  if (!g_ascii_strcasecmp(re, "ips"))
+    re = "(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])([\\.\\-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}";
Urgh, that's IPv4 only. Boo!! :-P -- Regards, Wolfram Schlich <wschlich@gentoo.org> Gentoo Linux * http://dev.gentoo.org/~wschlich/
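For illustration, the quoted `ips` expression can be exercised outside syslog-ng. The following is a minimal Python sketch, not the patch's C code: the pattern is copied from the hunk above with the C string escaping removed, while the function name `strip_ips` and the replacement text `0.0.0.0` are assumptions made for this example.

```python
import re

# Pattern taken from the quoted patch hunk (C escapes "\\." and "\\-"
# become "\." and "\-"). Each octet is 0-255; octets may be separated
# by "." or "-", so reverse-DNS style names such as 10-0-0-1 also match.
IPV4_RE = re.compile(
    r"(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"
    r"([\.\-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}"
)


def strip_ips(line, replacement="0.0.0.0"):
    """Return the log line with every IPv4-looking substring replaced.

    The replacement value is an assumption for this sketch; the thread
    does not show what the patch substitutes.
    """
    return IPV4_RE.sub(replacement, line)


if __name__ == "__main__":
    print(strip_ips("sshd[42]: Accepted password for bob from 192.168.1.10 port 22"))
```

Because the expression is not anchored, any run of four dot- or dash-separated numbers in range will match, including a four-part version string. As Wolfram notes, it does nothing for IPv6 addresses.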
* Wolfram Schlich <lists@wolfram.schlich.org> [071204 06:05]:
* Micah Anderson <micah@riseup.net> [2007-11-30 20:04]: [...]
+  if (!g_ascii_strcasecmp(re, "ips"))
+    re = "(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])([\\.\\-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}";
Urgh, that's IPv4 only. Boo!! :-P
Same boo happened last time around and nobody could come up with a more elegant one than proposed: It can get complex. Here is an example IPv6 regexp: (http://blogs.msdn.com/mpoulson/archive/2005/01/10/350037.aspx)
Const strIPv6Pattern as string = "\A(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\z"
Const strIPv6Pattern_HEXCompressed as string = "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)\z"
Const StrIPv6Pattern_6Hex4Dec as string = "\A((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
Const StrIPv6Pattern_Hex4DecCompressed as string = "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
The tricky part is that you can mix decimal IPv4 with hex IPv6, and leave out multiple blocks of 0's, but not more than once. Anyone have a more elegant expression? micah
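The two rules mentioned here (a dotted-decimal IPv4 tail inside an IPv6 address, and at most one `::` run of omitted zero blocks) can be checked without a regular expression. A small sketch using Python's standard `ipaddress` module; the helper name `is_ipv6` and the sample addresses (taken from documentation ranges) are chosen for this illustration only.

```python
import ipaddress


def is_ipv6(text):
    """Return True if text is a syntactically valid IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True


if __name__ == "__main__":
    samples = [
        "2001:db8::1",        # one "::" standing for several zero blocks
        "::ffff:192.0.2.1",   # hex blocks followed by a dotted-decimal tail
        "2001::db8::1",       # "::" used twice, which is not allowed
        "192.0.2.1",          # plain IPv4, not an IPv6 address
    ]
    for sample in samples:
        print(sample, is_ipv6(sample))
```

The mixed form is simply another spelling of the same 128 bits: the dotted-decimal tail stands for the last two 16-bit blocks.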
* Micah Anderson <micah@riseup.net> [2007-12-04 16:41]:
* Wolfram Schlich <lists@wolfram.schlich.org> [071204 06:05]:
* Micah Anderson <micah@riseup.net> [2007-11-30 20:04]: [...]
+  if (!g_ascii_strcasecmp(re, "ips"))
+    re = "(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])([\\.\\-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}";
Urgh, that's IPv4 only. Boo!! :-P
Same boo happened last time around and nobody could come up with a more elegant one than proposed:
:P
It can get complex. Here is an example IPv6 regexp: (http://blogs.msdn.com/mpoulson/archive/2005/01/10/350037.aspx)
Const strIPv6Pattern as string = "\A(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\z"
Const strIPv6Pattern_HEXCompressed as string = "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)\z"
Const StrIPv6Pattern_6Hex4Dec as string = "\A((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
Const StrIPv6Pattern_Hex4DecCompressed as string = "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
The tricky part is that you can mix decimal IPv4 with hex IPv6,
How can that be?! Can you show me an example, please?
and leave out multiple blocks of 0's, but not more than once.
Yeah. Nothing fancy :)
Anyone have a more elegant expression?
As there are dozens of IPv6-capable programs out there that can parse IPv6 addresses from, e.g., config files, it should be relatively easy to get an impression of how it could be done. But anyway, I'm not really interested in your patch, since it currently isn't a "generic message rewriting facility", so don't spend time on that IPv6 thingy just because I yelled :o)
-- Regards, Wolfram Schlich <wschlich@gentoo.org> Gentoo Linux * http://dev.gentoo.org/~wschlich/
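One way to follow that suggestion, sketched under assumptions: rather than one exhaustive regular expression, pick out loose candidate tokens and let an existing parser decide which of them are addresses. The Python example below uses the standard `ipaddress` module as that parser. The candidate pattern, the function name `anonymize_ipv6`, and the placeholder `<ipv6-removed>` are all invented for this illustration; none of it comes from the patch.

```python
import ipaddress
import re

# Loose candidate: a run of hex digits, colons and dots that contains at
# least one colon and is not glued to a neighbouring letter or digit.
CANDIDATE_RE = re.compile(
    r"(?<![0-9A-Za-z])[0-9A-Fa-f:.]*:[0-9A-Fa-f:.]*(?![0-9A-Za-z])"
)


def anonymize_ipv6(line, replacement="<ipv6-removed>"):
    """Replace tokens that parse as IPv6 addresses; leave the rest alone."""

    def _replace(match):
        token = match.group(0)
        # Try the token as found, then without trailing punctuation
        # (for example an address at the end of a sentence).
        for candidate in (token, token.rstrip(".:")):
            try:
                ipaddress.IPv6Address(candidate)
            except ValueError:
                continue
            return replacement + token[len(candidate):]
        return token

    return CANDIDATE_RE.sub(_replace, line)


if __name__ == "__main__":
    print(anonymize_ipv6("sshd: connect from 2001:db8::1 port 22"))
```

Timestamps such as `16:41:07` are picked up as candidates but rejected by the parser, so they pass through unchanged. A known limitation of this sketch is that an address ending in `::` and followed directly by a colon or a dot will not be recognized.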
participants (6)
- Balazs Scheidler
- Evan Rempel
- Micah Anderson
- Rainer Wichmann
- William Pitcock
- Wolfram Schlich