Hello
.... Bad idea not least because the logic of hiding data should be in the frontend and/or the extraction process (ETL) and not in the data storage. On a central syslog server you'd like to have data mining theories applied for example, where you need the whole set of raw data, unfiltered. Well, only partially unfiltered, since one will certainly apply filters in their log statements.
I very much agree, it would be ideal to handle this problem elsewhere--but it would be a lot more work.
I don't know, really. From your webpage I learn that you also have similar patches for other tools "close" to the system. So my first thought was: "Is he really going to patch each and every tool out there that stores malign data?"
The problem with the front end approach is that it would be very difficult to write patches for all the many daemons one might run.
See, this is called problem shifting. It is not the responsibility of the different tools' authors but that of the corporation gluing them together into a product they sell.

Example: Suppose you are an ISP and want to provide your customers with a simple monitoring framework where they can observe their servers, browse certain post-processed log files and generate alerts or pager alarms based on configurable triggers. This is a fairly common service of an ISP nowadays. From the ISP point of view, you've got all the data to provide and to help eventual forensics. As the provider of the monitoring software you are responsible for stripping out the information that has legal impact when presented to your customers. As such, the application running as front-end must have the appropriate means to instrument the information. This solves two issues from a business point of view:

o You have a certain base USP in that you can sell a product which does something more than just display data in a 1:1 mapping.
o You, as the business, are responsible for complying with certain acts, laws and regulations given by the authoritative force in your geographical location.

This means the ISP, in our case, is responsible for the data integrity and the information handling and disclosure. This takes away the responsibility from the tools' developers, who most of the time are not under direct control of the company.

There are more points which have to be considered, but it's far too off-topic for this mailing list. You can contact me privately regarding those points.
The problem with the post-processing and log scrubbing approach is that the data will likely sit around for many hours or days.
It's part of the security concept of OSPs/ISPs to maintain an accurate enough security policy regarding data handling and disclosure. It's not the task of each individual tool to define and adapt corporate governance in the field of IT security.
You are right: this patch hurts log processing. You lose data. It is a
Losing data is one thing, yes, but intended obfuscation is a legal matter ;). I know that my statement is maybe a bit too strong an argument to have practical consequences.
trade-off between privacy and analysis. However, an administrator should be able to make this choice if they feel that it is more important to not retain sensitive data than it is to have a full history of everything logged.
The driving force behind those "papers of suggestion or common practice" regarding data retention was not administrators but companies running a business in these fields. As such the administrator is only a part of the decision chain in a firm and will certainly have to comply with corporate security guidelines, where data protection and disclosure must be handled.
[snip]... When you work for the state, for banks or insurance companies, you'll notice that there the wind is blowing in the other direction. All data is to be stored, without loss, and this under penalty even. At least here in Switzerland. If you lose a message while a potential "break-in" has occurred or can be correlated, it might cost you your head :).
A delicate matter indeed! It is my understanding that there are legal problems with such modification of logs in France, the UK, and maybe Switzerland(?).
I would assume so, but I'd need to ask a lawyer.
I defer to the lawyers. The EFF seems to think that this 'dilution' is (a) legal in the U.S. and (b) advisable.
From the information point of view this makes sense; from a business model point of view this is a drawback.
(http://eff.org is the major civil liberties internet watchdog in the US).
... with far too little money to have an important influence on the IT market in the US, I believe ...
Method 1 and 2 are great, but most of the time there is still very useful information in logs even after extensive stripping. For example, suppose a log file of login attempts: username, ip, and if the attempt was successful. Even if you removed username and ip, it is very useful to know if there is a spike in failed login attempts, for example.
Absolutely, but what are you going to write in your executive summary? "Last month we observed an unusual spike in failed login attempts to our foobar server (used for financial transactions) in week 19, between Friday and Saturday night. Due to data retention reasons (EFF) we do not have any IPs logged. We are thus not certain whether this constitutes a criminal act (a hacking attempt) or whether our application's unit tests, which also need to connect to this live database container, have gone wild."
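To make the spike argument concrete, here is a minimal sketch of how failed logins could still be counted once username and IP have been stripped. The record layout (an hour bucket plus a success flag), the function names and the z-score threshold are all assumptions made for illustration; nothing here comes from the patch itself.

```python
from collections import Counter
from statistics import mean, pstdev


def failed_per_hour(records):
    """Count failed attempts per hour bucket.

    `records` is an iterable of (hour_bucket, success) pairs. This layout is
    hypothetical: it is what remains after username and IP are stripped.
    """
    counts = Counter()
    for hour, success in records:
        # Adding 0 keeps hours without failures in the counter.
        counts[hour] += 0 if success else 1
    return counts


def find_spikes(counts, threshold=3.0):
    """Return hour buckets whose failure count is more than `threshold`
    population standard deviations above the mean (an assumed heuristic)."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # perfectly flat history, nothing stands out
    return sorted(h for h, c in counts.items() if (c - mu) / sigma > threshold)
```

Such a check tells you when something unusual happened, but, as argued above, not who caused it.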
ok. It was included for historical reasons (a previous patch only did 'strip').
Excellent. Redo your patch and I'd say this has a good chance of inclusion because it does have a valid use case, at least in the US and for people who see data retention from the administrator's point of view.
I agree, it is incomplete and should not be included.
You have excellent documentation online anyway. Debian folks will probably take your sample file :).
remove, also because not all IPs are logged in dotted decimals for example.
Do you mean that it should also support IPv6? I am happy to include this in an update to the patch.
Excellent.
It can get complex. Here is an example IPv6 regexp: http://blogs.msdn.com/mpoulson/archive/2005/01/10/350037.aspx
Const strIPv6Pattern as string = "\A(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\z"
Const strIPv6Pattern_HEXCompressed as string = "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)\z"
Const StrIPv6Pattern_6Hex4Dec as string = "\A((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
Const StrIPv6Pattern_Hex4DecCompressed as string = "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
To be honest, I cannot verify the correctness of those regexps, partly because I am unwilling to spend the necessary time and partly because I'm not that proficient with regexps.
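One way to check the four patterns without reading them by eye is to compare them against a parser that is already trusted. The sketch below ports them to Python's `re` (replacing the .NET `\z` anchor with Python's `\Z`, and assuming the space before `::` in the last pattern is a line-wrap artifact) and compares the result with the standard `ipaddress` module. The helper names are made up for this example.

```python
import ipaddress
import re

H = r"[0-9A-Fa-f]{1,4}"                      # one hex group
D = r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)"        # one decimal octet, as in the source

# Python ports of the four quoted patterns (\z becomes \Z).
PATTERNS = [re.compile(p) for p in (
    rf"\A(?:{H}:){{7}}{H}\Z",
    rf"\A((?:{H}(?::{H})*)?)::((?:{H}(?::{H})*)?)\Z",
    rf"\A((?:{H}:){{6,6}}){D}(\.{D}){{3}}\Z",
    rf"\A((?:{H}(?::{H})*)?)::((?:{H}:)*){D}(\.{D}){{3}}\Z",
)]


def regex_accepts(text):
    """True if any of the four ported patterns matches the whole string."""
    return any(p.match(text) for p in PATTERNS)


def stdlib_accepts(text):
    """True if Python's ipaddress module parses the string as IPv6."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        return False


def disagreements(samples):
    """Return the samples on which the regexps and the stdlib differ."""
    return [s for s in samples if regex_accepts(s) != stdlib_accepts(s)]
```

With this harness, a string such as `1:2:3:4:5:6:7:8::9` shows up as a disagreement: the compressed pattern puts no upper bound on the number of groups around `::`, so it accepts the string, while `ipaddress` rejects it.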
The tricky part is that you can mix decimal IPv4 with hex IPv6, and leave out multiple blocks of 0's, but not more than once. Anyone have a more elegant expression?
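One possible alternative to a single large expression is to pick out candidate tokens with a deliberately loose regexp and let an address parser decide. The sketch below does this with Python's standard `ipaddress` module, which already implements the "mixed decimal tail" and "only one `::`" rules. The function names and the `[ip]` placeholder are assumptions for illustration, not what the patch does.

```python
import ipaddress
import re

# Loose candidate matcher: any run of hex digits, colons and dots.
CANDIDATE = re.compile(r"[0-9A-Fa-f:.]+")


def is_ip(text):
    """True if `text` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def strip_addresses(line, placeholder="[ip]"):
    """Replace every token that parses as an IP address with `placeholder`."""
    def replace(match):
        token = match.group(0)
        # Try the token as-is, then without trailing sentence punctuation.
        for candidate in (token, token.rstrip(".:")):
            if candidate and is_ip(candidate):
                return placeholder + token[len(candidate):]
        return token  # not an address, leave untouched
    return CANDIDATE.sub(replace, line)
```

For example, `strip_addresses("Failed login from 192.0.2.10 port 22")` replaces only the address. Known limitations of this sketch: an address glued to a port (`192.0.2.1:22`) is not recognised, and a fragment such as the `d::` inside `std::string` does parse as IPv6 and would be replaced, so real use would need stricter token boundaries.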
Thank you for your valuable comments.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc