[syslog-ng] syslog-ng anon patch

Fri Jun 3 11:56:11 CEST 2005

Hello

>>.... Bad idea not least because the logic of hiding data should be in
>>the frontend and/or the extraction process (ETL) and not in the data
>>storage. On a central syslog server you'd like to have data mining 
>>theories applied for example, where you need the whole set of raw 
>>data, unfiltered. Well, only partially unfiltered, since one will 
>>certainly apply filters in their log statements.
>  
> I very much agree, it would be ideal to handle this problem
> elsewhere--but it would be a lot more work.

I don't know, really. From your webpage I learn that you also have
similar patches for other system-level tools. So my first thought was:
"is he really going to patch each and every tool out there that stores
sensitive data"?

> The problem with the
> front end approach is that it would be very difficult to write patches
> for all the many daemons one might run.

See, this is called problem shifting. It is not the responsibility of
the individual tools' authors but that of the corporation gluing them
together into a product they sell.

Example: suppose you are an ISP and want to provide your customers
with a simple monitoring framework where they can observe their servers,
browse certain post-processed log files and generate alerts or pager
alarms based on configurable triggers. This is a fairly common ISP
service nowadays. From the ISP's point of view, you have all the data
needed to support eventual forensics. As the provider of the
monitoring software you are responsible for stripping out the
information that has legal impact when presented to your customers. As
such the front-end application must have the appropriate means to
process that information. This solves two issues from a business point
of view:

o You have a certain base USP in that you can sell a product which does
   something more than just display data in a 1:1 mapping
o You, as the business, are responsible for complying with the acts,
   laws and regulations of the authorities in your geographical
   location. This means that the ISP, in our case, is responsible for
   data integrity and for information handling and disclosure. This
   takes the responsibility away from the tools' developers, who most of
   the time are not under the direct control of the company.

There are more points to consider, but they are far too off-topic for
this mailing list. You can contact me privately regarding those points.

> The problem with the
> post-processing and log scrubbing approach is that the data will likely
> sit around for many hours or days.

It's part of the security concept of OSPs/ISPs to maintain a
sufficiently accurate security policy regarding data handling and
disclosure. It's not the task of each individual tool to define and
adapt corporate governance in the field of IT security.

> You are right: this patch hurts log processing. You lose data. It is a

Losing data is one thing, yes, but intentional obfuscation is a legal
matter ;). I know that my statement is maybe a bit too strong an
argument to have practical consequences.

> trade-off between privacy and analysis. However, an administrator should
> be able to make this choice if they feel that it is more important to
> not retain sensitive data than it is to have a full history of
> everything logged.

The driving force behind those "papers of suggestion or common practice"
regarding data retention was not administrators but companies running a
business in these fields. As such the administrator is only one part of
the decision chain in a firm and will certainly have to comply with
corporate security guidelines, where data protection and disclosure must
be handled.

>>[snip]... When you work for the state, for banks or insurances, 
>>you'll notice that there the wind is blowing into the other 
>>direction. All, without loss, data is to be stored; and this under 
>>penalty even. At least here in Switzerland. If you lose a message 
>>while a potential "break-in" has occurred or can be correlated it 
>>might cost you your head :).
> 
> A delicate matter indeed! It is my understanding that there are legal
> problems with such modification of logs in France, the UK, and maybe
> Switzerland(?).

I would assume so, but I'd need to ask a lawyer.

> I defer to the lawyers. The EFF seems to think that this 'dilution' is
> (a) legal in the U.S. and (b) advisable.

From the information point of view this makes sense; from a business
model point of view it is a drawback.

> (http://eff.org is the major
> civil liberties internet watchdog in the US).

... with far too little money to have any important influence on the IT
market in the US, I believe ...

> Method 1 and 2 are great,
> but most of the time there is still very useful information in logs even
> after extensive stripping. For example, suppose a log file of login
> attempts: username, ip, and if the attempt was successful. Even if you
> removed username and ip, it is very useful to know if there is a spike
> in failed login attempts, for example.

Absolutely, but what are you going to write in your executive summary?
"Last month we observed an unusual spike in failed login attempts
to our foobar server (used for financial transactions) in week 19,
between Friday and Saturday night. Due to data retention reasons (EFF)
we do not have any IPs logged. We are thus not certain whether this
constitutes a criminal act (a hacking attempt) or whether our
application's unit tests, which also need to connect to this live
database container, have gone wild."
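
To make the quoted scenario concrete, here is a minimal Python sketch of
what remains possible once username and IP are gone: counting failed
logins per hour and flagging outliers. The record layout, the function
names and the "3 times the median" threshold are my own assumptions for
illustration; none of this comes from the patch. Note that it tells you
that a spike happened, not who caused it, which is exactly the gap in
the summary above.

```python
from collections import Counter
from statistics import median


def failures_per_hour(records):
    """Count failed logins per hour.

    records: iterable of (hour_label, success) tuples, i.e. what is left
    of a login log after username and IP have been stripped.
    """
    return Counter(hour for hour, success in records if not success)


def spikes(counts, factor=3):
    """Return the hours whose failure count exceeds factor * median."""
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(h for h, n in counts.items() if n > factor * baseline)


if __name__ == "__main__":
    log = (
        [("Fri 20", False)] * 2
        + [("Fri 21", False)] * 3
        + [("Fri 22", False)] * 2
        + [("Fri 23", False)] * 40
        + [("Fri 23", True)] * 5
    )
    print(spikes(failures_per_hour(log)))
```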

> ok. It was included for historical reasons (a previous patch only did
> 'strip').

Excellent. Redo your patch and I'd say it has a good chance of
inclusion because it does have a valid use case, at least in the US and
for people who see data retention from the administrator's point of view.

> I agree, it is incomplete and should not be included.

You have excellent documentation online anyway. Debian folks will
probably take your sample file :).

>>remove, also because not all IPs are logged in dotted decimals for
>>example.
> 
> Do you mean that it should also support IPv6? I am happy to include this
> in an update to the patch.

Excellent.

> It can get complex. Here is an example IPv6 regexp:
> http://blogs.msdn.com/mpoulson/archive/2005/01/10/350037.aspx
> 
>>Const strIPv6Pattern as string =
>>"\A(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\z"
>>Const strIPv6Pattern_HEXCompressed as string =
>>"\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)\z"
>>Const StrIPv6Pattern_6Hex4Dec as string =
>>"\A((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
>>Const StrIPv6Pattern_Hex4DecCompressed as string =
>>"\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"

To be honest, I cannot verify the correctness of those regexps, partly
because I am unwilling to spend the necessary time and partly because
I'm not that proficient with regexps.

> The tricky part is that you can mix decimal IPv4 with hex IPv6, and
> leave out multiple blocks of 0's, but not more than once. Anyone have a
> more elegant expression?
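
One way to sidestep a single giant expression is to over-match with a
loose pattern and let an address parser do the validation. The sketch
below uses Python's standard ipaddress module, which accepts compressed
and mixed IPv4/IPv6 notation and rejects a second "::". It is only an
illustration: the patch itself is not written in Python, and the
CANDIDATE pattern, the function names and the "<ip>" placeholder are
assumptions of mine.

```python
import ipaddress
import re

# Over-matching candidate pattern: a run of hex digits, colons and dots
# that is not glued to other letters or digits. Validation comes later.
CANDIDATE = re.compile(r"(?<![0-9A-Za-z])[0-9A-Fa-f:.]+(?![0-9A-Za-z])")


def is_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def strip_ips(line, placeholder="<ip>"):
    """Replace every valid IP address in a log line with a placeholder."""

    def replace(match):
        token = match.group(0)
        if is_ip(token):
            return placeholder
        # Retry without trailing punctuation, as in "from ::1." or "::1:".
        core = token.rstrip(".:")
        if core and is_ip(core):
            return placeholder + token[len(core):]
        return token

    return CANDIDATE.sub(replace, line)


if __name__ == "__main__":
    print(strip_ips("Failed login from 2001:db8::1 at 12:34:56"))
    print(strip_ips("peer ::ffff:192.0.2.1."))
```

Timestamps such as 12:34:56 match the loose pattern but fail to parse as
addresses, so they are left untouched.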

Thank you for your valuable comments. Best regards,
Roberto Nibali, ratz
-- 
echo 
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc