[syslog-ng] syslog-ng anon patch
Roberto Nibali
ratz at drugphish.ch
Fri Jun 3 11:56:11 CEST 2005
Hello
>>.... Bad idea not least because the logic of hiding data should be in
>>the frontend and/or the extraction process (ETL) and not in the data
>>storage. On a central syslog server you'd like to have data mining
>>theories applied for example, where you need the whole set of raw
>>data, unfiltered. Well, only partially unfiltered, since one will
>>certainly apply filters in their log statements.
>
> I very much agree, it would be ideal to handle this problem
> elsewhere--but it would be a lot more work.
I don't know, really. From your webpage I learn that you also have
similar patches for other "close to the system" tools. So my first
thought was: "is he really going to patch each and every tool out there
that stores sensitive data?"
> The problem with the
> front end approach is that it would be very difficult to write patches
> for all the many daemons one might run.
See, this is called problem shifting. It is not the responsibility of
the different tools' authors but that of the corporation gluing them
together into a product they sell.
Example: Say you are an ISP and want to provide your customers with a
simple monitoring framework where they can observe their servers,
browse certain post-processed log files and generate alerts or pager
alarms based on configurable triggers. This is a fairly common service
of an ISP nowadays. From the ISP point of view, you've got all the data
needed to support eventual forensics. As the provider of the
monitoring software you are responsible for stripping out the
information that has legal impact when presented to your customers. As
such the application running as front-end must have the appropriate
means to filter the information. This solves two issues from a business
point of view:
o You have a certain base USP in that you can sell a product which does
something more than just display data in a 1:1 mapping
o You, as the business, are responsible for complying with the acts,
laws and regulations given by the authoritative force in your
geographical location. This means the ISP, in our case, is responsible
for the data integrity and the information handling and disclosure.
This takes away the responsibility from the tools' developers, who most
of the time are not under direct control of the company.
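A minimal sketch of the front-end approach described above: the raw log
stays untouched in storage and the redaction happens only when a line is
presented to the customer. The function name, the placeholder text and
the sample log line are assumptions made for illustration; the pattern
covers dotted-decimal IPv4 only.

```python
import re

# Hypothetical pattern: dotted-decimal IPv4 only; other notations are not covered.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def present_to_customer(raw_line: str) -> str:
    """Return a redacted copy for display; the stored raw line is not modified."""
    return IPV4.sub("[ip removed]", raw_line)


# Hypothetical raw store, kept unfiltered for forensics.
raw_store = [
    "Jun  3 11:56:11 host sshd[4242]: Failed password for root from 192.0.2.17 port 2201",
]

for line in raw_store:
    print(present_to_customer(line))
```

The point of the design is that the central store keeps the complete
data set, while each front-end decides what it may disclose.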
There are more points which have to be considered, but they're far too
off-topic for this mailing list. You can contact me privately regarding
those points.
> The problem with the
> post-processing and log scrubbing approach is that the data will likely
> sit around for many hours or days.
It's part of the security concept of OSPs/ISPs to maintain an accurate
enough security policy regarding data handling and disclosure. It's not
the task of each individual tool to define and adapt corporate
governance in the field of IT security.
> You are right: this patch hurts log processing. You lose data. It is a
Losing data is one thing, yes, but intended obfuscation is a legal
matter ;). I know that my statement is maybe a bit too strong an
argument to have practical consequences.
> trade-off between privacy and analysis. However, an administrator should
> be able to make this choice if they feel that it is more important to
> not retain sensitive data than it is to have a full history of
> everything logged.
The driving force behind those "papers of suggestion or common practice"
regarding data retention were not administrators but companies running a
business in these fields. As such the administrator is only a part of
the decision chain in a firm and will certainly have to comply with
corporate security guidelines, where data protection and disclosure must
be handled.
>>[snip]... When you work for the state, for banks or insurances,
>>you'll notice that there the wind is blowing into the other
>>direction. All, without loss, data is to be stored; and this under
>>penalty even. At least here in Switzerland. If you lose a message
>>while a potential "break-in" has occured or can be correlated it
>>might cost you your head :).
>
> A delicate matter indeed! It is my understanding that there are legal
> problems with such modification of logs in France, the UK, and maybe
> Switzerland(?).
I would assume so, but I'd need to ask a lawyer.
> I defer to the lawyers. The EFF seems to think that this 'dilution' is
> (a) legal in the U.S. and (b) advisable.
From the information point of view this makes sense; from a business
model point of view this is a drawback.
> (http://eff.org is the major
> civil liberties internet watchdog in the US).
... with far too little money to have an important influence on the IT
market in the US I believe ...
> Method 1 and 2 are great,
> but most of the time there is still very useful information in logs even
> after extensive stripping. For example, suppose a log file of login
> attempts: username, ip, and if the attempt was successful. Even if you
> removed username and ip, it is very useful to know if there is a spike
> in failed login attempts, for example.
Absolutely, but what are you going to write in your executive summary?
Last month we observed an unusual spike in failed login attempts
to our foobar server (used for financial transactions) in week 19,
between Friday and Saturday night. Due to data retention reasons (EFF)
we do not have any IPs logged. We are thus not certain if this
constitutes an act of crime (a hacking attempt) or if our application's
unit tests, which also need to connect to this live database
container, have gone wild.
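Both sides of this exchange can be seen in a small sketch, assuming
stripped records that keep only an hour bucket and a success flag. The
record layout, the sample data and the "factor times the median"
threshold are all hypothetical. The spike is still detectable, but
nothing in the data says who caused it.

```python
from collections import Counter
from statistics import median

# Hypothetical stripped records: (hour bucket, success flag); no username, no IP.
events = (
    [("Fri 22", False)] * 3
    + [("Fri 23", False)] * 40
    + [("Sat 00", False)] * 4
    + [("Sat 00", True)] * 2
    + [("Sat 01", False)] * 2
)


def failed_per_bucket(records):
    """Count failed attempts per time bucket."""
    return Counter(bucket for bucket, ok in records if not ok)


def spikes(counts, factor=5):
    """Return buckets whose failure count exceeds factor times the median count."""
    if not counts:
        return []
    baseline = median(counts.values())
    return [bucket for bucket, n in counts.items() if n > factor * baseline]


counts = failed_per_bucket(events)
print(spikes(counts))
```

With this sample data the only bucket reported is "Fri 23". Whether
those 40 failures came from one host or from forty cannot be answered
from the stripped records.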
> ok. It was included for historical reasons (a previous patch only did
> 'strip').
Excellent. Redo your patch and I'd say this has a good chance of
inclusion because it does have a valid use case, at least in the US and
for people that see data retention from the administrator's point of
view.
> I agree, it is incomplete and should not be included.
You have excellent documentation online anyway. Debian folks will
probably take your sample file :).
>>remove, also because not all IPs are logged in dotted decimals for
>>example.
>
> Do you mean that it should also support IPv6? I am happy to include this
> in an update to the patch.
Excellent.
> It can get complex. Here is an example IPv6 regexp:
> http://blogs.msdn.com/mpoulson/archive/2005/01/10/350037.aspx
>
>>Const strIPv6Pattern as string =
>>"\A(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\z"
>>Const strIPv6Pattern_HEXCompressed as string =
>>"\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)\z"
>>Const StrIPv6Pattern_6Hex4Dec as string =
>>"\A((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
>>Const StrIPv6Pattern_Hex4DecCompressed as string =
>>"\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)
>>::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
To be honest I cannot verify the correctness of those regexps, partly
because I'm unwilling to spend the necessary time and partly because
I'm not that proficient with regexps.
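One way to avoid hand-verifying such expressions is to use a loose
pattern only to find candidate tokens and let a real address parser
decide. The sketch below does this with Python's standard ipaddress
module, which accepts compressed IPv6 and IPv6 with an embedded
dotted-decimal IPv4 tail. It is not how the patch under discussion
works; the function names and the "[ip]" placeholder are assumptions.

```python
import ipaddress
import re

# Deliberately loose candidate pattern: runs of hex digits, colons and dots
# that are not glued to other letters or digits. Validation happens below.
CANDIDATE = re.compile(r"(?<![0-9A-Za-z])[0-9A-Fa-f:.]+(?![0-9A-Za-z])")


def _is_ip(token: str) -> bool:
    """True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def strip_ips(line: str, replacement: str = "[ip]") -> str:
    """Replace every token that parses as an IP address with the placeholder."""
    def repl(match):
        token = match.group(0)
        if _is_ip(token):
            return replacement
        # Retry without trailing punctuation, e.g. an address ending a sentence.
        trimmed = token.rstrip(".:")
        if trimmed and _is_ip(trimmed):
            return replacement + token[len(trimmed):]
        return token

    return CANDIDATE.sub(repl, line)


print(strip_ips("Failed password from 192.0.2.17 port 22"))
print(strip_ips("Connection from ::ffff:192.0.2.128 at 11:56:11"))
```

Timestamps such as 11:56:11 are left alone because they do not parse as
addresses. A four-part version string such as 1.2.3.4 would still be
treated as an IPv4 address, so this is a starting point and not a
complete answer.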
> The tricky part is that you can mix decimal IPv4 with hex IPv6, and
> leave out multiple blocks of 0's, but not more than once. Anyone have a
> more elegant expression?
Thank you for your valuable comments. Best regards,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc