[syslog-ng] syslog-ng anon patch

elijah at riseup.net
Fri Jun 3 04:07:07 CEST 2005


Roberto Nibali wrote:

>> The attached patch comes from http://dev.riseup.net/patches/syslog-ng

> Gives you a 404 at first until you click on login.

Sorry, this was temporarily misdirected.

>> what it does is provide a simple filter to strip out unwanted 
>> regular expressions from logs...

> .... Bad idea not least because the logic of hiding data should be in
> the frontend and/or the extraction process (ETL) and not in the data
> storage. On a central syslog server you'd like to have data mining 
> theories applied for example, where you need the whole set of raw 
> data, unfiltered. Well, only partially unfiltered, since one will 
> certainly apply filters in their log statements.

I very much agree that it would be ideal to handle this problem
elsewhere, but it would be a lot more work. The problem with the
front-end approach is that it would be very difficult to write patches
for every daemon one might run. The problem with the post-processing
(log scrubbing) approach is that the unscrubbed data will likely sit
around for many hours or days.

You are right: this patch hurts log processing. You lose data. It is a
trade-off between privacy and analysis. However, an administrator should
be able to make this choice if they feel that it is more important to
not retain sensitive data than it is to have a full history of
everything logged.

> Method 1: have log statements which omit certain log lines, and don't
>  set a catchall log statement
> 
> Method 2: build a filter for lines you'd like to match and forget. 
> Add a destination statement with /dev/null as file destination.
> 
> Method 3: strip the lines.
> 
> Method 1 and 2 drop information, but basically maintain their value 
> of truth. Method 3 changes the information gain and thus, strongly 
> speaking, dilutes the truth. Dealing with the legal aspects of 
> information gain/loss with regard to dilution is a delicate matter.

> [snip]... When you work for the state, for banks or insurances, 
> you'll notice that there the wind is blowing into the other 
> direction. All, without loss, data is to be stored; and this under 
> penalty even. At least here in Switzerland. If you lose a message 
> while a potential "break-in" has occurred or can be correlated it 
> might cost you your head :).

A delicate matter indeed! It is my understanding that there are legal
problems with such modification of logs in France, the UK, and maybe
Switzerland(?).

I defer to the lawyers. The EFF seems to think that this 'dilution' is
(a) legal in the U.S. and (b) advisable. (The EFF, http://eff.org, is the
major internet civil liberties watchdog in the U.S.) Methods 1 and 2 are
great, but they drop whole lines, and most of the time there is still
very useful information left in a line even after extensive stripping.
For example, take a log of login attempts that records username, IP, and
whether the attempt succeeded. Even with username and IP removed, you
can still see a spike in failed login attempts.
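To make the login example concrete, here is a minimal sketch of the
"replace" idea using POSIX <regex.h>. It is not the code from the patch:
redact_line, IPV4_RE, the "[ip]" placeholder and the sample log line in
the note below are all made up for illustration, and the pattern is only
the same shape as the patch's "ips" expression.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: same shape as the patch's "ips" pattern, as a POSIX ERE. */
static const char *IPV4_RE =
    "(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"
    "([.-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}";

/*
 * Copy `line` into `out`, replacing every match of `pattern` with `repl`.
 * Returns 0 on success, -1 if the pattern does not compile or `out`
 * is too small to hold the result.
 */
int redact_line(const char *pattern, const char *repl,
                const char *line, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0, rlen, rest;
    int eflags = 0;

    assert(pattern && repl && line && out);
    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rlen = strlen(repl);

    while (*line != '\0' && regexec(&re, line, 1, &m, eflags) == 0) {
        size_t pre = (size_t)m.rm_so;

        if (m.rm_eo == m.rm_so)      /* guard against empty matches */
            break;
        if (used + pre + rlen >= outsz) {
            regfree(&re);
            return -1;
        }
        memcpy(out + used, line, pre);   /* text before the match */
        used += pre;
        memcpy(out + used, repl, rlen);  /* placeholder instead of match */
        used += rlen;
        line += m.rm_eo;
        eflags = REG_NOTBOL;             /* no longer at start of line */
    }
    regfree(&re);

    rest = strlen(line);
    if (used + rest >= outsz)
        return -1;
    memcpy(out + used, line, rest + 1);  /* tail, including the NUL */
    return 0;
}
```

Called on an invented line such as "sshd: Failed password for bob from
192.168.0.17 port 22", this leaves the word "Failed" in place, so failed
attempts can still be counted. Running it a second time with a username
pattern would strip the name as well.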

> I don't see the necessity to provide a keyword strip as a subset of 
> replace. Please drop it.

OK. It was included for historical reasons (a previous patch only did
'strip').

> I don't think this sample file is needed.

I agree, it is incomplete and should not be included.

>> + if (strcasecmp(re,"ips") == 0) {
>> +    re =
>> + "(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])([\\.\\-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}";
>> + }

> remove, also because not all IPs are logged in dotted decimals for
> example.

Do you mean that it should also support IPv6? I am happy to include this
in an update to the patch.
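For reference, a quick sketch of which notations an expression of this
shape catches, again assuming POSIX <regex.h>. The names IPS_RE and
contains_ipv4 and the sample strings in the note below are hypothetical;
the separator is written as [.-] because a backslash inside a POSIX
bracket expression is taken literally.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical: the "ips" pattern rewritten as a POSIX extended regex. */
static const char *IPS_RE =
    "(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])"
    "([.-](25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9])){3}";

/*
 * Returns 1 if `s` contains a match for IPS_RE, 0 if it does not,
 * and -1 if the pattern fails to compile.
 */
int contains_ipv4(const char *s)
{
    regex_t re;
    int rc;

    assert(s != NULL);
    if (regcomp(&re, IPS_RE, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Dotted ("from 192.168.0.1") and dashed ("host-192-168-0-1.example")
forms match. The same address logged as one integer ("peer=3232235521")
or in hex ("peer=0xC0A80001") does not, so those forms would pass
through unstripped.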

It can get complex. Here is a set of example IPv6 regexps, from
http://blogs.msdn.com/mpoulson/archive/2005/01/10/350037.aspx
> Const strIPv6Pattern as string =
> "\A(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\z"
> Const strIPv6Pattern_HEXCompressed as string =
> "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)\z"
> Const StrIPv6Pattern_6Hex4Dec as string =
> "\A((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"
> Const StrIPv6Pattern_Hex4DecCompressed as string =
> "\A((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\z"

The tricky part is that an address can mix hex IPv6 groups with a
dotted-decimal IPv4 tail, and can leave out a run of zero blocks (the
"::" form), but not more than once. Does anyone have a more elegant
expression?
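One possible alternative to a regexp, offered only as a sketch: let the
C library decide whether a candidate token is an IPv6 address. This
swaps the regexp for inet_pton() from <arpa/inet.h>, which is a
different technique from the one in the patch. It only checks a whole
token, so the log line would have to be split into tokens first. The
name is_ipv6_literal is made up.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Returns 1 if `token` is, in its entirety, a textual IPv6 address
 * that inet_pton() accepts, and 0 otherwise.
 */
int is_ipv6_literal(const char *token)
{
    struct in6_addr addr;

    assert(token != NULL);
    return inet_pton(AF_INET6, token, &addr) == 1;
}
```

This accepts the full form (1:2:3:4:5:6:7:8), the compressed form
(2001:db8::1) and the mixed form (::ffff:192.168.0.1), and rejects a
token that uses "::" twice (1::2::3).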

-elijah

