Caveat: I have not done this; it is just a thought.

Could you have syslog-ng send the appropriate logs to a program destination, and use that program to anonymize the data? For example, a Perl script could store the real values in a database or associative array and replace them with randomized values. The Perl script would then write to a file, pipe, or socket that syslog-ng listens on, and syslog-ng would handle the result as if it were a real log.

<log source> --> <syslog-ng> --> <perl script> --> <syslog-ng> --> <final destination>

Using Berkeley DB, the Perl script could preserve the mapping in a file, so de-anonymizing the logfile, if you ever needed to, would be a simple lookup function.

It sounds a lot worse than it would be, I imagine. (But then everything is easy for he who does not have to do it :-)

Later,
Jim

On Wed, 2011-12-21 at 14:17 +0100, Balazs Scheidler wrote:
On Wed, 2011-12-21 at 14:31 +0530, Anup Shetty wrote:
I am new to syslog-ng and would like some help on the pattern matching and the substitution option. Currently the requirement is to substitute a parameter in the message with a random value in order to anonymize it.
For example:
Dec 31 23:13:25 servername sshd[25218]: Failed keyboard-interactive/pam for user1 from 10.x.x.x port 47325 ssh2
If I create a pattern database for this message, pick out the username using the string, and substitute user1 with, say, anon1, will I be able to store the original/substituted value pair for this user and reuse it? Would I be able to do this for all subsequent logs?
To be more clear, here is an example of the substitution process that must happen as the logs arrive and the patterns are matched:

log with user1 arrives and is substituted by anon1
log with user2 arrives and is substituted by anon2
log with user1 arrives again and is again substituted by anon1
log with user3 arrives and is substituted by anon3
log with user2 arrives again and is again substituted by anon2
. . . .

This is required so that, once the usernames have been substituted for anonymity, there is still a way to reverse them for audit purposes.
Do you want to do that on the fly or during postprocessing?
Right now this is not possible with patterndb alone, as it only extracts information from messages and never changes them. Anonymization has always been a hidden agenda of patterndb, but it never materialized.
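
Since patterndb alone cannot rewrite messages, the external-script idea from this thread might look roughly like the sketch below. It is written in Python rather than the Perl mentioned above, and it is untested against a real syslog-ng setup. The names Pseudonymizer, USER_RE, and run_filter are made up for illustration, the regular expression only covers the sshd example line quoted in this thread, and the mapping lives in memory only; a persistent store such as the Berkeley DB file suggested above would be needed to survive restarts.

```python
import re

# Hypothetical pattern: only handles lines shaped like the sshd example
# in this thread ("Failed <method> for <user> from ...").
USER_RE = re.compile(r"(Failed \S+ for )(\S+)( from )")


class Pseudonymizer:
    """Assigns each real name a stable alias and remembers the reverse mapping."""

    def __init__(self, prefix="anon"):
        self.prefix = prefix
        self.forward = {}  # real name -> alias
        self.reverse = {}  # alias -> real name (for audits)

    def alias_for(self, real):
        # Reuse the alias if this name was seen before, otherwise mint the next one.
        alias = self.forward.get(real)
        if alias is None:
            alias = f"{self.prefix}{len(self.forward) + 1}"
            self.forward[real] = alias
            self.reverse[alias] = real
        return alias

    def real_for(self, alias):
        # De-anonymize a single alias; raises KeyError for unknown aliases.
        return self.reverse[alias]

    def anonymize_line(self, line):
        # Replace only the captured user name; the rest of the line is untouched.
        return USER_RE.sub(
            lambda m: m.group(1) + self.alias_for(m.group(2)) + m.group(3),
            line,
        )


def run_filter(lines, out, pseudonymizer=None):
    """Read log lines from an iterable and write anonymized lines to `out`."""
    p = pseudonymizer or Pseudonymizer()
    for line in lines:
        out.write(p.anonymize_line(line))
        out.flush()  # keep output line-by-line so the reader is not starved
    return p
```

Run as a filter, run_filter(sys.stdin, sys.stdout) would read what syslog-ng writes to the script and emit the rewritten lines. How that output is fed back into syslog-ng (file, pipe, or socket) is left open here, as it is in the thread.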