Message parameter substitution
I am new to syslog-ng and would like some help with pattern matching and the substitution option. The current requirement is to substitute a parameter in the message with a random value in order to anonymize it.

For example:

Dec 31 23:13:25 servername sshd[25218]: Failed keyboard-interactive/pam for user1 from 10.x.x.x port 47325 ssh2

If I create a pattern database for this message, pick out the username using the string, and substitute user1 with, say, anon1, will I be able to store the original/substituted value pair for this user and use it repeatedly? Would I be able to do this for all subsequent logs?

To be clearer, here is an example of the substitution that must happen as the logs arrive and the patterns are matched:

log with user1 arrives and is substituted by anon1
log with user2 arrives and is substituted by anon2
again log with user1 arrives and is again substituted by anon1
log with user3 arrives and is substituted by anon3
again log with user2 arrives and is again substituted by anon2
.
.

This is required so that, once the usernames have been substituted for anonymity, there is still a way to reverse them for audit purposes.

-- Thanks and regards, AS
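The behaviour asked for above amounts to a lookup table that hands out the next free alias the first time a name is seen and reuses it afterwards. A minimal Python sketch of that logic, independent of syslog-ng, might look like this. The regular expression, the class name and the sequential anonN scheme (rather than truly random values) are assumptions made for illustration only.

```python
import re

# Hypothetical pattern for the sshd line quoted above; real messages may differ.
USER_RE = re.compile(r"(Failed keyboard-interactive/pam for )(\S+)( from )")


class Pseudonymizer:
    """Hands out anon1, anon2, ... and reuses the alias for a known name."""

    def __init__(self):
        self.forward = {}  # real name -> alias
        self.reverse = {}  # alias -> real name, kept for audit lookups

    def alias_for(self, name):
        # A name seen for the first time gets the next free alias.
        if name not in self.forward:
            alias = f"anon{len(self.forward) + 1}"
            self.forward[name] = alias
            self.reverse[alias] = name
        return self.forward[name]

    def anonymize(self, line):
        # Only the captured username is rewritten; the rest of the line is kept.
        return USER_RE.sub(
            lambda m: m.group(1) + self.alias_for(m.group(2)) + m.group(3),
            line,
        )


if __name__ == "__main__":
    p = Pseudonymizer()
    for user in ["user1", "user2", "user1", "user3", "user2"]:
        msg = f"Failed keyboard-interactive/pam for {user} from 10.x.x.x port 47325 ssh2"
        print(p.anonymize(msg))
```

The `reverse` dictionary is what would make the substitution reversible for audits; in this sketch it lives only in memory and is lost when the process exits.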
On Wed, 2011-12-21 at 14:31 +0530, Anup Shetty wrote:
Do you want to do that on the fly or during postprocessing?

Right now this is not possible with patterndb alone, as it only extracts information from messages and never changes them. Anonymization has always been a hidden agenda of patterndb, but it never materialized.

-- Bazsi
One copy of the original logs is written to disk, and the anonymized copy gets forwarded.

On Wed, Dec 21, 2011 at 6:47 PM, Balazs Scheidler <bazsi@balabit.hu> wrote:
______________________________________________________________________________ Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng FAQ: http://www.balabit.com/wiki/syslog-ng-faq
-- Thanks Anup
Caveat: I have not done this (this is just a thought).

Could you have syslog-ng send the appropriate logs to a program destination, and use the program to anonymize the data?

For example, a Perl script could store the real values in a database or associative array, replacing them with randomized values.

Then let the Perl script write to a [file|pipe|socket] that syslog-ng listens to and handles as if it were a real log.

<log source> --> <syslog-ng> --> <perl script> --> <syslog-ng> --> <final destination>

Using Berkeley DB, the Perl script could preserve the mapping in a file, so if you needed to, it would be a simple function to de-anonymize the logfile.

Sounds a lot worse than it would be, I imagine.

(But then everything is easy for he who does not have to do it :-)

Later, Jim
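As a rough illustration of the idea in this message, here is a sketch of such a filter, written in Python rather than Perl and using Python's standard `dbm` module as a stand-in for Berkeley DB. The pattern, the key prefixes and the function names are all hypothetical, and nothing here has been tested against a real syslog-ng setup.

```python
import dbm
import re

# Hypothetical pattern for the sshd message discussed in this thread.
USER_RE = re.compile(r"(Failed keyboard-interactive/pam for )(\S+)( from )")


def alias_for(db, name):
    """Return the stored alias for name, creating and persisting one if new."""
    key = b"user:" + name.encode()
    if key not in db:
        count = int(db.get(b"meta:count", b"0")) + 1
        alias = f"anon{count}".encode()
        db[key] = alias
        db[b"alias:" + alias] = name.encode()  # reverse entry for audits
        db[b"meta:count"] = str(count).encode()
    return db[key].decode()


def anonymize(db, line):
    """Rewrite only the username; the syslog header and the rest stay as they are."""
    return USER_RE.sub(
        lambda m: m.group(1) + alias_for(db, m.group(2)) + m.group(3),
        line,
    )


def deanonymize(db, alias):
    """Look up the real name behind an alias (raises KeyError if unknown)."""
    return db[b"alias:" + alias.encode()].decode()


def run(map_path, lines, out):
    """Read log lines, write anonymized lines, keep the mapping in map_path."""
    with dbm.open(map_path, "c") as db:
        for line in lines:
            out.write(anonymize(db, line))
            out.flush()  # pass each message on as soon as it is processed
```

Calling `run("user_aliases.db", sys.stdin, sys.stdout)` from a small wrapper (after `import sys`; the file name is an assumption) would turn this into a stdin-to-stdout filter. Getting the output back into syslog-ng through a pipe or socket is left out of the sketch.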
Thanks Jim,

With this process, would I not lose syslog-ng's buffering mechanism in case my destination is temporarily unavailable? The other thing I fear is losing the IP address of the source and ending up with the local address of the relay server if I pass the logs from the script to a file and then back to syslog-ng.

Another thought: is it possible to increment/decrement the characters, as in a Caesar cipher? When the username pattern is matched, it just increments each character by a defined value and passes it on. In that case we would only need to remember the increment to recover the original.

So username "adh" becomes "bei" (each character incremented by 1).

On Wed, Dec 21, 2011 at 9:34 PM, Jim <jrhendri@maine.rr.com> wrote:
-- Thanks and regards, Anup
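The character-shifting idea in this message can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming lowercase ASCII usernames; the function name is made up, and other characters (digits, uppercase, punctuation) are passed through unchanged.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_name(name, offset):
    """Rotate each lowercase ASCII letter by offset, wrapping around at 'z'."""
    out = []
    for ch in name:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + offset) % 26])
        else:
            out.append(ch)  # leave anything that is not a-z untouched
    return "".join(out)


if __name__ == "__main__":
    shifted = shift_name("adh", 1)
    print(shifted)                  # bei
    print(shift_name(shifted, -1))  # adh
```

Applying the negative offset restores the original, so no mapping table is needed. The same property means anyone who guesses the offset can reverse the substitution.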
So, buffering would depend on how you set up the sources and destinations. I think you might preserve the buffering if you used sockets or pipes rather than files.

As far as preserving the data goes, it should be pretty simple to write something that doesn't perturb the data except for the fields you want. Something you could play with on the command line would be a place to start for the script. Then you could send some logs through it (via head, tail or cat) and see what comes out.

And depending on how you parse the modified data, you might need to tweak the syslog-ng config to correctly get the IP address you want. It should be preserved in the syslog header, though.

Like I said in my original post, this is just a rough idea. As many of my professors would say, "the rest is left as an exercise for the reader" :-)

On Thu, 2011-12-22 at 01:17 +0530, Anup Shetty wrote:
participants (3)
- Anup Shetty
- Balazs Scheidler
- Jim