decoding messages from sockd (SOCKS proxy)
Hello all,
I am running into some headaches with the poor formatting of sockd messages. How should I decode messages like this?
Note I have not applied XML escapes to these yet, as that's hard to read, but I will do so when inserting them into a patterndb to prevent parse errors. Every message in this group begins with this string on one line:
sockd[@NUMBER:pid:@]: @ESTRING:action::@ @ESTRING:phase::@ @IPv4:src:@.@NUMBER:srcport:@ ->
Then there are a few different endings which happen in some messages that are giving me problems to decode. Here are three examples from my collected logs:
smarthost.company.com.25
host.team.division.company.com.18050: invalid address: 0.0.0.0.18050
company.com.443: Connection reset by peer
I am having a hard time figuring out how to break these up into domain name (src / dst as appropriate) and port (srcport / dstport).
My best thought so far was to detect this and rewrite them using PCRE before applying patterndb matching. I could find the .[0-9]+ and replace with :\1, then I have the port delimited with ':' and I can pull it apart using:
@ESTRING:src::@@NUMBER:srcport:@
Is it possible to do PCRE replacement using backreferences? Or is there another way to get this to work?
Thanks,
Matthew.
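The rewrite idea in this message can be tried outside syslog-ng first. Below is a minimal Python sketch, assuming the port is always the final dot-separated run of digits in the token. Python's `re` module stands in for PCRE here, and the helper name `mark_port` is hypothetical. Note that the pattern needs a capture group around the digits for `\1` to refer to anything.

```python
import re

# Trailing ".<digits>" at the end of a token. The digits are captured
# so the replacement can refer to them as \1.
PORT_SUFFIX = re.compile(r"\.(\d+)$")


def mark_port(endpoint: str) -> str:
    """Replace the final '.port' with ':port' (hypothetical helper)."""
    return PORT_SUFFIX.sub(r":\1", endpoint)


# Endpoint tokens taken from the log examples in the message.
for sample in ("smarthost.company.com.25", "0.0.0.0.18050", "company.com.443"):
    print(mark_port(sample))
```

Anchoring the pattern with `$` means only the last dot is rewritten, so the dots inside the hostname or IP address are left alone. A token with no numeric suffix passes through unchanged.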
Hi,
Sorry for not answering sooner, my backlog is just not getting smaller. I hope to address this for you anyway.
On Thu, 2010-10-07 at 14:40 -0700, Matthew Hall wrote:
> Hello all,
> I am running into some headaches with the poor formatting of sockd messages. How should I decode messages like this?
> Note I have not applied XML escapes to these yet as that's hard to read but I will do so when inserting them into a patterndb to prevent parse errors. Every message in this group begins with this string on one line:
> sockd[@NUMBER:pid:@]: @ESTRING:action::@ @ESTRING:phase::@ @IPv4:src:@.@NUMBER:srcport:@ ->
> Then there are a few different endings which happen in some messages that are giving me problems to decode. Here are three examples from my collected logs:
> smarthost.company.com.25
> host.team.division.company.com.18050: invalid address: 0.0.0.0.18050
> company.com.443: Connection reset by peer
> I am having a hard time figuring out how to break these up into domain name (src / dst as appropriate) and port (srcport / dstport).
> My best thought so far was to detect this and rewrite them using PCRE before applying patterndb matching. I could find the .[0-9]+ and replace with :\1, then I have the port delimited with ':' and I can pull it apart using:
> @ESTRING:src::@@NUMBER:srcport:@
> Is it possible to do PCRE replacement using backreferences? Or is there another way to get this to work?
My best bet would be to use the csv-parser() before doing patterndb matching. You can specify the delimiter to be ':'; the first column is the hostname + port, the second is the "error message". Then to split the first column, you could perhaps use PCRE to cut out the last '.'-terminated portion.
Backrefs are however quite slow, especially if you want to use backrefs right in the pattern (and not in the replacement).
Also note that you can have a match() filter store its matches using flags(store-matches); they'd be stored as $1, $2, etc., or if you use named groups, then $groupname will work as name-value pairs.
-- Bazsi
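The two-step approach suggested in this reply can be checked in Python before writing any syslog-ng configuration. This is only an illustration under stated assumptions: the text after `->` has already been isolated, the first ':' separates the endpoint from the error text, and the port is the last run of digits after a dot. The names `split_ending`, `host`, `port` and `error` are hypothetical, and Python's named groups merely mirror the named-group idea mentioned for flags(store-matches).

```python
import re
from typing import Optional

# Greedy host, then the last '.'-terminated run of digits as the port.
ENDPOINT = re.compile(r"^(?P<host>.+)\.(?P<port>\d+)$")


def split_ending(ending: str) -> Optional[dict]:
    """Split 'host.port[: error text]' into its parts (hypothetical helper)."""
    # Step 1: split on the first ':' only, so colons inside the
    # error text stay in the second column.
    endpoint, _, error = ending.partition(":")
    # Step 2: cut the trailing '.port' off the first column.
    match = ENDPOINT.match(endpoint.strip())
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "error": error.strip(),
    }


print(split_ending("host.team.division.company.com.18050: invalid address: 0.0.0.0.18050"))
print(split_ending("company.com.443: Connection reset by peer"))
print(split_ending("smarthost.company.com.25"))
```

Splitting on the first ':' only would not be safe for endpoints that themselves contain colons, such as IPv6 addresses; the sketch assumes hostnames or IPv4 addresses as in the examples.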
On Fri, Oct 15, 2010 at 10:08:16PM +0200, Balazs Scheidler wrote:
>> Is it possible to do PCRE replacement using backreferences? Or is there another way to get this to work?
> My best bet would be to use the csv-parser() before doing patterndb matching. You can specify the delimiter to be ':'; the first column is the hostname + port, the second is the "error message". Then to split the first column, you could perhaps use PCRE to cut out the last '.'-terminated portion.
Good proposal.
> Backrefs are however quite slow, especially if you want to use backrefs right in the pattern (and not in the replacement).
I receive about 300 messages per second from this source over a typical almost-24-hour period of data. So I guess I can start with a backref and change to something more complicated if it doesn't work well. Unless 300 MPS is already too much?
> Also note that you can have a match() filter store its matches using flags(store-matches); they'd be stored as $1, $2, etc., or if you use named groups, then $groupname will work as name-value pairs.
Helpful to know.
> Bazsi
Matthew.
participants (2)
- Balazs Scheidler
- Matthew Hall