Thanks Balazs,
  I will try some more "controlled" testing using different settings for syslog-ng resolving and caching.

I think I installed libgeoip-dev (not sure right now, since it's installed on a system at work), so I'll check that package as well.

One question on code paths:

If I use an IP address pattern in patterndb (matching within the message, e.g. proxy or email logs) and assign a ${GEO} macro to it, will those be the only things that get resolved? Or will setting a cache within the syslog-ng config enable resolution for ${HOST} as well?

I am (mostly) interested in things like user access to sites by IP address through the proxy, and in enhancing the logs with geoip data for elasticsearch.

(obviously if it were fast enough, I would add the data for all sites - but initially I think IP only would be more interesting)

Thanks again!
Jim



On 02/22/2015 02:35 PM, Balazs Scheidler wrote:
Hi,

I would think that adding forward DNS lookups to the syslog-ng DNS cache code (or ripping out that code entirely and rewriting it from scratch while adding this feature) would produce _much_ better results than a locally running DNS server. That's why the DNS cache code was added in the first place: a caching-only name server is still too slow for name lookups on every message posted.
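
To make the idea of an in-process forward lookup cache concrete, here is a minimal Python sketch. It is not the syslog-ng DNS cache code; the class name `ForwardDnsCache`, the `ttl` and `negative_ttl` parameters and the injectable `resolver` are all assumptions made for illustration. It only shows why repeated hostnames stop costing a resolver round trip once they are cached, and that failed lookups can be cached too (for a shorter time).

```python
import socket
import time


class ForwardDnsCache:
    """Tiny in-process cache for hostname -> IP lookups (illustrative only)."""

    def __init__(self, resolver=None, ttl=3600.0, negative_ttl=60.0,
                 clock=time.monotonic):
        # resolver and clock are injectable so the cache can be exercised
        # without network access or real waiting (hypothetical design choice)
        self._resolver = resolver or self._system_resolver
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._entries = {}  # hostname -> (expires_at, ip_or_None)

    @staticmethod
    def _system_resolver(hostname):
        # First address returned by the system resolver; raises OSError on failure
        return socket.getaddrinfo(hostname, None)[0][4][0]

    def lookup(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no resolver call
        try:
            ip = self._resolver(hostname)
            expires_at = now + self._ttl
        except OSError:
            # Remember failures briefly so a bad name is not retried per message
            ip = None
            expires_at = now + self._negative_ttl
        self._entries[hostname] = (expires_at, ip)
        return ip
```

With a cache like this in the message path, the resolver is only hit once per distinct hostname per TTL window, instead of once per message.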

The geoip code uses libgeoip1.

The database is:

$ apt-cache show geoip-database
Package: geoip-database
Priority: standard
Section: net
Installed-Size: 3881

Version: 20140313-1
Recommends: libgeoip1
Breaks: libgeoip1 (<< 1.4.5.dfsg)
Filename: pool/main/g/geoip-database/geoip-database_20140313-1_all.deb
Size: 1195894
MD5sum: ab4d4f6bc0e04b25cad2fbe1479f44bc
SHA1: 06d38aee4084124f86351dfa6f1c404a8ae3e83b
SHA256: 30dc5a2c3296180ed0740fb4ec70eb1ea5b49efc5e48a091913a8106f6895c7e
Description-en: IP lookup command line tools that use the GeoIP library (country database)
 GeoIP is a C library that enables the user to find the country that any
 IP address or hostname originates from. It uses a file based database.
 .
 This database simply contains IP blocks as keys, and countries as values and
 it should be more complete and accurate than using reverse DNS lookups.
 .
 This package contains the free GeoLiteCountry database.
Description-md5: 3bfa5b4c9f973261799fb4d9355f3b6c
Homepage: http://www.maxmind.com/
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
Supported: 5y
Task: standard, kubuntu-active, kubuntu-active, mythbuntu-frontend, mythbuntu-frontend, mythbuntu-desktop, mythbuntu-backend-slave, mythbuntu-backend-slave, mythbuntu-backend-master, mythbuntu-backend-master


So it is about a year old, but the version in Debian sid can quite probably be installed on top without problems, and that one is pretty fresh, being dated 9th February.

https://packages.debian.org/sid/geoip-database



On Sat, Feb 21, 2015 at 1:24 PM, Jim Hendrick <jrhendri@roadrunner.com> wrote:
Hi Fabien,
  I have only done some preliminary testing (maybe 1500 EPS for a few
minutes) and was seeing a lot of DNS traffic (~1MB/s)

Obviously, if the field is a hostname, to do a geoip lookup there needs
to be name resolution before the IP can be mapped to a geo database.

I will be looking for ways to minimize this.

Current use-cases are for parsing proxy, email and FireEye logs.

Recall, my base architecture is:
- syslog-ng using patterndb, sending format-json to a local redis
  destination (lpush)
- redis, run with no local disk storage, acting as an in-memory buffer
  between syslog-ng and logstash
- logstash (also running locally on the same box) pulling (blpop) and
  feeding an elasticsearch cluster (4 nodes right now)
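
The buffer semantics of that pipeline can be sketched in a few lines of Python. This is an in-memory stand-in, not a Redis client: the `ListBuffer` class and `drain` helper are hypothetical names, and only the head/tail behaviour of the Redis LPUSH, RPUSH and BLPOP commands is modelled. One thing it makes visible is that pushing and popping at the same end (LPUSH with BLPOP) hands messages back newest first whenever a backlog builds up; pushing at the tail (RPUSH) would give first-in, first-out order.

```python
from collections import deque


class ListBuffer:
    """In-memory stand-in for a single Redis list (not a Redis client)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        # Redis LPUSH inserts at the head of the list
        self._items.appendleft(value)

    def rpush(self, value):
        # Redis RPUSH appends at the tail of the list
        self._items.append(value)

    def lpop(self):
        # Non-blocking version of what BLPOP returns: the head element, or None
        return self._items.popleft() if self._items else None


def drain(buffer):
    """Pop until the buffer is empty, as a consumer catching up on a backlog would."""
    out = []
    while (item := buffer.lpop()) is not None:
        out.append(item)
    return out
```

When the consumer keeps up and the list rarely holds more than one entry, the difference is hardly noticeable; it only matters if message order within a backlog is important downstream.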

Currently taking live proxy logs at ~7-10K EPS, and it is running very well.
Looking to add the email and FireEye logs soon and starting to enhance
the data (with user and host metadata)


Thoughts right now are:
- only resolve location for addresses (not hostnames)
- run a caching nameserver locally on the syslog-ng box and deal with
the "ramp up" period
  (initially the names would clearly not be in cache - just not sure how
long it would take to get to a steady state, how big to make the
cache, etc.)
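
The first option above (locate IP literals only, never resolve hostnames) can be sketched in Python as follows. This is an illustration under stated assumptions, not the syslog-ng geoip module: `FAKE_GEO_DB` is a made-up stand-in for a country database, the networks are documentation ranges, the country codes are placeholders, and the cache size is an arbitrary number.

```python
import ipaddress
from functools import lru_cache

# Hypothetical stand-in for a GeoIP country database: network -> country code.
# The networks are documentation ranges and the codes are placeholders.
FAKE_GEO_DB = {
    ipaddress.ip_network("192.0.2.0/24"): "AA",
    ipaddress.ip_network("198.51.100.0/24"): "BB",
}


def parse_ip(field):
    """Return an ip_address object if the field is an IP literal, else None."""
    try:
        return ipaddress.ip_address(field.strip())
    except ValueError:
        return None


@lru_cache(maxsize=100_000)  # arbitrary size, chosen only for the example
def country_for(field):
    """Country code for IP literals only; hostnames are skipped, never resolved."""
    addr = parse_ip(field)
    if addr is None:
        return None  # a hostname: no DNS traffic is generated for it
    for network, country in FAKE_GEO_DB.items():
        if addr in network:
            return country
    return None
```

For the sizing question, `country_for.cache_info()` reports hits, misses and current size, which is one way to watch how quickly a cache like this reaches a steady state during the ramp-up period.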

I'll keep you posted.

Thanks again!
Jim

On 02/20/2015 03:24 PM, Fabien Wernli wrote:
> Hi Jim,
>
> On Fri, Feb 20, 2015 at 01:52:19PM -0500, jrhendri@roadrunner.com wrote:
>>   Is anyone using it in reasonably high-performance environments? (like 5000+ events per second)
>>
> we're using the module in a 3keps environment with very good performance. we
> have had some issues in the past in threaded mode with some segfaults. The
> geoip library documentation mentions a few sentences about thread safety.
> I'd be curious to hear some feedback about your future experience.
>
> cheers
> ______________________________________________________________________________
> Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
> Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
> FAQ: http://www.balabit.com/wiki/syslog-ng-faq
>
>


--
Bazsi

