Re: [syslog-ng] (no subject)

21 Jul 2014

Hi Fabien,

Aggregations are a way to count terms across documents, and you can combine
them to get powerful statistics. In my case, tags are not analyzed, so each
tag is a single term. The terms aggregation
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html>
on my tags field would then give me the top N most frequent tags.
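
To make that concrete, here is a rough sketch in Python: the request body
for such an aggregation, plus a local simulation of the counting. The field
name "tags", the aggregation name "top_tags" and the sample documents are
assumptions made up for illustration, not taken from a real index.

```python
import json
from collections import Counter

# Request body for a terms aggregation. "top_tags" and the field name
# "tags" are assumed names chosen for this example.
agg_request = {
    "size": 0,  # skip the hits, return only the aggregation
    "aggs": {
        "top_tags": {
            "terms": {"field": "tags", "size": 2}
        }
    }
}

# Hypothetical documents, each with a not-analyzed, multi-valued tags field.
docs = [
    {"tags": ["kernel error", "disk"]},
    {"tags": ["kernel error"]},
    {"tags": ["user error", "disk"]},
]

def top_terms(documents, field, size):
    """Count how many documents contain each term, roughly like a terms aggregation."""
    counts = Counter()
    for doc in documents:
        # set() so a tag repeated inside one document is counted once
        counts.update(set(doc.get(field, [])))
    return counts.most_common(size)

print(json.dumps(agg_request, indent=2))
print(top_terms(docs, "tags", 2))
```

Because the field is not analyzed, "kernel error" is counted as one term
rather than two.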

If I analyze the field, things get more complicated. For example, if the
"kernel error" tag were analyzed into "kernel" and "error", the aggregation
would return "kernel" and "error" as separate terms, which would be confusing.
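
A small Python sketch of the problem, assuming a crude stand-in for a
word-splitting analyzer (lowercase, then split on whitespace; the real
analyzer does more than this) and made-up sample tags:

```python
from collections import Counter

def analyze_words(tag):
    """Crude stand-in for a word-splitting analyzer: lowercase, split on whitespace."""
    return tag.lower().split()

# Hypothetical tag values as they would be sent to the index.
tags = ["kernel error", "user error", "kernel error"]

# Counting the analyzed tokens loses the multi-word tags...
analyzed = Counter(token for tag in tags for token in analyze_words(tag))
# ...while counting the raw values keeps each tag intact.
not_analyzed = Counter(tags)

print(analyzed)
print(not_analyzed)
```

In the analyzed case "error" dominates the counts and "kernel error" no
longer exists as a term at all.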

Thinking about what you suggested, I could have a comma-separated list of
tags, and use the pattern tokenizer
<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html>
to split terms at each comma. This should give me what I need for both
searches and aggregations. The only edge case would be a tag that itself
contains a comma, but I can live with that, or even let users escape it.
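
A minimal sketch of what those settings might look like, written as a
Python dict, with a local approximation of the tokenizer next to it. The
names "comma_tokenizer" and "tag_analyzer" are assumptions, and the
split_tags helper only imitates the tokenizer, it is not what
Elasticsearch runs.

```python
import re

# Index settings sketch: a custom analyzer whose pattern tokenizer splits
# on commas. "comma_tokenizer" and "tag_analyzer" are assumed names.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma_tokenizer": {"type": "pattern", "pattern": ","}
            },
            "analyzer": {
                "tag_analyzer": {
                    "type": "custom",
                    "tokenizer": "comma_tokenizer"
                    # no token filters, so terms keep their case and spaces
                }
            }
        }
    }
}

def split_tags(value):
    """Local approximation of the tokenizer: split on commas, drop empty tokens."""
    return [token for token in re.split(",", value) if token]

print(split_tags("kernel error,user error,disk"))
print(split_tags("kernel error, disk"))
```

One thing the second call shows: a space after the comma stays part of the
term, so tags would have to be written without it, or the pattern would
need to absorb the whitespace as well.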

I'll let the idea bake a bit, thanks again for your suggestions!

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jul 21, 2014 at 3:19 PM, Fabien Wernli <wernli@in2p3.fr> wrote:
...
Hi,
On Mon, Jul 21, 2014 at 02:50:58PM +0300, Radu Gheorghe wrote:
...
- let users do exact matches, especially for multi-word tags like "user
error"
- be able to run a terms aggregation on them and show the available tags
I'm not familiar with aggregations, but you could achieve the first
requirement by using a custom analyzer which splits on the comma only, with
no token filter
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation:
http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq