Hi Fabien,

Aggregations are a way to count terms across documents, and you can combine them to get powerful statistics. In my case, tags are not analyzed, so each tag is a single term. A terms aggregation on my tags field then gives me the top N most frequent tags.
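
To make that concrete, here is a rough sketch of the request body for such a terms aggregation, built as a plain Python dict. The field name "tags", the aggregation name "top_tags" and the helper function are only assumptions for illustration; "size": 0 at the top level just asks Elasticsearch to skip the hits and return the buckets.

```python
import json


def top_tags_request(field="tags", n=10):
    """Build a search body asking for the n most frequent terms in `field`.

    `field` and the aggregation name "top_tags" are illustrative assumptions.
    """
    return {
        "size": 0,  # no hits wanted, only the aggregation buckets
        "aggs": {
            "top_tags": {
                "terms": {"field": field, "size": n}
            }
        },
    }


body = top_tags_request()
print(json.dumps(body, indent=2))
```

This only builds the JSON; sending it to the search endpoint of an index is left out on purpose.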

If I analyze the field, things get more complicated. For example, if the "kernel error" tag were analyzed into "kernel" and "error", I would get "kernel" and "error" as separate terms, which would be confusing.
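
A small local simulation of the difference, with no Elasticsearch involved. The sample documents are made up, and splitting on whitespace is only a crude stand-in for a real analyzer (which would typically also lowercase). Like a terms aggregation, it counts how many documents contain each term.

```python
from collections import Counter

# Hypothetical sample data: each document carries a list of tags.
docs = [
    ["kernel error", "disk"],
    ["kernel error"],
    ["user error"],
]


def count_terms(docs, analyze):
    """Count, per term, how many documents contain it.

    With analyze=True each tag is split on whitespace, which only
    approximates what an analyzed field would index.
    """
    counts = Counter()
    for tags in docs:
        terms = set()
        for tag in tags:
            terms.update(tag.split() if analyze else [tag])
        counts.update(terms)  # each document counts once per term
    return counts


not_analyzed = count_terms(docs, analyze=False)
analyzed = count_terms(docs, analyze=True)

print(not_analyzed.most_common())
print(analyzed.most_common())
```

In the analyzed case "error" becomes the top term and "kernel error" no longer exists as a bucket, which is exactly the confusion I want to avoid.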

Thinking about what you suggested, I could store a comma-separated list of tags and use the pattern tokenizer to split terms at each comma. This should give me what I need for both searches and aggregations. The only edge case is a tag that contains a comma, but I can live with that, or even let users escape it.
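
A minimal sketch of what I have in mind, again as a Python dict. The names "comma_tokenizer" and "tag_analyzer" are made up, and the pattern also swallows whitespace around the comma, which is my own assumption rather than anything we discussed. The split_tags helper only imitates the tokenizer locally with Python's re module, so it is an approximation and does not handle escaped commas.

```python
import re

# Split on a comma plus any surrounding whitespace (assumed pattern).
TAG_SEPARATOR = r"\s*,\s*"

# Index settings defining a custom analyzer: pattern tokenizer, no token filters.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma_tokenizer": {"type": "pattern", "pattern": TAG_SEPARATOR}
            },
            "analyzer": {
                "tag_analyzer": {"type": "custom", "tokenizer": "comma_tokenizer"}
            },
        }
    }
}


def split_tags(value):
    """Local approximation of the tokenizer: split and drop empty tokens."""
    return [tag for tag in re.split(TAG_SEPARATOR, value.strip()) if tag]


print(split_tags("kernel error, user error,disk"))
```

The tags field would still need to be mapped to use this analyzer, and the details of that mapping depend on the Elasticsearch version.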

I'll let the idea bake a bit, thanks again for your suggestions!

Best regards,

Radu

Performance Monitoring * Log Analytics * Search Analytics

Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jul 21, 2014 at 3:19 PM, Fabien Wernli <wernli@in2p3.fr> wrote:

Hi,

On Mon, Jul 21, 2014 at 02:50:58PM +0300, Radu Gheorghe wrote:
> - let users do exact matches, especially for multi-word tags like "user
> error"
> - be able to run a terms aggregation on them and show the available tags

I'm not familiar with aggregations, but you could achieve the first
requirement by using a custom analyzer which splits on the comma only, with
no token filter.

______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq