Hi Fabien,

Aggregations are a way to count terms across documents, and you can combine them to get powerful statistics. In my case, tags are not analyzed, so each tag is a single term. The terms aggregation <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html> on my tags field then gives me the top N most frequent tags.

If I analyze the field, things get more complicated. For example, if the "kernel error" tag were analyzed into "kernel" and "error", I would get "kernel" and "error" as separate terms, which would be confusing.

Thinking about what you suggested, I could have a comma-separated list of tags and use the pattern tokenizer <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html> to split terms at each comma. This should give me what I need for both searches and aggregations. The only edge case would be a tag that contains a comma, but I can live with that, or even let users escape it.

I'll let the idea bake a bit. Thanks again for your suggestions!

Best regards,
Radu

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jul 21, 2014 at 3:19 PM, Fabien Wernli <wernli@in2p3.fr> wrote:
Hi,
On Mon, Jul 21, 2014 at 02:50:58PM +0300, Radu Gheorghe wrote:
- let users do exact matches, especially for multi-word tags like "user error"
- be able to run a terms aggregation on them and show the available tags
I'm not familiar with aggregations, but you could achieve the first requirement by using a custom analyzer that splits only on the comma and has no token filters.
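
For illustration, the comma-splitting analyzer idea might look roughly like the sketch below. It is not a tested configuration: the names "comma_tokenizer", "comma_analyzer" and "top_tags", and the field name "tags", are made up for the example, and the two helper functions only approximate locally what splitting on commas and counting terms would do. They do not talk to Elasticsearch.

```python
import json
import re
from collections import Counter

# Index settings: a custom analyzer whose tokenizer splits only on commas
# and which applies no token filters. The tokenizer and analyzer names
# are hypothetical.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma_tokenizer": {"type": "pattern", "pattern": ","}
            },
            "analyzer": {
                "comma_analyzer": {
                    "type": "custom",
                    "tokenizer": "comma_tokenizer",
                }
            },
        }
    }
}

# Terms aggregation request body. The field name "tags" and the
# aggregation name "top_tags" are assumptions.
TERMS_AGG = {
    "aggs": {"top_tags": {"terms": {"field": "tags", "size": 10}}}
}


def split_tags(value):
    """Local approximation of comma splitting; drops empty pieces."""
    return [tag for tag in re.split(",", value) if tag]


def top_tags(docs, n):
    """Count each tag once per document and return the n most frequent."""
    counts = Counter()
    for doc in docs:
        counts.update(set(split_tags(doc)))
    return counts.most_common(n)


if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(json.dumps(TERMS_AGG, indent=2))
    print(top_tags(["kernel error,user error", "kernel error"], 2))
```

With this kind of splitting, "kernel error" stays a single term instead of becoming "kernel" and "error", which is what both the exact-match and the aggregation requirements need. Note that a pattern of "," keeps any spaces around the commas as part of the tag.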
______________________________________________________________________________
Member info: https://lists.balabit.hu/mailman/listinfo/syslog-ng
Documentation: http://www.balabit.com/support/documentation/?product=syslog-ng
FAQ: http://www.balabit.com/wiki/syslog-ng-faq