Hi Fabien,
Aggregations are a way to count terms across documents, and you can combine them to get powerful statistics. In my case, tags are not analyzed, so each tag is a single term. The terms aggregation on my tags field then gives me the top N most frequent tags.
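To make that concrete, here is a minimal sketch of what such a request body could look like, assuming this is Elasticsearch's terms aggregation and that the field is literally called "tags". The aggregation name "top_tags", the helper name and the default size of 10 are placeholders I made up for illustration.

```python
import json


def top_tags_request(field="tags", size=10):
    """Build a search body asking for the `size` most frequent terms of `field`.

    The aggregation name "top_tags" is hypothetical; any name would do.
    """
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            "top_tags": {
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the search request body.
    print(json.dumps(top_tags_request(), indent=2))
```

The function only builds the body; sending it to a cluster is left to whatever client you already use.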
If I analyze the field, things get more complicated. For example, if the "kernel error" tag were analyzed into "kernel" and "error", the aggregation would count "kernel" and "error" as separate terms, which would be confusing.
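A toy simulation of the difference, in plain Python rather than against a real index. The sample documents are invented, and lowercasing plus splitting on whitespace is only a rough stand-in for what a real analyzer does.

```python
from collections import Counter

# Invented sample data: each document carries a list of tags.
DOCS = [
    ["kernel error", "disk"],
    ["kernel error"],
    ["error log"],
]


def count_terms(docs, analyze):
    """Count how many documents contain each term, as a terms aggregation would.

    With analyze=False every tag is one term; with analyze=True each tag is
    lowercased and split on whitespace (a crude approximation of analysis).
    """
    counts = Counter()
    for tags in docs:
        terms = set()  # a term is counted once per document
        for tag in tags:
            terms.update(tag.lower().split() if analyze else [tag])
        counts.update(terms)
    return counts


if __name__ == "__main__":
    print("not analyzed:", count_terms(DOCS, analyze=False).most_common())
    print("analyzed:    ", count_terms(DOCS, analyze=True).most_common())
```

With this sample data the unanalyzed run counts "kernel error" in 2 documents, while the analyzed run no longer has a "kernel error" term at all and instead reports "error" in 3 documents and "kernel" in 2.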
Thinking about what you suggested, I could store a comma-separated list of tags and use the pattern tokenizer to split the terms on commas. This should give me what I need for both searches and aggregations. The only edge case would be a tag that itself contains a comma, but I can live with that, or even let users escape it.
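A rough sketch of how that could look, assuming Elasticsearch's pattern tokenizer in a custom analyzer. The names "comma_tokenizer" and "tag_list" are made up, and split_tags is only a local approximation with Python's re module, not the real tokenizer (for one, nothing here trims whitespace around the commas).

```python
import json
import re

# Hypothetical index settings: a custom analyzer whose tokenizer splits on commas.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma_tokenizer": {"type": "pattern", "pattern": ","}
            },
            "analyzer": {
                "tag_list": {"type": "custom", "tokenizer": "comma_tokenizer"}
            },
        }
    }
}

# Assumed escape convention: a comma preceded by a backslash is not a separator.
UNESCAPED_COMMA = r"(?<!\\),"


def split_tags(text, pattern=","):
    """Approximate the tokenizer locally: split on `pattern`, drop empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(split_tags("kernel error,disk"))
    print(split_tags(r"a\,b,c", UNESCAPED_COMMA))
```

With the plain pattern, "kernel error,disk" yields the two terms "kernel error" and "disk". With the lookbehind variant the escaped comma stays inside its tag, but so does the backslash, so it would still have to be stripped somewhere if users are allowed to escape commas.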
I'll let the idea bake a bit, thanks again for your suggestions!
Best regards,
Radu