ES deduplication statistics: ES distinct-value queries (aggregation, grouping, pagination, sum statistics, etc.)

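How do you run deduplication-style queries on a given field in Elasticsearch (ES), covering operations such as aggregation, grouping, pagination, and sum-like statistics?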

Getting all distinct values

To get every possible value of a given field in ES, use the terms aggregation, which is one of the bucket aggregations. For example:

GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "status"
      }
    }
  }
}

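The example above returns the distinct values of the status field in the given index. The top-level size is set to 0, so the search returns 0 documents: we do not care about document contents, only about the aggregation result. With size set to 1, one document would also be returned. The terms aggregation has its own size setting, which defaults to 10. Unless you raise it, only the 10 values with the most documents come back. The field should also be a keyword or numeric field, because text fields cannot be aggregated on by default. The response is: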

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 58439,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 3,
          "doc_count": 46619
        },
        {
          "key": 2,
          "doc_count": 11810
        },
        {
          "key": 1,
          "doc_count": 10
        }
      ]
    }
  }
}
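For reference, here is a minimal Python sketch that builds the request body above as a dict and reads the distinct values out of a response shaped like the one shown. The function names build_distinct_query and distinct_values are hypothetical, and sending the request is left to whatever client you use. The sketch also checks sum_other_doc_count. When that value is greater than 0, some documents fell into buckets beyond the terms size, which means the list of distinct values is incomplete.

```python
import json

def build_distinct_query(field, size=10):
    """Build a search body that returns no hits, only the distinct values of `field`."""
    return {
        "size": 0,  # no documents in hits, aggregation results only
        "aggs": {
            "distinct_aggs": {
                # terms size = max number of buckets (10 is also the ES default)
                "terms": {"field": field, "size": size}
            }
        },
    }

def distinct_values(response, agg_name="distinct_aggs"):
    """Return (keys, complete) from a terms aggregation response.

    complete is False when sum_other_doc_count > 0, i.e. some values were cut off by size.
    """
    agg = response["aggregations"][agg_name]
    keys = [bucket["key"] for bucket in agg["buckets"]]
    return keys, agg["sum_other_doc_count"] == 0

# Aggregation part of the sample response shown above
sample = {
    "aggregations": {
        "distinct_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 3, "doc_count": 46619},
                {"key": 2, "doc_count": 11810},
                {"key": 1, "doc_count": 10},
            ],
        }
    }
}

print(json.dumps(build_distinct_query("status")))
keys, complete = distinct_values(sample)
print(keys, complete)  # [3, 2, 1] True
```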

Paginating distinct values

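Paging requires a well-defined sort order. Building on the example above, add the terms aggregation's size parameter (the number of buckets to return) and an order parameter. The terms aggregation has no from parameter, so this returns only the first size buckets in that order. To go further you can use one of three approaches:
- raise size and slice the buckets on the client,
- add a bucket_sort sub-aggregation, which accepts from and size, or
- switch to a composite aggregation and page with its after_key.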

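GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

Ordering by _key sorts the buckets by their value in ascending order. Older versions used _term for this, and _term has been deprecated since Elasticsearch 6.0. The output is: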

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 58463,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 32
        },
        {
          "key": 2,
          "doc_count": 11811
        },
        {
          "key": 3,
          "doc_count": 46620
        },
        ...
      ]
    }
  }
}
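A terms aggregation has no from, so deeper pages of distinct values are often fetched with a composite aggregation instead. Each request returns up to size buckets sorted by key, along with an after_key. You pass that after_key back as after to get the next page. The Python sketch below illustrates this loop under a few assumptions:
- search_fn(body) -> dict is a hypothetical function that sends the body to {index}/_search, for example through your client library.
- To keep the example runnable, it is exercised with a fake in-memory search_fn.
- The aggregation name item_pages is illustrative.

```python
def composite_page_body(field, page_size, after=None):
    """Build a composite aggregation request for one page of distinct values of `field`."""
    composite = {
        "size": page_size,
        "sources": [{field: {"terms": {"field": field, "order": "asc"}}}],
    }
    if after is not None:
        composite["after"] = after  # after_key returned by the previous page
    return {"size": 0, "aggs": {"item_pages": {"composite": composite}}}

def iter_pages(search_fn, field, page_size):
    """Yield pages of (value, doc_count) until a page is empty or no after_key is returned.

    search_fn is a hypothetical callable: body dict in, response dict out.
    """
    after = None
    while True:
        response = search_fn(composite_page_body(field, page_size, after))
        agg = response["aggregations"]["item_pages"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield [(b["key"][field], b["doc_count"]) for b in buckets]
        after = agg.get("after_key")
        if after is None:
            return

def make_fake_search(values):
    """Stand-in for Elasticsearch: emulate composite paging over an in-memory list of values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1

    def search_fn(body):
        comp = body["aggs"]["item_pages"]["composite"]
        field = next(iter(comp["sources"][0]))  # the source name, e.g. "item_id"
        keys = sorted(counts)
        if "after" in comp:
            keys = [k for k in keys if k > comp["after"][field]]
        page = keys[: comp["size"]]
        agg = {"buckets": [{"key": {field: k}, "doc_count": counts[k]} for k in page]}
        if page:
            agg["after_key"] = {field: page[-1]}
        return {"aggregations": {"item_pages": agg}}

    return search_fn

fake = make_fake_search([3, 1, 2, 3, 5, 4, 3, 2])
pages = list(iter_pages(fake, "item_id", page_size=2))
print(pages)  # [[(1, 1), (2, 2)], [(3, 3), (4, 1)], [(5, 1)]]
```

With a composite aggregation, each request only has to compute one page of buckets. The size-and-slice approach, by contrast, makes every deeper page ask Elasticsearch for all the buckets before it.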

Aggregated sum statistics

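Buckets can also be sorted in ascending or descending order by a computed metric, such as the sum of a field. To do this, add metric sub-aggregations and reference them by name in order. For example: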

GET {index}/_search
{
  "size": 0,
  "aggs": {
    "item_terms": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order": [
          { "gmv_stat": "desc" },
          { "gmv_180d": "desc" }
        ]
      },
      "aggs": {
        "gmv_stat": {
          "sum": {
            "field": "gmv"
          }
        },
        "gmv_180d": {
          "sum": {
            "script": "doc['gmv_90d'].value*2"
          }
        }
      }
    }
  }
}

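The response is shown below. In each item_id bucket:
- gmv_stat is the sum of gmv.
- gmv_180d is the sum, computed by the script, of twice each document's gmv_90d.

Buckets are ordered by gmv_stat descending, and gmv_180d descending breaks ties: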

{
  ...
  "aggregations": {
    "item_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 260,
      "buckets": [
        {
          "key": 23388,
          "doc_count": 18,
          "gmv_stat": {
            "value": 176220
          },
          "gmv_180d": {
            "value": 89732
          }
        },
        {
          "key": 96117,
          "doc_count": 16,
          "gmv_stat": {
            "value": 129306
          },
          "gmv_180d": {
            "value": 56988
          }
        },
        ...
      ]
    }
  }
}
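To make the semantics of this request concrete, the Python sketch below computes the same thing in memory on a handful of made-up documents. It groups the documents by item_id and sums gmv into gmv_stat. It also sums twice each document's gmv_90d into gmv_180d, mirroring the script. It then sorts by gmv_stat descending, with gmv_180d descending as the tie-breaker, and keeps the first size buckets. The function name terms_with_sums and the sample data are hypothetical. This is an illustration only: Elasticsearch builds terms buckets per shard and then merges them, so ordering by a sub-aggregation can give approximate results, and this single-process version does not model that.

```python
from collections import defaultdict

def terms_with_sums(docs, size=1000):
    """Emulate the item_terms aggregation above on a list of dicts (single process, exact)."""
    groups = defaultdict(lambda: {"doc_count": 0, "gmv_stat": 0, "gmv_180d": 0})
    for doc in docs:
        g = groups[doc["item_id"]]           # terms bucket keyed by item_id
        g["doc_count"] += 1
        g["gmv_stat"] += doc["gmv"]          # sum sub-aggregation on field gmv
        g["gmv_180d"] += doc["gmv_90d"] * 2  # same as script doc['gmv_90d'].value*2
    buckets = [
        {
            "key": key,
            "doc_count": g["doc_count"],
            "gmv_stat": {"value": g["gmv_stat"]},
            "gmv_180d": {"value": g["gmv_180d"]},
        }
        for key, g in groups.items()
    ]
    # order: gmv_stat desc, then gmv_180d desc for ties
    buckets.sort(key=lambda b: (-b["gmv_stat"]["value"], -b["gmv_180d"]["value"]))
    # documents that fell into buckets cut off by size
    sum_other = sum(b["doc_count"] for b in buckets[size:])
    return {"sum_other_doc_count": sum_other, "buckets": buckets[:size]}

# Made-up sample documents, for illustration only
docs = [
    {"item_id": 1, "gmv": 100, "gmv_90d": 10},
    {"item_id": 1, "gmv": 50, "gmv_90d": 5},
    {"item_id": 2, "gmv": 150, "gmv_90d": 40},
    {"item_id": 3, "gmv": 20, "gmv_90d": 1},
]
result = terms_with_sums(docs, size=2)
for b in result["buckets"]:
    print(b["key"], b["gmv_stat"]["value"], b["gmv_180d"]["value"])
print("sum_other_doc_count:", result["sum_other_doc_count"])
# 2 150 80
# 1 150 30
# sum_other_doc_count: 1
```

Items 1 and 2 tie on gmv_stat (150), so the secondary order on gmv_180d puts item 2 first. Item 3 falls outside size=2, and its document is counted in sum_other_doc_count, just as the 260 in the real response counts documents outside the first 1000 buckets.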


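Copyright notice: this is an original article by weixin_39631094, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.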