ES Aggregation Pagination
Aggregation pagination
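ES can return query results and aggregation results in the same response. In the aggregation queries covered in earlier posts, the query results and the aggregation results were wrapped in separate clauses. Sometimes, however, we want the aggregated results presented as the top N documents selected from each group. The most common scenario is e-commerce search. When a user searches for the iPhone 6S, the results should show just one phone of the 6S model, no matter how many colors that model comes in. Once aggregation results and query results are combined, the results also have to be paginated. The aggregation queries described in earlier posts cannot solve these problems.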
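ES provides the Top hits aggregation and the Collapse aggregation, and both can meet these requirements. However, the two take different approaches to pagination.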
1.1 Top hits aggregation
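The Top hits aggregation works inside each group produced by an aggregation. It selects the top N documents in that group according to some rule and returns them for display.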
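For example, suppose a search for "金都" should group the results by city and show 3 documents per group, in descending order of match score. The DSL is as follows: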
# Top hits aggregation
GET /hotel_poly/_search
{
  "query": {
    "match": {
      "title": "金都"
    }
  },
  "aggs": {
    "group_city": {
      "terms": {
        "field": "city"
      },
      "aggs": {
        "my_top": {
          "top_hits": {
            "size": 3
          }
        }
      }
    }
  }
}
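In the response, 3 documents in the index match the match query. The aggregation splits them into two groups by city, "北京" (Beijing) and "天津" (Tianjin). "北京" contains two matching documents, shown in descending order of score. "天津" contains only one.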
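To make the grouping concrete, the following sketch models in plain Java what the terms + top_hits combination computes. It is an illustration only and does not show how Elasticsearch implements the aggregation. The class `TopHitsModel`, its `Hit` type and the scores used below are hypothetical. Groups are ordered by document count, descending, which is the terms default. Ties are broken by key here only to keep the sketch deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a terms aggregation with a top_hits sub-aggregation.
// For illustration only; Elasticsearch computes this on the server side.
final class TopHitsModel {
    // A matched document: its id, the grouping field value (city) and its score.
    static final class Hit {
        final String id;
        final String city;
        final double score;

        Hit(String id, String city, double score) {
            this.id = id;
            this.city = city;
            this.score = score;
        }
    }

    // Groups hits by city, orders groups by hit count (descending, ties by key),
    // and keeps at most `size` hits per group in descending score order.
    static LinkedHashMap<String, List<Hit>> topHitsPerGroup(List<Hit> hits, int size) {
        Map<String, List<Hit>> groups = new HashMap<>();
        for (Hit hit : hits) {
            groups.computeIfAbsent(hit.city, k -> new ArrayList<>()).add(hit);
        }
        List<String> keys = new ArrayList<>(groups.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(groups.get(b).size(), groups.get(a).size());
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        LinkedHashMap<String, List<Hit>> result = new LinkedHashMap<>();
        for (String key : keys) {
            List<Hit> bucket = new ArrayList<>(groups.get(key));
            // Highest score first, like top_hits with its default sort.
            bucket.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            result.put(key, new ArrayList<>(bucket.subList(0, Math.min(size, bucket.size()))));
        }
        return result;
    }
}
```

Take hypothetical scores of 0.5 for 002 and 0.8 for 003 (both "beijing"), and 0.7 for 004 ("tianjin"). With these, `topHitsPerGroup(hits, 3)` returns the "beijing" group first, with 003 ahead of 002.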
Using the Top hits aggregation in Java looks like this:
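// Imports for the Elasticsearch 7.x Java High Level REST Client
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.TopHits;
import org.elasticsearch.search.aggregations.metrics.TopHitsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Assumes the enclosing class provides a RestHighLevelClient field `client`
// and an SLF4J logger `log` (for example via Lombok's @Slf4j).
public void getAggTopHitsSearch() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest("hotel_poly");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    String termsAggName = "my_terms"; // name of the terms aggregation
    TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms(termsAggName).field("city");
    // Order the buckets by key, ascending
    BucketOrder bucketOrder = BucketOrder.key(true);
    termsAggregationBuilder.order(bucketOrder);
    String topHitsAggName = "my_top"; // name of the top_hits sub-aggregation
    TopHitsAggregationBuilder topHitsAgg = AggregationBuilders.topHits(topHitsAggName);
    topHitsAgg.size(3); // keep at most 3 documents per bucket
    // Nest top_hits under terms (parent/child relationship)
    termsAggregationBuilder.subAggregation(topHitsAgg);
    // Register the aggregation and the query
    searchSourceBuilder.aggregation(termsAggregationBuilder);
    searchSourceBuilder.query(QueryBuilders.matchQuery("title", "金都"));
    searchRequest.source(searchSourceBuilder); // attach the source to the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the aggregation results
    Aggregations aggregations = searchResponse.getAggregations();
    // Fetch the terms aggregation by name
    Terms terms = aggregations.get(termsAggName);
    for (Terms.Bucket bucket : terms.getBuckets()) {
        String bucketKey = bucket.getKey().toString();
        log.info("termsKey={}", bucketKey);
        // Fetch this bucket's top_hits sub-aggregation
        TopHits topHits = bucket.getAggregations().get(topHitsAggName);
        SearchHit[] searchHits = topHits.getHits().getHits();
        for (SearchHit searchHit : searchHits) {
            log.info(searchHit.getSourceAsString());
        }
    }
}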
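The Top hits aggregation meets the requirement of "presenting the aggregated results as the top N documents of each group." Unfortunately, it cannot paginate automatically. If you use a Top hits aggregation and still need pagination, the aggregated result set must be small. The client has to do the paging itself, which puts pressure on the memory that holds the pages. One approach is to fetch the whole aggregation result at once, store it in memory or in Redis, and implement the page-turning logic yourself.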
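The following is a minimal sketch of that client-side paging. It assumes the groups have already been fetched in one request and turned into a `List`, whether they were kept in memory or deserialized from Redis. `GroupPager` and its methods are hypothetical helpers, not part of the Elasticsearch client.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: pages through aggregation groups that were fetched once
// and cached on the client (in memory, or deserialized from a cache such as Redis).
final class GroupPager {
    private GroupPager() {}

    // Returns page `page` (1-based) of `pageSize` groups, or an empty list past the end.
    static <T> List<T> page(List<T> allGroups, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        long from = (long) (page - 1) * pageSize; // offset of the first group on this page
        if (from >= allGroups.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, allGroups.size());
        return allGroups.subList((int) from, to);
    }

    // Number of pages needed to show `groupCount` groups, `pageSize` per page.
    static int totalPages(int groupCount, int pageSize) {
        return (groupCount + pageSize - 1) / pageSize;
    }
}
```

Take five cached groups and a page size of 2. Page 2 holds the third and fourth groups, page 3 holds only the fifth, and `totalPages(5, 2)` is 3. All groups must fit in client memory, which is the limitation described above.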
1.2 Collapse aggregation
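As noted above, the Top hits approach becomes inefficient when many documents in the index match, and the client has to handle sorting and paging itself. To address this, ES provides the Collapse aggregation. You specify a grouping field in the collapse clause, and the hits matching the query are grouped by that field. Each group is represented by its highest-scoring document. When from and size are specified outside the query clause, they take effect after collapsing, so pagination is applied to the groups.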
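The order of operations described above can be modeled in plain Java: sort by score, keep one hit per group, then apply from/size. This is a hypothetical sketch for illustration. `CollapseModel` is not an Elasticsearch API, and the sketch assumes the default sort by score, ignoring custom sort orders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of field collapsing followed by from/size paging.
// For illustration only; it is not an Elasticsearch API.
final class CollapseModel {
    // A matched document: its id, the collapse field value and its score.
    static final class Hit {
        final String id;
        final String key;
        final double score;

        Hit(String id, String key, double score) {
            this.id = id;
            this.key = key;
            this.score = score;
        }
    }

    static List<Hit> collapse(List<Hit> hits, int from, int size) {
        // 1. Sort all matching hits by score, highest first (List.sort is stable for ties).
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        // 2. Keep only the first, i.e. highest-scoring, hit for each collapse key.
        Set<String> seenKeys = new HashSet<>();
        List<Hit> groups = new ArrayList<>();
        for (Hit hit : sorted) {
            if (seenKeys.add(hit.key)) {
                groups.add(hit);
            }
        }
        // 3. Apply from/size to the collapsed groups, not to the raw hits.
        int start = Math.min(Math.max(from, 0), groups.size());
        int end = (int) Math.min((long) start + Math.max(size, 0), groups.size());
        return new ArrayList<>(groups.subList(start, end));
    }
}
```

Take hypothetical scores of 0.8 for 003 and 0.5 for 002 (both "beijing"), and 0.7 for 004 ("tianjin"). With these, `collapse(hits, 0, 5)` returns 003 and 004, and `collapse(hits, 1, 1)` returns only 004, the second group.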
The following DSL shows how to use the Collapse aggregation:
# Collapse aggregation
GET /hotel_poly/_search
{
  "from": 0,          // offset of the first group on this page
  "size": 5,          // number of groups per page
  "query": {          // the query logic
    "match": {
      "title": "金都"
    }
  },
  "collapse": {       // collapse the hits by city
    "field": "city"
  }
}
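Run the DSL above and look at the response. Unlike the Top hits aggregation, the Collapse aggregation returns its results in hits. Three documents in the index match the match query, and the results are already grouped by city into "北京" and "天津". "北京" has two matching documents, and the highest-scoring one, 003, represents the group. "天津" has only one matching document. The results are sorted by score, and they also support pagination.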
Using the Collapse aggregation in Java looks like this:
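// Imports for the Elasticsearch 7.x Java High Level REST Client
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.collapse.CollapseBuilder;

// Assumes the enclosing class provides a RestHighLevelClient field `client`
// and an SLF4J logger `log` (for example via Lombok's @Slf4j).
public void getCollapseAggSearch() throws IOException {
    // Collapse (group) the hits by city
    CollapseBuilder collapseBuilder = new CollapseBuilder("city");
    SearchRequest searchRequest = new SearchRequest("hotel_poly"); // new search request
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // match query on title
    searchSourceBuilder.query(QueryBuilders.matchQuery("title", "金都"));
    searchSourceBuilder.collapse(collapseBuilder); // enable collapsing
    // Paging applies to the collapsed groups, matching from/size in the DSL
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(5);
    searchRequest.source(searchSourceBuilder); // attach the source to the request
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); // run the search
    SearchHits searchHits = searchResponse.getHits(); // result set: one hit per city
    for (SearchHit searchHit : searchHits) {
        String index = searchHit.getIndex();           // index name
        String id = searchHit.getId();                 // document _id
        float score = searchHit.getScore();            // relevance score
        String source = searchHit.getSourceAsString(); // document source
        log.info("index={},id={},score={},source={}", index, id, score, source);
    }
}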
Data source
Index mapping
PUT /hotel_poly
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "city": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "create_time": {
        "type": "date",
        "format": "yyyyMMddHHmmss"
      },
      "full_room": {
        "type": "boolean"
      },
      "location": {
        "type": "geo_point"
      },
      "tags": {
        "type": "keyword"
      },
      "comment_info": {
        "properties": {
          "favourable_comment": {
            "type": "integer"
          },
          "negative_comment": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Hotel data
POST /_bulk
{"index":{"_index":"hotel_poly","_id":"001"}}
{"title":"文雅假日酒店","city":"北京","price":556.00,"create_time":"20200418120000","full_room":true,"location":{"lat":39.938838,"lon":106.449112},"tags":["wifi","小型电影院"],"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"002"}}
{"title":"金都嘉怡假日酒店","city":"北京","create_time":"20210315200000","full_room":false,"location":{"lat":39.915153,"lon":116.4030},"tags":["wifi","免费早餐"],"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"003"}}
{"title":"金都假日酒店","city":"北京","price":200.00,"create_time":"20210509160000","full_room":true,"location":{"lat":40.002096,"lon":116.386673},"comment_info":{"favourable_comment":20,"negative_comment":10}}
{"index":{"_index":"hotel_poly","_id":"004"}}
{"title":"金都假日酒店","city":"天津","price":500.00,"create_time":"20210218080000","full_room":false,"location":{"lat":39.155004,"lon":117.203976},"tags":["wifi","免费车位"]}
{"index":{"_index":"hotel_poly","_id":"005"}}
{"title":"文雅精选酒店","city":"天津","price":800.00,"create_time":"20210101080000","full_room":true,"location":{"lat":39.178447,"lon":117.219999},"tags":["wifi","充电车位"],"comment_info":{"favourable_comment":20,"negative_comment":10}}