10.1 term & terms查询
10.1.1 term
term是代表完全匹配,也就是精确查询,搜索前不会再对搜索词进行分词,所以我们的搜索词必须是文档分词集合中的一个。
比如说我们要查找省份(province)中为“湖北省”的所有文档,JSON如下:
{
"from": 0,
"size": 5,
"query": {
"term": {
"province": {
"value": "湖北省",
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 查询某个字段里含有某个关键词的文档
* @param indexName 索引名
* @param typeName TYPE
* @param fieldName 字段名称
* @param fieldValue 字段值
* @return 返回结果列表
* @throws IOException
*/
public List<Map<String,Object>> termQuery(String indexName, String typeName, String fieldName, String fieldValue) throws IOException {
List<Map<String,Object>> response =new ArrayList<>();
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery(fieldName, fieldValue));
sourceBuilder.from(0);
sourceBuilder.size(10);
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
response.add(hit.getSourceAsMap());
}
return response;
}
10.1.2 terms
在查询的字段只有一个值的时候,应该使用term而不是terms,在查询字段包含多个的时候才使用terms(类似于sql中的in、or),使用terms语法.
比如说我们要查找省份(province)为“湖北省”或“北京”的所有文档,JSON如下:
{
"from": 0,
"size": 5,
"query": {
"terms": {
"province": ["湖北省", "北京"],
"boost": 1.0
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 查询某个字段里含有多个关键词的文档
* @param indexName 索引名
* @param typeName TYPE
* @param fieldName 字段名称
* @param fieldValues 字段值
* @return 返回结果列表
* @throws IOException
*/
public List<Map<String,Object>> termsQuery(String indexName, String typeName, String fieldName, String... fieldValues) throws IOException {
List<Map<String,Object>> response =new ArrayList<>();
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termsQuery(fieldName,fieldValues));
sourceBuilder.from(0);
sourceBuilder.size(10);
searchRequest.source(sourceBuilder);
log.info("source:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
response.add(hit.getSourceAsMap());
}
return response;
}
10.1.3 测试演示用例
//查询省份为湖北省的数据
@Test
public void testTerm() throws IOException {
List<Map<String, Object>> list = baseQuery.termQuery(indexName, type, "province", "湖北省");
System.out.println("term查询数量:" + list.size());
System.out.println(list);
}
查询省份为湖北省或北京的数据
@Test
public void testTerms() throws IOException {
List<Map<String, Object>> list1 = baseQuery.termsQuery(indexName, type, "province", "湖北省","北京");
System.out.println("terms查询数量:" + list1.size());
System.out.println(list1);
}
10.1.4 学生实习题
参考上面的代码,实现如下功能:
1、查出手机号为“13800000000”的文档
2、查出公司名称为“途虎养车”或“盒马鲜生”的文档
10.2 match查询
像 match 或 query_string 这样的查询是高层查询,它们了解字段映射的信息:
1.如果查询 日期(date) 或 整数(integer) 字段,它们会将查询字符串分别作为日期或整数对待。
2.如果查询一个( not_analyzed )未分析的精确值字符串字段, 它们会将整个查询字符串作为单个词项对待。
3.但如果要查询一个( analyzed )已分析的全文字段, 它们会先将查询字符串传递到一个合适的分析器,然后生成一个供查询的词项列表。
一旦组成了词项列表,这个查询会对每个词项逐一执行底层的查询,再将结果合并,然后为每个文档生成一个最终的相关度评分。
match查询其实底层是多个term查询,最后将term的结果合并。
10.2.1 match_all查询
查询所有的文档。
JSON如下:
{
"query": {
"match_all": {
"boost": 1.0
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 查询所有文档
* @param indexName 索引名称
* @param typeName TYPE
* @throws IOException
*/
@Override
public void queryAll(String indexName, String typeName) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
searchSourceBuilder.from(0);
searchSourceBuilder.size(20);
searchRequest.source(searchSourceBuilder);
log.info("source:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println(hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println(hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest
//查询全部
@Test
public void testMatchsAll() throws IOException {
baseQuery.queryAll(indexName, type);
}
10.2.2 match查询
{
"query": {
"match": {
"smsContent": {
"query": "收货安装",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
}
对“收货安装”进行分词:
收货,货,安装
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* match 搜索
* @param indexName 索引名称
* @param typeName TYPE名称
* @param field 字段
* @param keyWord 搜索关键词
* @throws IOException
*/
@Override
public void queryMatch(String indexName, String typeName, String field,String keyWord) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery(field,keyWord));
searchRequest.source(searchSourceBuilder);
log.info("source:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest
@Test
public void testQueryMatch() throws IOException {
baseQuery.queryMatch(indexName,type,"smsContent","收货安装");
}
查询结果
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-oTf61FRJ-1584624053058)(mdpic/es_match.png)]
10.2.3 布尔match查询
查询短信内容中 即有“中国”同时也有“健康”的文档:
{
"query": {
"match": {
"smsContent": {
"query": "中国 健康",
"operator": "AND",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
}
查询短信内容中有“中国”或“健康”的文档:
{
"query": {
"match": {
"smsContent": {
"query": "中国 健康",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 布尔match查询
* @param indexName 索引名称
* @param typeName TYPE名称
* @param field 字段名称
* @param keyWord 关键词
* @param op 该参数取值为or 或 and
* @throws IOException
*/
@Override
public void queryMatchWithOperate(String indexName, String typeName, String field,String keyWord,Operator op) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery(field,keyWord).operator(op));
searchRequest.source(searchSourceBuilder);
log.info("source:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest
@Test
public void testMatchWithOperate() throws IOException {
baseQuery.queryMatchWithOperate(indexName, type, "smsContent", "中国 健康", Operator.AND);
baseQuery.queryMatchWithOperate(indexName, type, "smsContent", "中国 健康", Operator.OR);
}
查询结果

查询结果2:

10.2.4 multi_match查询
multi_match查询与match查询类似,不同的是他不在作用于一个字段上,该查询通过字段fields参数作用在多个字段上。
例如:查询省份(province)与短信内容(smsContent)中含有“北京”关键字的文档:
{
"query": {
"multi_match": {
"query": "北京",
"fields": ["province^1.0", "smsContent^1.0"],
"type": "best_fields",
"operator": "OR",
"slop": 0,
"prefix_length": 0,
"max_expansions": 50,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1.0
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
*该查询通过字段fields参数作用在多个字段上。
* @param indexName 索引名称
* @param typeName TYPE名称
* @param keyWord 关键字
* @param fieldNames 字段
* @throws IOException
*/
@Override
public void queryMulitMatch(String indexName, String typeName,String keyWord,String ...fieldNames) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.multiMatchQuery(keyWord,fieldNames));
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest
@Test
public void testQueryMulitMatch() throws IOException {
baseQuery.queryMulitMatch(indexName,type,"北京","smsContent","province");
}
查询结果:
10.2.5 match_phrase 查询
对查询词语分析后构建一个短语查询,而不是一个布尔表达式。
一、什么是近似匹配
match_phrase的使用场景
现假设有两个句子
1、java is my favourite programming language, and I also think spark is a very good big data system.
2、java spark are very related, because scala is spark’s programming language and scala is also based on jvm like java.
进行match query,query语法如下:
{
"query":{
"match": {
"content": "java spark"
}
}
}
match query进行搜索,只能搜索到包含java或spark的document,包含java和spark的doc都会被返回回来。现在假如说我们要实现以下三个需求:
1、java spark,就靠在一起,中间不能插入任何其他字符,就要搜索出来这种doc
2、java spark,但是要求,java和spark两个单词靠的越近,doc的分数越高,排名越靠前
3、我们搜索时,文档中必须包含java spark这两个文档,且他们之间的距离不能超过5,
要实现上述三个需求,用match做全文检索,是搞不定的,必须得用proximity match(近似匹配),proximity match分两种,短语匹配(phrase match)和近似匹配(proximity match)。这一讲,要学习的是phrase match,就是仅仅搜索出java和spark靠在一起的那些doc,比如有个doc,是java use’d spark,这就不是结果。
二、match_phrase的用法
phrase match,就是要去将多个term作为一个短语,一起去搜索,只有包含这个短语的doc才会作为结果返回。match是只在包含其中任何一个分词就返回。
1、match语法:
GET /forum/article/_search
{
"query": {
"match": {
"content": "java spark"
}
}
}
单单包含java的doc也返回了,不是我们想要的结果
2、改一个数据,将一个doc的content设置为恰巧包含java spark这个短语,以方便搜索
POST /forum/article/5/_update
{
"doc": {
"content": "spark is best big data solution based on scala ,an programming language similar to java spark"
}
}
3、match_phrase语法
GET /forum/article/_search
{
"query": {
"match_phrase": {
"content": "java spark"
}
}
}
结果只返回了最后我们修改的那个doc,只包含java或spark的doc不会返回
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 对查询词语分析后构建一个短语查询
* @param indexName 索引名称
* @param typeName TYPE名称
* @param fieldName 字段名称
* @param keyWord 关键字
* @throws IOException
*/
@Override
public void queryMatchPhrase(String indexName, String typeName,String fieldName,String keyWord) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchPhraseQuery(fieldName,keyWord));
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.BaseQueryTest
@Test
public void testMathPhrase() throws IOException, InterruptedException {
SmsSendLog smsSendLog5 = new SmsSendLog();
smsSendLog5.setMobile("13600000088");
smsSendLog5.setCorpName("中国移动");
smsSendLog5.setCreateDate(new Date());
smsSendLog5.setSendDate(new Date());
smsSendLog5.setIpAddr("10.126.2.8");
smsSendLog5.setLongCode("10690000998");
smsSendLog5.setReplyTotal(60);
smsSendLog5.setProvince("湖北省");
smsSendLog5.setOperatorId(1);
smsSendLog5.setSmsContent("java is my favourite programming language, and I also think spark is a very good big data system..");
docService.add(indexName,type, JSON.toJSONString(smsSendLog5),"200");
smsSendLog5.setSmsContent("java spark very related, because scala is spark's programming language and scala is also based on jvm like java.");
docService.add(indexName,type, JSON.toJSONString(smsSendLog5),"300");
//不是实时刷新,不休眠一下可能查不出来
Thread.sleep(3000l);
baseQuery.queryMatchPhrase(indexName,type,"smsContent","java spark");
}
1.1 ids查询
ids查询是一类简单的查询,它过滤返回的文档只包含其中指定标识符的文档,该查询默认指定作用在“_id”上面。
例如查询_id为1,2,3的文档:
{
"query": {
"ids": {
"type": [],
"values": ["1", "2", "3"],
"boost": 1.0
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 查出指定_id的文档
* @param indexName 索引名称
* @param typeName TYPE名称
* @param ids _id值
* @throws IOException
*/
@Override
public void idsQuery(String indexName, String typeName,String ... ids) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.idsQuery().addIds(ids));
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testIdsQuery() throws IOException, InterruptedException {
baseQuery.idsQuery(indexName,type,"1","2","3");
}
1.2 prefix查询
前缀查询,可以使我们找到某个字段以给定前缀开头的文档。
最典型的使用场景,一般是在文本框录入的时候的联想功能,如下所示:
例如想查到公司名以“中国”开头的文档:
{
"query": {
"prefix": {
"corpName": {
"value": "中国",
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 查找某字段以某个前缀开头的文档
* @param indexName 索引名称
* @param typeName TYPE名称
* @param field 字段
* @param prefix 前缀
* @throws IOException
*/
@Override
public void prefixQuery(String indexName, String typeName, String field, String prefix) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.prefixQuery(field,prefix));
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testPrefixQuery() throws IOException, InterruptedException {
baseQuery.prefixQuery(indexName,type,"corpName","中国");
}
1.3 fuzzy查询
fuzzy才是实现真正的模糊查询,我们输入的字符可以是个大概,他可以根据我们输入的文字大概进行匹配查询。
参数说明:
prefix_length不能被 “模糊化” 的初始字符数。 大部分的拼写错误发生在词的结尾,而不是词的开始。 例如通过将
prefix_length设置为3,你可能够显著降低匹配的词项数量。
例如用户在查询过程中把“中国移动”输为了“中国联动”:
{
"query": {
"fuzzy": {
"corpName": {
"value": "中国联动",
"fuzziness": "AUTO",
"prefix_length": 0,
"max_expansions": 50,
"transpositions": false,
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 查找某字段以某个前缀开头的文档
* @param indexName 索引名称
* @param typeName TYPE名称
* @param field 字段
* @param value 查询关键字
* @throws IOException
*/
@Override
public void fuzzyQuery(String indexName, String typeName,String field,String value) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.fuzzyQuery(field,value).prefixLength(2));
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
把prefixLength(2)参数改为prefixLength(3)就查不到结果了,不能被 “模糊化” 的初始字符变成了中国联。
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testfuzzyQuery() throws IOException, InterruptedException {
baseQuery.fuzzyQuery(indexName,type,"corpName","中国联动");
}
学生实习题
用户输入错误地把“中国平安”输为了“中国保安”,通过fuzzy查询也可以查出公司名为中国平安的文档。
1.4 wildcard查询
wildcard查询允许我们在要查询的内容中使用通配符*和?,和SQL语句。
注意:wildcard查询不注意查询性能,应尽可能避免使用。
例如用户在查询公司名以“中国”开头的文档:
{
"query": {
"wildcard": {
"corpName": {
"wildcard": "中国*",
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 以通配符来查询
* @param indexName 索引名称
* @param typeName TYPE名称
* @param fieldName 字段名称
* @param wildcard 通配符
* @throws IOException
*/
@Override
public void wildCardQuery(String indexName, String typeName, String fieldName, String wildcard) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.wildcardQuery(fieldName, wildcard));
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testWildCardQuery() throws IOException, InterruptedException {
baseQuery.wildCardQuery(indexName,type,"corpName","中国*");
baseQuery.wildCardQuery(indexName,type,"corpName","中国?安保险有限公司");
}
1.5 range查询
本章到目前为止,对于数字,只介绍如何处理精确值查询。 实际上,对数字范围进行过滤有时会更有用。例如,我们可能想要查找所有价格大于 $20 且小于 $40 美元的产品。
在 SQL 中,范围查询可以表示为:
SELECT document
FROM products
WHERE price BETWEEN 20 AND 40
Elasticsearch 有 range 查询, 不出所料地,可以用它来查找处于某个范围内的文档:
"range" : {
"price" : {
"gte" : 20,
"lte" : 40
}
}
range 查询可同时提供包含(inclusive)和不包含(exclusive)这两种范围表达式,可供组合的选项如下:
gt:>大于(greater than)lt:<小于(less than)gte:>=大于或等于(greater than or equal to)lte:<=小于或等于(less than or equal to)
例如要查询出短信状态报告响应时间在1-20秒之内的文档:
string: {
"query": {
"range": {
"replyTotal": {
"from": 1,
"to": 20,
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
* 范围查询
* @param indexName 索引名称
* @param typeName TYPE名称
* @param fieldName 字段名称
* @param from
* @param to
* @throws IOException
*/
@Override
public void rangeQuery(String indexName, String typeName, String fieldName, int from,int to) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.rangeQuery(fieldName).from(from).to(to));
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testRangeCardQuery() throws IOException {
baseQuery.rangeQuery(indexName,type,"replyTotal",1,20);
}
学生实习题
查询价格(fee)在5-8分之间的短信文档。
1.6 regexp查询
正则表达式查询,wildcard和regexp查询的工作方式和prefix查询完全一样。它们也需要遍历倒排索引中的词条列表来找到所有的匹配词条,然后逐个词条地收集对应的文档ID。它们和prefix查询的唯一区别在于它们能够支持更加复杂的模式。
这也意味着使用它们存在相同的风险。对一个含有很多不同词条的字段运行这类查询是非常消耗资源的。避免使用一个以通配符开头的模式(比如,*foo)。
尽管对于前缀匹配,可以在索引期间准备你的数据让它更加高效,通配符和正则表达式匹配只能在查询期间被完成。虽然使用场景有限,但是这些查询也有它们的用武之地。
注意:prefix,wildcard以及regexp查询基于词条进行操作。如果你在一个analyzed字段上使用了它们,它们会检查字段中的每个词条,而不是整个字段。
例如查找长号码(longCode)以1069开头后面是数字的文档:
{
"query": {
"regexp": {
"longCode": {
"value": "1069[0-9].+",
"flags_value": 65535,
"max_determinized_states": 10000,
"boost": 1.0
}
}
}
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
/**
*正则表达示查询
* @param indexName 索引名称
* @param typeName TYPE名称
* @param fieldName 字段名称
* @param regexp 正则表达示
* @throws IOException
*/
@Override
public void regexpQuery(String indexName, String typeName, String fieldName, String regexp) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.regexpQuery(fieldName,regexp));
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
System.out.println("count:"+hits.totalHits);
SearchHit[] h = hits.getHits();
for (SearchHit hit : h) {
System.out.println("结果"+hit.getSourceAsMap());
}
}
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testRegexpQuery() throws IOException {
String regex = "1069[0-9].+";
baseQuery.regexpQuery(indexName,type,"longCode",regex);
}
1.7 scroll查询
ES对于from+size的个数是有限制的,二者之和不能超过1w。当所请求的数据总量大于1w时,可用scroll来代替from+size。
1.7.1 原理
ES的搜索是分2个阶段进行的,即Query阶段和Fetch阶段。 Query阶段比较轻量级,通过查询倒排索引,获取满足查询结果的文档ID列表。 而Fetch阶段比较重,需要将每个shard的结果取回,在协调结点进行全局排序。 通过From+size这种方式分批获取数据的时候,随着from加大,需要全局排序并丢弃的结果数量随之上升,性能越来越差。
而Scroll查询,先做轻量级的Query阶段以后,免去了繁重的全局排序过程。 它只是将查询结果集,也就是doc id列表保留在一个上下文里, 之后每次分批取回的时候,只需根据设置的size,在每个shard内部按照一定顺序(默认doc_id续), 取回这个size数量的文档即可。
1.7.2 使用场景
由此也可以看出scroll不适合支持那种实时的和用户交互的前端分页工作,其主要用途用于从ES集群分批拉取大量结果集的情况,一般都是offline的应用场景。 比如需要将非常大的结果集拉取出来,存放到其他系统处理,或者需要做大索引的reindex等等。
不要把 scroll 用于实时请求,它主要用于大数据量的场景。例如:将一个索引的内容索引到另一个不同配置的新索引中。
1.7.3 JAVA代码示例
例如滚动查询所有文档:
POST 127.0.0.1:9200/my_index/_search?scroll=1m
{
"query": {
"match_all" : {}
},
"sort": [
"_doc"
]
}
}
清除scroll
虽然我们在设置开启scroll时,设置了一个scroll的存活时间,但是如果能够在使用完顺手关闭,可以提早释放资源,降低ES的负担.
DELETE 127.0.0.1:9200/_search/scroll
{
"scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAdsMqFmVkZTBJalJWUmp5UmI3V0FYc2lQbVEAAAAAAHbDKRZlZGUwSWpSVlJqeVJiN1dBWHNpUG1RAAAAAABpX2sWclBEekhiRVpSRktHWXFudnVaQ3dIQQAAAAAAaV9qFnJQRHpIYkVaUkZLR1lxbnZ1WkN3SEEAAAAAAGlfaRZyUER6SGJFWlJGS0dZcW52dVpDd0hB"
}
JAVA代码示例:com.javablog.elasticsearch.query.impl.BaseQueryImpl
@Override
public void scrollQuery(String indexName, String typeName) throws IOException {
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.types(typeName);
//初始化scroll
//值不需要足够长来处理所有数据—它只需要足够长来处理前一批结果。每个滚动请求(带有滚动参数)设置一个新的过期时间。
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L)); //设定滚动时间间隔
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchAllQuery());
searchSourceBuilder.size(5); //设定每次返回多少条数据
searchRequest.source(searchSourceBuilder);
log.info("string:" + searchRequest.source());
SearchResponse searchResponse = null;
try {
searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();
System.out.println("-----首页-----");
for (SearchHit searchHit : searchHits) {
System.out.println(searchHit.getSourceAsString());
}
//遍历搜索命中的数据,直到没有数据
while (searchHits != null && searchHits.length > 0) {
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
log.info("string:" + scrollRequest.toString());
try {
searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
scrollId = searchResponse.getScrollId();
searchHits = searchResponse.getHits().getHits();
if (searchHits != null && searchHits.length > 0) {
System.out.println("-----下一页-----");
for (SearchHit searchHit : searchHits) {
System.out.println(searchHit.getSourceAsString());
}
}
}
//清除滚屏
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);//也可以选择setScrollIds()将多个scrollId一起使用
ClearScrollResponse clearScrollResponse = null;
try {
clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest,RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
boolean succeeded = clearScrollResponse.isSucceeded();
System.out.println("succeeded:" + succeeded);
}
演示用例:com.javablog.elasticsearch.test.document.testIdsQuery
@Test
public void testScrollQuery() throws IOException {
baseQuery.scrollQuery(indexName,type);
}
演示效果:
完整代码:https://github.com/chutianmen/elasticsearch-examples