Use cases

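What is Elasticsearch

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java, was released as open source under the Apache License, and is a popular enterprise search engine. It is designed for cloud environments and offers near-real-time search, stability, reliability, speed, and simple installation.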

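Elasticsearch use cases

Elasticsearch (ES below) is mainly used to store semi-structured data and to provide search and analytics over it.

ES stores documents as JSON, and its core data structure is the inverted index.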
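
To make the inverted index concrete, here is a minimal in-memory sketch. It is not how Lucene implements it; the class name InvertedIndexSketch and the whitespace tokenizer are assumptions chosen for illustration. It only shows the idea: each term maps to the ids of the documents that contain it, so a lookup never scans the documents themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class, not part of Elasticsearch or Lucene.
final class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Index one document: lowercase it, split on whitespace, record each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single term; returns an empty list when no document contains it.
    List<Integer> search(String term) {
        SortedSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch test case");
        index.add(2, "Lucene test");
        System.out.println(index.search("test"));
        System.out.println(index.search("lucene"));
    }
}
```

A real analyzer (such as the IK analyzer used later in this article) replaces the whitespace split, but the term-to-documents mapping stays the same.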

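Common tools

Head plugin (Node.js-based; must be installed manually):

Access and management tool: http://XXXX:9100/

Enter http://XXXX:9200/ in the address field to connect to the cluster

Kibana:

Command console: http://XXXX:5601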

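Storage structure

Comparison of data storage structures in ES, ArteryBase, and Greenplum

Hierarchy

field => document => type => index => shard => node => cluster

Replicas

Replicas ensure that the shard data held by a node is not lost when that node goes down. Users can configure both the number of shards and the number of replicas.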

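JSON syntax

Queries:

Creating an index (with or without an analyzer; using dynamic templates)

Adding a document; handling _id in ES

Specifying the fields to search, the search conditions, and a limit on the number of results

term, match, and match_phrase: usage and differences

Aggregations in ES

query and filter: usage and differences

Detailed examples:

1. Basic query usage
1) Queries

Analysis
IK analyzer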

Index without analysis
PUT  /syltest
{
    "mappings": {
        "syltest": {
            "dynamic_templates": [{
                "es": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "keyword"
                    }
                }
            }
            ]
        }
    }
}


Check how the IK analyzer tokenizes a string
GET  /syltest/_analyze
{
  "analyzer": "ik_max_word",
  "text": "es测试用例"
}

Index with analysis
PUT  /syltest
{
    "mappings": {
        "syltest": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "content": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                }
            }
        }
    }
}

Dynamic template
PUT  /syltest
{
    "mappings": {
        "syltest": {
            "dynamic_templates": [{
                "es": {
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "analyzer": "ik_max_word"
                    }
                }
            }]
        }
    }
}




Dynamic template that excludes one field from analysis
PUT  /syltest
{
    "mappings": {
        "syltest": {
            "dynamic_templates": [
                {
                    "eh": {
                        "match": "content",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "keyword"
                        }
                    }
                }, {
                    "es": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "analyzer": "ik_max_word"
                        }
                    }
                }
            ]
        }
    }
}




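term query: exact match. The query text is not analyzed, so on an analyzed text field it must equal one of the tokens produced at index time.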

GET syltest/syltest/_search
{
  "query": {
    "term": {
      "title":  "测试"
    }
  },
  "size": 20,
  "_source":{
    "includes":[
      "title"
      ]
    ,"excludes":[]
  }
}

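Analyzed queries: match vs. match_phrase
Both analyze the query text. match_phrase additionally requires the resulting tokens to appear in the document in the same order and next to each other.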


GET /syltest/syltest/_search
{
  "query": {
    "match": {
      "title": "用例测试"
    }
  }
}


GET /syltest/syltest/_search
{
  "query": {
    "match_phrase": {
      "title": "用例测试"
    }
  }
}
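
The difference between term, match, and match_phrase can be sketched over plain token lists. This is a simplified model, not Elasticsearch code: the class MatchSemanticsSketch is hypothetical, and a lowercase-and-split-on-whitespace function stands in for the IK analyzer. Scoring and options such as slop or operator are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; models matching only, not scoring.
final class MatchSemanticsSketch {
    // Stand-in analyzer: lowercase and split on whitespace (the IK analyzer is not modeled).
    static List<String> analyze(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? Collections.emptyList() : Arrays.asList(trimmed.split("\\s+"));
    }

    // term: the query string is NOT analyzed; it must equal one indexed token exactly.
    static boolean term(List<String> docTokens, String query) {
        return docTokens.contains(query);
    }

    // match: the query string is analyzed; by default one matching token is enough (OR).
    static boolean match(List<String> docTokens, String query) {
        for (String token : analyze(query)) {
            if (docTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // match_phrase: the analyzed query tokens must appear adjacent and in the same order.
    static boolean matchPhrase(List<String> docTokens, String query) {
        List<String> queryTokens = analyze(query);
        return !queryTokens.isEmpty() && Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc = analyze("es test case");
        System.out.println(term(doc, "test case"));        // one untokenized string vs. single tokens
        System.out.println(match(doc, "case test"));       // any token may match
        System.out.println(matchPhrase(doc, "case test")); // order matters
        System.out.println(matchPhrase(doc, "test case"));
    }
}
```

This mirrors the two requests above: a document whose tokens are in the order "test, case" is found by match for the reversed query text, but not by match_phrase.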


2) Aggregations
GROUP BY equivalent (terms aggregation)
GET  /etllog/_search
{
    "aggs" : {
        "groupBy" : {
            "terms" : { "field" : "dbsource.keyword" }
        }
    }
}

count(distinct) equivalent (cardinality aggregation; the result is approximate)
GET  /etllog/_search
{
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "dbsource.keyword"
      }
    }
  }
}
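
As a rough in-memory analogy for what the two aggregations compute, the sketch below counts documents per value and counts distinct values. The class AggregationSketch and the sample dbsource values are made up for illustration. Unlike this exact version, the real terms aggregation returns only the top buckets (10 by default), and cardinality returns an approximation of the distinct count.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; exact in-memory analogues of the two aggregations.
final class AggregationSketch {
    // Analogue of a terms aggregation: value -> number of documents with that value.
    static Map<String, Long> termsCounts(List<String> values) {
        Map<String, Long> counts = new TreeMap<>();
        for (String value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    // Exact distinct count; the cardinality aggregation approximates this number.
    static int distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        // Made-up dbsource values, one per document.
        List<String> dbsource = List.of("oracle", "mysql", "oracle");
        System.out.println(termsCounts(dbsource));
        System.out.println(distinctCount(dbsource));
    }
}
```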



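3) Filters
A query placed in a filter clause is not scored and no relevance is computed, so every hit gets a constant score: 1 when the filter is wrapped in constant_score, 0 when a bool query contains only filter clauses.


GET syltest/syltest/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "title": "测试"
        }
      }
    }
  }
}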


GET syltest/syltest/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "title": "测试"
        }
      }
    }
  }
}

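To use a query and a filter together, put both in a bool query: scored clauses go under must and unscored ones under filter, as in {"query": {"bool": {"must": {...}, "filter": {...}}}}. The filtered query used for this in older versions was removed in ES 5.0. The benefit is that the filter quickly narrows down the documents, and the query then only has to match and score what remains.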
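
A minimal sketch of such a combined request body, built as a plain string. The helper boolBody and the class name are hypothetical, the field names and values are placeholders, and the code assumes they contain no characters that need JSON escaping.

```java
// Hypothetical helper class; builds a request body only and sends nothing.
final class BoolQueryBodySketch {
    // One scored match clause under must, one unscored term clause under filter.
    // Assumes field names and values need no JSON escaping.
    static String boolBody(String matchField, String matchText,
                           String filterField, String filterValue) {
        return "{\n"
                + "  \"query\": {\n"
                + "    \"bool\": {\n"
                + "      \"must\": { \"match\": { \"" + matchField + "\": \"" + matchText + "\" } },\n"
                + "      \"filter\": { \"term\": { \"" + filterField + "\": \"" + filterValue + "\" } }\n"
                + "    }\n"
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        // Placeholder values; substitute fields that exist in your own index.
        System.out.println(boolBody("content", "test case", "title", "test"));
    }
}
```

The printed body can be pasted into Kibana after a line such as GET syltest/syltest/_search.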

Java API

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html

Detailed example:

package com.thunisoft.test;

import org.elasticsearch.action.search.SearchAction;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryParseContext;
import org.elasticsearch.plugins.SearchPlugin;
import org.elasticsearch.search.SearchModule;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.xpack.client.PreBuiltXPackTransportClient;

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;

/**
 * @ProjectName: CallAble
 * @Package: com.thunisoft.test
 * @ClassName: ESTest
 * @Author: songyulin
 * @CreateDate: 2018/8/2 15:41
 * @UpdateRemark: update notes
 * @Version: 1.0
 */
public class ESTest {
    static TransportClient client = null;
    static {
        Settings settings = Settings.builder()
                .put("cluster.name", "esCluster")
                .put("xpack.security.transport.ssl.enabled", false)
                .put("client.transport.ping_timeout", "120s")
                .put("indices.store.throttle.type", "none")
                .build();
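        try {
            client = new PreBuiltXPackTransportClient(settings)
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("172.16.11.57"), 9300));
        } catch (UnknownHostException e) {
            // Fail fast: with an unresolved address the client would stay null and every query would throw a NullPointerException.
            throw new ExceptionInInitializerError(e);
        }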
    }

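    public static void main(String[] args) throws IOException {
        try {
            queryMethod();
            //queryMethodSecond();
        } finally {
            // Release the transport client's threads and connections.
            client.close();
        }
    }
    // Query built with QueryBuilders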
    public static void queryMethod(){
        QueryBuilder query = QueryBuilders.termQuery("title", "测试");
        SearchResponse sResponse = client.prepareSearch("syltest").setTypes("syltest").setQuery(query).setSize(20)
                .execute().actionGet();
        System.out.println(sResponse);
    }
    // Query parsed directly from a JSON string
    public static void queryMethodSecond() throws IOException {
        String jsonStr = "{\n" +
                "  \"query\": {\n" +
                "    \"term\": {\n" +
                "      \"title\":  \"测试\"\n" +
                "    }\n" +
                "  }\n" +
                "}";
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        SearchModule searchModule = new SearchModule(Settings.EMPTY, false, new ArrayList<SearchPlugin>());
        try (
                XContentParser parser = XContentFactory.xContent(XContentType.JSON)
                        .createParser( new NamedXContentRegistry(searchModule.getNamedXContents()), jsonStr))
        {
            searchSourceBuilder.parseXContent(new QueryParseContext(parser));
        }
        SearchRequestBuilder searchRequestBuilder = new SearchRequestBuilder(client, SearchAction.INSTANCE);
        SearchResponse searchResponse = searchRequestBuilder.setSource(searchSourceBuilder).setIndices("syltest")
                .execute().actionGet();
        System.out.println(searchResponse);
    }
}

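POM:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <!-- set <version> to match the cluster's Elasticsearch version -->
</dependency>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <!-- set <version> to match the cluster's Elasticsearch version -->
</dependency>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>x-pack-transport</artifactId>
    <!-- set <version> to match the cluster's Elasticsearch version -->
</dependency>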

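New features in ES 5

New field types text and keyword replace string

A keyword field can only be matched exactly. It suits data that should not be analyzed and works well for filtering and aggregation. A text field is the type for full-text search, where the content is analyzed.

Cleanup of deleted old indices

When a node that still holds data for an old index is restarted, it would load that old index data again. ES 5.0 keeps records of the last 500 deleted indices in the cluster state, so if the index turns out to have been deleted already, its data is cleaned up automatically instead of being added back.

Java client API

The new HTTP-based client is decoupled from the Elasticsearch dependency, so there are no JAR conflicts. It provides automatic discovery of cluster nodes, logging, and automatic round-robin retry when a request to a node fails, which makes full use of Elasticsearch's high availability. Its performance is on par with the transport client.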
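
Because the interface is plain HTTP, a search can be issued without any Elasticsearch JAR at all. The sketch below uses the JDK's java.net.http.HttpClient (Java 11+) as a stand-in, not the official Elasticsearch REST client, so it has none of the node discovery or retry features described above. The class name and the address http://localhost:9200 are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical illustration class using only the JDK HTTP client.
final class HttpSearchSketch {
    // Builds a POST to <baseUrl>/<index>/_search carrying a JSON query body.
    static HttpRequest searchRequest(String baseUrl, String index, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Sends the request and returns the raw JSON response; needs a reachable cluster.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        String body = "{\"query\":{\"term\":{\"title\":\"test\"}},\"size\":20}";
        // localhost:9200 is an assumed address; replace it with your cluster's HTTP endpoint.
        HttpRequest request = searchRequest("http://localhost:9200", "syltest", body);
        // Only prints the request line; call send(request) when a cluster is available.
        System.out.println(request.method() + " " + request.uri());
    }
}
```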

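ES execution plan: profile

The profile API provides detailed timing information about the execution of each component of a search request. It shows how the request is executed at a low level, so users can understand why certain requests are slow and take steps to improve them.

Lucene execution plan: explain

Setting "explain": true in the search body returns, for each hit, how Lucene computed its score. The request below enables profile: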

GET syltest/syltest/_search
{
  "profile": true, 
  "query": {
    "term": {
      "title":  "测试"
    }
  }
}

Learning resources

Reference material

Web edition of "Elasticsearch: The Definitive Guide": https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html

PDF edition of "Elasticsearch: The Definitive Guide"

Official ES reference documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/index.html

Demo environment: Head plugin: http://*.*.11.57:9100; Kibana: http://*.*.11.57:5601

Original article: https://blog.csdn.net/weixin_40803329/article/details/101768290