ES
What is ES?
ES (Elasticsearch) is an open-source, highly scalable, distributed full-text search engine and the core of the Elastic Stack.
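I. Elasticsearch HTTP operations
Elasticsearch is a document-oriented database: each record is a document. To make the concepts concrete, we compare how Elasticsearch stores document data with how the relational database MySQL stores data.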

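Inverted index: a mapping from each term to the IDs of the documents that contain it, so a lookup goes from a name (keyword) to document IDs.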
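The idea of looking up IDs by term can be sketched in plain Java. This is only an illustration, not how Elasticsearch is implemented: the class name `InvertedIndexSketch`, the sample documents, and the whitespace/lowercase tokenization are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class, not part of any Elasticsearch API.
class InvertedIndexSketch {
    // Build a map from each term to the sorted set of document IDs containing it.
    static Map<String, SortedSet<Integer>> build(Map<Integer, String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            // Assumed tokenization: lowercase, split on whitespace.
            for (String term : doc.getValue().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1001, "Xiaomi phone");
        docs.put(1002, "Huawei phone");
        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println(index.get("phone"));  // [1001, 1002]
        System.out.println(index.get("huawei")); // [1002]
    }
}
```

A forward index would instead go from document ID to content; the inverted direction is what makes keyword lookups cheap.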
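1. Index operations
1) Create an index
Compared with a relational database, creating an index is equivalent to creating a database.
In Postman, send a PUT request to the ES server: http://127.0.0.1:9200/shopping
2) List all indices
In Postman, send a GET request to the ES server: http://127.0.0.1:9200/_cat/indices?v
3) View a single index
In Postman, send a GET request to the ES server: http://127.0.0.1:9200/shopping
4) Delete an index
In Postman, send a DELETE request to the ES server: http://127.0.0.1:9200/shopping
PUT is idempotent, but POST is not: a POST that creates a document without an explicit ID returns a different generated id each time.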
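For readers who prefer code to Postman, the same three index requests can be built with the JDK's own HTTP client. This is a sketch that assumes Java 11 or later (for `java.net.http`); the class name `IndexRequestSketch` and its method names are made up for the example. It only builds the requests, so it runs without an ES server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper class; builds (but does not send) the index requests.
class IndexRequestSketch {
    static final String BASE = "http://127.0.0.1:9200";

    // PUT /{index} creates the index.
    static HttpRequest createIndex(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // GET /{index} returns the index definition.
    static HttpRequest getIndex(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).GET().build();
    }

    // DELETE /{index} removes the index.
    static HttpRequest deleteIndex(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest[] requests = {
            createIndex("shopping"), getIndex("shopping"), deleteIndex("shopping")
        };
        for (HttpRequest r : requests) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

To actually send one of these, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the ES server is running.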
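2. Document operations
1) Create a document
With the index in place, we can create a document and add data to it. A document here is analogous to a row of table data in a relational database, and the data is submitted as JSON.
In Postman, send a POST request to the ES server: http://127.0.0.1:9200/shopping/_doc
POST http://127.0.0.1:9200/shopping/_doc/1001 ---> creates the document with the given, fixed ID
Request body:
{
"title":"小米手机",
"category":"小米",
"images":"http://www.gulixueyuan.com/xm.jpg",
"price":3999.00
}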
2) Query a document
GET http://127.0.0.1:9200/shopping/_doc/1001
Query all documents: GET http://127.0.0.1:9200/shopping/_search
Note: if query-all returns a 400 error, make sure the request body is empty.
3) Overwrite a document
PUT replaces the whole document: http://127.0.0.1:9200/shopping/_doc/1001
{
"title":"华为手机",
"category":"小米",
"images":"http://www.gulixueyuan.com/xm.jpg",
"price":3999.00
}
POST updates part of a document: http://127.0.0.1:9200/shopping/_update/1001
{
"doc" : {
"title":"华为"
}
}
DELETE removes a document: http://127.0.0.1:9200/shopping/_doc/1002
4) Conditional query
GET conditional query: http://127.0.0.1:9200/shopping/_search?q=category:小米
or GET http://127.0.0.1:9200/shopping/_search with the request body:
{
"query":{
"match":{
"category":"小米"
}
}
}
5) Paginated query of all documents
Computing from: (current page - 1) * page size
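The formula can be checked with a few lines of Java. The class and method names here are hypothetical, and the sketch assumes pages are numbered starting from 1.

```java
// Hypothetical helper illustrating the from/size arithmetic.
class PageOffsetSketch {
    // Returns the "from" offset for a 1-based page number and a page size.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 2)); // 0: first page starts at the first hit
        System.out.println(from(3, 2)); // 4: third page skips the first four hits
    }
}
```

With a page size of 2, page 1 therefore corresponds to the request body with "from": 0 and "size": 2.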
{
"query":{
"match_all":{
}
},
"from": 0,
"size":2
}
6) Query all data in the index / with sorting
{
"query":{
"match_all":{
}
},
"from": 0,
"size":2,
"_source":["title"]
}
{
"query":{
"match_all":{
}
},
"from": 0,
"size":2,
"sort":{
"price":{
"order":"desc"
}
}
}
7) Multi-condition query
must: both conditions must hold at the same time
{
"query": {
"bool": {
"must": [
{
"match": {
"category":"小米"
}
},
{
"match": {
"price":"3999"
}
}
]
}
}
}
should: at least one of the conditions must hold
{
"query": {
"bool": {
"should": [
{
"match": {
"category":"小米"
}
},
{
"match": {
"title":"华为"
}
}
]
}
}
}
8) Range query
{
"query": {
"bool": {
"should": [
{
"match": {
"category":"小米"
}
},
{
"match": {
"title":"华为"
}
}
],
"filter" : {
"range": {
"price":{
"gte":"3998"
}
}
}
}
}
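}
9) Highlight query
match: full-text search; the query text is analyzed into terms and a document matches if it contains any of them
match_phrase: phrase match; the terms must all appear, adjacent and in the same order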
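The difference between the two can be sketched in plain Java. This is a simplified model, not the Elasticsearch implementation: it assumes whitespace tokenization and lowercasing, the default OR behaviour for match, and zero slop for match_phrase. The class `MatchSketch` and the sample text are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of match vs match_phrase on a single field value.
class MatchSketch {
    // Assumed analyzer: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // match: true if the field contains at least one of the query terms.
    static boolean match(String fieldValue, String query) {
        List<String> docTerms = tokens(fieldValue);
        for (String q : tokens(query)) {
            if (docTerms.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // match_phrase: true if the query terms occur consecutively, in order.
    static boolean matchPhrase(String fieldValue, String query) {
        List<String> q = tokens(query);
        return !q.isEmpty() && Collections.indexOfSubList(tokens(fieldValue), q) >= 0;
    }

    public static void main(String[] args) {
        String title = "Xiaomi smart phone";
        System.out.println(match(title, "phone xiaomi"));       // true
        System.out.println(matchPhrase(title, "phone xiaomi")); // false
        System.out.println(matchPhrase(title, "smart phone"));  // true
    }
}
```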
{
"query": {
"match_phrase": {
"category":"小米"
}
},
"highlight": {
"fields": {
"category":{}
}
}
}
10) Aggregation query
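A terms aggregation, as used in this section, groups documents by the value of a field and reports a document count per bucket. A rough in-memory analogue in Java might look like the following; the class name and the sample prices are assumptions, and unlike Elasticsearch (which by default orders buckets by document count) this sketch orders buckets by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory analogue of a terms aggregation on "price".
class TermsAggSketch {
    // Count how many documents fall into each distinct price bucket.
    static Map<Double, Long> countByPrice(List<Double> prices) {
        Map<Double, Long> buckets = new TreeMap<>();
        for (Double price : prices) {
            buckets.merge(price, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(3999.0, 3999.0, 1999.0);
        System.out.println(countByPrice(prices)); // {1999.0=1, 3999.0=2}
    }
}
```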
{
"aggs": { // aggregation
"price_group": { // arbitrary name for the result
"terms" : { // group by term
"field" : "price" // field to group on
}
}
},
// do not return the original documents
"size" : 0
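}
11) Mapping
PUT http://127.0.0.1:9200/user/
PUT http://127.0.0.1:9200/user/_mapping
index: whether the field is indexed; defaults to true. true: the field is indexed and can be searched. false: the field is not indexed and cannot be searched.
store: whether the field value is stored separately from _source; defaults to false. You can store an individual field separately by setting "store": true. Retrieving a separately stored field can be faster than parsing it out of _source, but it uses more space, so set it according to actual business needs.
analyzer: the analyzer (tokenizer) to use; this will be covered in a separate blog post.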
{
"properties": {
"name" : {
"type": "text",
"index":true
},
"sex" : {
"type":"keyword",
"index" : true
},
"tel" : {
"type" : "keyword",
"index" : false
}
}
}
POST http://127.0.0.1:9200/user/_doc/1001
{
"name":"ansel",
"sex": "male",
"tel" : "1234"
}
GET
{
"query": {
"match": {
"name": "a"
}
}
}
II. Java API
<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>7.8.0</version>
    </dependency>
    <!-- Elasticsearch client -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.8.0</version>
    </dependency>
    <!-- Elasticsearch depends on log4j 2.x -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <!-- JSON conversion -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.9.9</version>
    </dependency>
    <!-- JUnit unit tests -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>
1. Create an index
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/15
@Description
*/
public class client_index_create {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient =
new RestHighLevelClient(RestClient.builder(new HttpHost("localhost",9200, "http")));
// Create the index
CreateIndexRequest request = new CreateIndexRequest("hero");
// CreateIndexRequest request = new CreateIndexRequest("user");
CreateIndexResponse createIndexResponse =
esClient.indices().create(request, RequestOptions.DEFAULT);
System.out.println("Acknowledged ===> " + createIndexResponse.isAcknowledged());
// 2. Close the client
esClient.close();
}
}
2. Query an index
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/15
@Description
*/
public class client_index_search {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient =
new RestHighLevelClient(RestClient.builder(new HttpHost("localhost",9200, "http")));
// Query the index
GetIndexRequest request = new GetIndexRequest("student");
GetIndexResponse getIndexResponse = esClient.indices().get(request, RequestOptions.DEFAULT);
// Delete an index
DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("hero");
AcknowledgedResponse delete
= esClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
System.out.println(getIndexResponse.getAliases());
System.out.println(getIndexResponse.getMappings());
System.out.println(getIndexResponse.getSettings());
// 2. Close the client
esClient.close();
}
}
3. Insert a document
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.xcontent.XContentType;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/15
@Description
*/
public class client_doc_insert {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient =
new RestHighLevelClient(RestClient.builder(new HttpHost("localhost",9200, "http")));
// Insert a document (Hero is a plain POJO with setAge/setName/setSex; its definition is not shown here)
IndexRequest request = new IndexRequest();
request.index("hero").id("1001");
Hero hero = new Hero();
hero.setAge(10);
hero.setName("Jack");
hero.setSex("male");
// Documents sent to ES must be in JSON format
ObjectMapper mapper = new ObjectMapper();
String str = mapper.writeValueAsString(hero);
request.source(str, XContentType.JSON);
IndexResponse response = esClient.index(request, RequestOptions.DEFAULT);
System.out.println(response.getResult());
// 2. Close the client
esClient.close();
}
}
4. Update a document
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/15
@Description
*/
public class client_doc_update {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient =
new RestHighLevelClient(RestClient.builder(new HttpHost("localhost",9200, "http")));
// Partially update the document
UpdateRequest request = new UpdateRequest();
request.index("hero").id("1001");
request.doc(XContentType.JSON,"sex", "Female");
UpdateResponse response = esClient.update(request, RequestOptions.DEFAULT);
System.out.println("result = " + response.getResult());
// 2. Close the client
esClient.close();
}
}
5. Get a document
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.document.DocumentField;
import java.io.IOException;
import java.util.List;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
public class Client_doc_Get {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient = new RestHighLevelClient
(RestClient.builder(new HttpHost("localhost", 9200, "http")));
// Get the document by ID
GetRequest request = new GetRequest();
request.index("hero").id("1001");
GetResponse response = esClient.get(request, RequestOptions.DEFAULT);
System.out.println(response.getSourceAsString());
// Close the client
esClient.close();
}
}
6. Delete a document
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
public class Client_doc_Delete {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient = new RestHighLevelClient
(RestClient.builder(new HttpHost("localhost", 9200, "http")));
// Delete the document by ID
DeleteRequest request = new DeleteRequest();
request.index("hero").id("1002");
DeleteResponse response = esClient.delete(request, RequestOptions.DEFAULT);
System.out.println(response.toString());
// Close the client
esClient.close();
}
}
7. Bulk insert
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
public class Client_doc_bulk {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient = new RestHighLevelClient
(RestClient.builder(new HttpHost("localhost", 9200, "http")));
BulkRequest request = new BulkRequest();
for (int i = 3; i < 7; i++) {
IndexRequest indexReq = new IndexRequest();
indexReq.index("hero").id("100" + i).source(XContentType.JSON,"name", "ID 100" + i);
request.add(indexReq);
}
BulkResponse response = esClient.bulk(request, RequestOptions.DEFAULT);
// Time taken
System.out.println(response.getTook());
System.out.println(response.getItems());
esClient.close();
}
}
8. Bulk delete
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
public class Client_doc_bulk_delete {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient = new RestHighLevelClient
(RestClient.builder(new HttpHost("localhost", 9200, "http")));
BulkRequest request = new BulkRequest();
for (int i = 3; i < 7; i++) {
DeleteRequest deleteRequest = new DeleteRequest();
deleteRequest.index("hero").id("100" + i);
request.add(deleteRequest);
}
BulkResponse response = esClient.bulk(request, RequestOptions.DEFAULT);
// Time taken
System.out.println(response.getTook());
System.out.println(response.getItems());
// Close the client
esClient.close();
}
}
9. Query all documents
package com.ansel.esdemo.test;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import java.io.IOException;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
public class Client_doc_query {
public static void main(String[] args) throws IOException {
// 1. Create the client
RestHighLevelClient esClient =
new RestHighLevelClient(RestClient.builder(new HttpHost("localhost",9200, "http")));
// Query all documents in the indices
SearchRequest request = new SearchRequest();
request.indices("hero", "student");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
request.source(searchSourceBuilder);
SearchResponse response = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = response.getHits();
System.out.println(response.getTook());
for (SearchHit hit : hits) {
System.out.println(hit.getSourceAsString());
}
esClient.close();
}
}
10. Conditional query
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("age", 6));
request.source(searchSourceBuilder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
System.out.println(hit.getSourceAsString());
}
11. Paginated query
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
searchSourceBuilder.from(0);
searchSourceBuilder.size(2);
request.source(searchSourceBuilder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
System.out.println(hit.getSourceAsString());
}
Sorting: searchSourceBuilder.sort("age", SortOrder.DESC);
12. Filter fields
// Filter fields
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchAllQuery());
// Fields to include / exclude
String[] excludes = {};
String[] includes = {"name"};
builder.fetchSource(includes,excludes);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
System.out.println(hit.getSourceAsString());
}
13. must & should multi-condition query
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.matchQuery("age", 6));
boolQueryBuilder.must(QueryBuilders.matchQuery("sex", "male"));
boolQueryBuilder.mustNot(QueryBuilders.matchQuery("sex", "female"));
builder.query(boolQueryBuilder);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
System.out.println(hit.getSourceAsString());
}
14. Range query
// Range query
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("age");
rangeQueryBuilder.gte(4);
rangeQueryBuilder.lte(7);
builder.query(rangeQueryBuilder);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
String result = hit.getSourceAsString();
System.out.println(result);
}
15. Fuzzy query
wildcardQuery
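A wildcard query compares indexed terms against a pattern in which * stands for any sequence of characters and ? for exactly one character. The following plain-Java sketch shows that matching rule by translating a pattern into a regular expression; it illustrates the semantics only and is not how Elasticsearch evaluates the query. The class name `WildcardSketch` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of wildcard pattern semantics (* and ?).
class WildcardSketch {
    // Returns true if the whole term matches the wildcard pattern.
    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // any sequence, including empty
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(matches("a*", "ansel"));    // true
        System.out.println(matches("an?el", "ansel")); // true
        System.out.println(matches("a", "ansel"));     // false: no wildcard, so the whole term must equal "a"
    }
}
```

The last case is the one to watch for: a pattern without * or ? only matches a term that is exactly equal to it.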
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
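// The query must be set on the source builder; * matches any sequence and ? a single character.
// Terms produced by the standard analyzer are lowercased, so the pattern is lowercase too.
searchSourceBuilder.query(QueryBuilders.wildcardQuery("name", "a*"));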
request.source(searchSourceBuilder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
System.out.println(hit.getSourceAsString());
}
fuzzyQuery
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
FuzzyQueryBuilder fuzzyBuilder = QueryBuilders.fuzzyQuery("name", "Ansel").fuzziness(Fuzziness.AUTO);
builder.query(fuzzyBuilder);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
String result = hit.getSourceAsString();
System.out.println("=============================================================");
System.out.println(result);
System.out.println("=============================================================");
}
16. Highlight query
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("sex", "male");
builder.query(termsQueryBuilder);
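HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.preTags("<font color='red'>");
highlightBuilder.postTags("</font>");
// Highlighting only applies to fields that are listed explicitly
highlightBuilder.field("sex");
builder.highlighter(highlightBuilder);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
System.out.println(hit.getSourceAsString());
// The highlighted fragments are returned separately from _source
System.out.println(hit.getHighlightFields());
}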
esClient.close();
}
III. Spring Boot integration
Dependencies
<properties>
    <!-- remember to change these to match your environment -->
    <java.version>8</java.version>
    <elasticsearch.version>7.3.0</elasticsearch.version>
</properties>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Config
package com.ansel.esdemo.config;
/**
* @author Ansel Zhong
* coding time
*/
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.ResponseBody;
/**
@title es-demo
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
@Configuration
public class EsClientConfig {
@Bean
public RestHighLevelClient restHighLevelClient(){
RestHighLevelClient esClient = new RestHighLevelClient
(RestClient.builder(new HttpHost("localhost", 9200, "http")));
return esClient;
}
}
IV. Web crawler
<!-- HTML parsing -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>
package com.ansel.utils;
/**
* @author Ansel Zhong
* coding time
*/
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
/**
@title search-practicec
@author Ansel Zhong
@Date 2023/3/16
@Description
*/
public class HtmlParseUtils {
public static void main(String[] args) throws IOException {
// Request URL
String url = "https://search.jd.com/Search?keyword=java&enc=utf-8&wq=java&pvid=ef80dbba286c45908d43cd70bed881c3";
// Fetch and parse the page
Document document = Jsoup.parse(new URL(url), 30000);
Element element = document.getElementById("J_goodsList");
//System.out.println(element.html());
Elements elements = element.getElementsByTag("li");
// Extract the content of each element
for (Element el : elements) {
// Image-heavy pages usually lazy-load their images,
// so read the data-lazy-img attribute
String img = el.getElementsByTag("img").eq(0)
.attr("data-lazy-img");
String price = el.getElementsByClass("p-price").eq(0)
.text();
String name = el.getElementsByClass("p-name")
.eq(0)
.text();
System.out.println("=================================");
System.out.println(img);
System.out.println(price);
System.out.println(name);
}
}
}
public static List<Content> parseJD(String keywords) throws IOException {
ArrayList<Content> contents = new ArrayList<>();
// Request URL: the keyword passed in is the search term
String url = "https://search.jd.com/Search?keyword=" + keywords;
// Fetch and parse the page
Document document = Jsoup.parse(new URL(url), 30000);
Element element = document.getElementById("J_goodsList");
//System.out.println(element.html());
Elements elements = element.getElementsByTag("li");
// Extract the content of each element
for (Element el : elements) {
// Image-heavy pages usually lazy-load their images,
// so read the data-lazy-img attribute
String img = el.getElementsByTag("img").eq(0)
.attr("data-lazy-img");
String price = el.getElementsByClass("p-price").eq(0)
.text();
String name = el.getElementsByClass("p-name")
.eq(0)
.text();
System.out.println("=================================");
System.out.println(img);
System.out.println(price);
System.out.println(name);
Content content = new Content();
content.setName(name);
content.setSrc(img);
content.setPrice(price);
contents.add(content);
}
return contents;
}
Service
@Autowired
private RestHighLevelClient esClient;
@Override
public Boolean parseContent(String keywords) throws IOException {
List<Content> contents = HtmlParseUtils.parseJD(keywords);
BulkRequest request = new BulkRequest();
request.timeout(TimeValue.timeValueSeconds(1));
contents.stream()
.forEach((Content content) -> {
IndexRequest indexRequest = new IndexRequest();
indexRequest.index("jd_goods");
indexRequest.source(JSON.toJSONString(content), XContentType.JSON);
request.add(indexRequest);
}
);
BulkResponse resp = esClient.bulk(request, RequestOptions.DEFAULT);
// Return true when the bulk request succeeded, i.e. it had no failures
return !resp.hasFailures();
}
@Override
public List<Map<String, Object>> Search(String keywords, int pageNo, int pageSize) throws IOException {
Map<String, Object> map = new HashMap<>();
ArrayList<Map<String, Object>> list = new ArrayList<>();
// Fuzzy (wildcard) query
SearchRequest request = new SearchRequest();
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.wildcardQuery("name",keywords));
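// Pagination: from is the offset of the first hit, (pageNo - 1) * pageSize, assuming pageNo starts at 1
builder.from((pageNo - 1) * pageSize);
builder.size(pageSize);
// Run the search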
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit :hits) {
map = hit.getSourceAsMap();
list.add(map);
}
return list;
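}
ERROR & NOTE
1. elasticsearch-head cannot connect --> cross-origin (CORS) problem
Start head: npm run start
Add the following to the yml file under elasticsearch/config (elasticsearch.yml):
http.cors.enabled: true
http.cors.allow-origin: "*"
2. Tomcat startup error
The cause may be the dependency below; replace it with: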
<!-- JSON conversion -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.9.8</version>
</dependency>
3. JSON conversion
You can also use
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.47</version>
</dependency>