Lucene Join Queries (JoinQuery)

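Relational databases make heavy use of join queries, and Lucene offers a similar capability. This article shows how to run a join query in Lucene and how it differs from a join in a database.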

1. Creating the index

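    // Imports used by the snippets in this article.
    // The JUnit import assumes JUnit 4; the fields and methods below belong inside a test class.
    import java.nio.file.Paths;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.join.JoinUtil;
    import org.apache.lucene.search.join.ScoreMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.BytesRef;
    import org.junit.Test;

    final String indexDir = "/Users/chenfeihao/Desktop/lucene/index7";
    final String idField = "id";
    final String toField = "productId";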

    //@Test
    public void createIndex() throws Exception{
        Directory dir = FSDirectory.open(Paths.get(indexDir));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        IndexWriter w = new IndexWriter(dir, config);

        // doc 0: id = 1, with name and description
        Document doc = new Document();
        doc.add(new TextField("description", "random text", Field.Store.YES));
        doc.add(new TextField("name", "name1", Field.Store.YES));
        doc.add(new TextField(idField, "1", Field.Store.YES));
        doc.add(new SortedDocValuesField(idField, new BytesRef("1")));

        w.addDocument(doc);

        // doc 1: id = 2, productId = 1
        Document doc1 = new Document();
        doc1.add(new TextField("price", "10.0", Field.Store.YES));
        doc1.add(new TextField(idField, "2", Field.Store.YES));
        doc1.add(new SortedDocValuesField(idField, new BytesRef("2")));
        doc1.add(new TextField(toField, "1", Field.Store.YES));
        doc1.add(new SortedDocValuesField(toField, new BytesRef("1")));

        w.addDocument(doc1);

        // doc 2: id = 3, productId = 1
        Document doc2 = new Document();
        doc2.add(new TextField("price", "20.0", Field.Store.YES));
        doc2.add(new TextField(idField, "3", Field.Store.YES));
        doc2.add(new SortedDocValuesField(idField, new BytesRef("3")));
        doc2.add(new TextField(toField, "1", Field.Store.YES));
        doc2.add(new SortedDocValuesField(toField, new BytesRef("1")));

        w.addDocument(doc2);

        // doc 3: id = 4, with name and description
        Document doc3 = new Document();
        doc3.add(new TextField("description", "more random text", Field.Store.YES));
        doc3.add(new TextField("name", "name2", Field.Store.YES));
        doc3.add(new TextField(idField, "4", Field.Store.YES));
        doc3.add(new SortedDocValuesField(idField, new BytesRef("4")));

        w.addDocument(doc3);


        // doc 4: id = 5, productId = 4
        Document doc4 = new Document();
        doc4.add(new TextField("price", "10.0", Field.Store.YES));
        doc4.add(new TextField(idField, "5", Field.Store.YES));
        doc4.add(new SortedDocValuesField(idField, new BytesRef("5")));
        doc4.add(new TextField(toField, "4", Field.Store.YES));
        doc4.add(new SortedDocValuesField(toField, new BytesRef("4")));
        w.addDocument(doc4);

        // doc 5: id = 6, productId = 4
        Document doc5 = new Document();
        doc5.add(new TextField("price", "20.0", Field.Store.YES));
        doc5.add(new TextField(idField, "6", Field.Store.YES));
        doc5.add(new SortedDocValuesField(idField, new BytesRef("6")));
        doc5.add(new TextField(toField, "4", Field.Store.YES));
        doc5.add(new SortedDocValuesField(toField, new BytesRef("4")));
        w.addDocument(doc5);

        // doc 6: productId = 4, no id field
        Document doc6 = new Document();
        doc6.add(new TextField(toField, "4", Field.Store.YES));
        doc6.add(new SortedDocValuesField(toField, new BytesRef("4")));
        w.addDocument(doc6);
        w.commit();
        w.close();
        dir.close();
    }

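As you can see, id is used here as the fromField and productId as the toField. (The two can be swapped depending on what you want to query; the third query in the example below, for instance, joins with productId as the fromField.)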
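
To make the relationship between the two fields easier to see, here is a minimal sketch that models the seven documents as plain Java maps and groups the document numbers by the productId they point at. It does not use Lucene at all; the class name IndexLayoutSketch and its helper methods are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free model of the documents indexed by createIndex().
class IndexLayoutSketch {

    // Builds one document as a field -> value map; a null argument means the field is absent.
    static Map<String, String> doc(String id, String productId) {
        Map<String, String> d = new LinkedHashMap<>();
        if (id != null) d.put("id", id);
        if (productId != null) d.put("productId", productId);
        return d;
    }

    // The seven documents, in the order they are added to the index (doc 0 .. doc 6).
    static List<Map<String, String>> documents() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("1", null));
        docs.add(doc("2", "1"));
        docs.add(doc("3", "1"));
        docs.add(doc("4", null));
        docs.add(doc("5", "4"));
        docs.add(doc("6", "4"));
        docs.add(doc(null, "4"));
        return docs;
    }

    // Groups document numbers by the productId value they carry.
    static Map<String, List<Integer>> childrenByProductId(List<Map<String, String>> docs) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            String productId = docs.get(i).get("productId");
            if (productId != null) {
                groups.computeIfAbsent(productId, k -> new ArrayList<>()).add(i);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(childrenByProductId(documents()));
    }
}
```

Running main prints {1=[1, 2], 4=[4, 5, 6]}: documents 1 and 2 point at id 1, and documents 4, 5 and 6 point at id 4.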


2. Running join queries with JoinQuery

    /**
     * Tests join queries.
     * @throws Exception
     */
    @Test
    public void testJoinSearch() throws Exception{
        Directory dir = FSDirectory.open(Paths.get(indexDir));
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

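        // Build the join query with JoinUtil.
        // multipleValuesPerDocument: whether the from field holds more than one value per document
        // scoreMode: how scores of the documents matched by fromQuery are carried over to the join query's results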
        Query joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name2")), searcher, ScoreMode.None);
        TopDocs docs = searcher.search(joinQuery, 10);
        System.out.println("Documents found:"+docs.totalHits);
        for (ScoreDoc scoreDoc : docs.scoreDocs){
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(idField+":"+doc.get(idField));
            System.out.println(toField+":"+doc.get(toField));
        }

        joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name1")), searcher, ScoreMode.None);
        docs = searcher.search(joinQuery, 10);
        System.out.println("Documents found:"+docs.totalHits);
        for (ScoreDoc scoreDoc : docs.scoreDocs){
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(idField+":"+doc.get(idField));
            System.out.println(toField+":"+doc.get(toField));
        }

        // Join from an offer back to its product: the offer with id 5 has productId 4,
        // so the result is the document whose id is 4
        joinQuery = JoinUtil.createJoinQuery(toField, false, idField, new TermQuery(new Term("id", "5")), searcher, ScoreMode.None);
        docs = searcher.search(joinQuery, 10);
        System.out.println("Matching documents:"+docs.totalHits);
        for (ScoreDoc scoreDoc : docs.scoreDocs){
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(idField+":"+doc.get(idField));
            System.out.println(toField+":"+doc.get(toField));
        }

        reader.close();
        dir.close();
    }

Some details are already explained in the comments. Now let's look at the query output:

Documents found:3
id:5
productId:4
id:6
productId:4
id:null
productId:4
Documents found:2
id:2
productId:1
id:3
productId:1
Matching documents:1
id:4
productId:null

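The output shows how the join query works: it first finds every document that matches the Term condition, then collects the values of the specified fromField (idField here) from those documents, and finally returns the documents whose toField holds one of those values.
Take the first query as an example. The Term condition selects the documents whose name field is name2; the document found has a fromField (id) value of 4, so the join requires the toField value to be 4 as well, and the query turns into a search for all documents whose toField (productId here) equals 4.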
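
The two phases walked through above can be sketched in plain Java. This is only a model of the behaviour visible in the output, not how Lucene implements it internally: the class JoinSketch, its methods and the map-based documents are all hypothetical, and the from-query is represented by an ordinary Predicate instead of a Lucene Query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, Lucene-free model of the two phases of the join query.
class JoinSketch {

    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // The join-relevant fields of the seven documents indexed in section 1.
    static List<Map<String, String>> sampleIndex() {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(doc("name", "name1", "id", "1"));
        index.add(doc("id", "2", "productId", "1"));
        index.add(doc("id", "3", "productId", "1"));
        index.add(doc("name", "name2", "id", "4"));
        index.add(doc("id", "5", "productId", "4"));
        index.add(doc("id", "6", "productId", "4"));
        index.add(doc("productId", "4"));
        return index;
    }

    static List<Map<String, String>> join(List<Map<String, String>> index, String fromField,
                                          String toField, Predicate<Map<String, String>> fromQuery) {
        // Phase 1: run the from-query and collect the distinct fromField values of its hits.
        Set<String> joinValues = new HashSet<>();
        for (Map<String, String> d : index) {
            String value = d.get(fromField);
            if (value != null && fromQuery.test(d)) {
                joinValues.add(value);
            }
        }
        // Phase 2: keep every document whose toField value is one of the collected values.
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : index) {
            String value = d.get(toField);
            if (value != null && joinValues.contains(value)) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Extracts the id of each hit; null when a document has no id field.
    static List<String> ids(List<Map<String, String>> docs) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> d : docs) {
            ids.add(d.get("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = sampleIndex();
        System.out.println(ids(join(index, "id", "productId", d -> "name2".equals(d.get("name")))));
        System.out.println(ids(join(index, "id", "productId", d -> "name1".equals(d.get("name")))));
        System.out.println(ids(join(index, "productId", "id", d -> "5".equals(d.get("id")))));
    }
}
```

Running main prints [5, 6, null], [2, 3] and [4], which are the same id values as in the three result sets of the output.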
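Unlike a join in a relational database, no data is actually joined here: the documents found by the first query only constrain the second one (the toField value must equal a fromField value from the first query), and the results contain the to-side documents only.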
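
Since no from-side fields reach the result, about the only other thing the first query can hand over is a relevance score, which is what the scoreMode argument is for (all three queries in this article pass ScoreMode.None). The following is a simplified plain-Java sketch of that idea, under the assumption that the scores of all from-documents sharing a join value are combined into one score for the matching to-documents. The Mode enum only mirrors the names None, Avg, Max and Total from Lucene's ScoreMode; the class, the method and the example scores are invented for illustration, the constant returned for NONE is a placeholder, and Lucene's real collectors are more involved.

```java
import java.util.List;

// Hypothetical, simplified model of how from-side scores might be combined in a join.
class ScoreModeSketch {

    // Mirrors the names of four Lucene ScoreMode constants; this is not the Lucene enum.
    enum Mode { NONE, AVG, MAX, TOTAL }

    // Combines the scores of all from-documents that share one join value.
    static double combine(List<Double> fromScores, Mode mode) {
        if (fromScores.isEmpty()) {
            throw new IllegalArgumentException("at least one from-document score is required");
        }
        double total = 0.0;
        double max = Double.NEGATIVE_INFINITY;
        for (double score : fromScores) {
            total += score;
            max = Math.max(max, score);
        }
        switch (mode) {
            case AVG:
                return total / fromScores.size();
            case MAX:
                return max;
            case TOTAL:
                return total;
            default:
                // NONE: from-side scores are ignored; 1.0 is just a placeholder constant.
                return 1.0;
        }
    }

    public static void main(String[] args) {
        // Two from-documents with the same join value and made-up scores.
        List<Double> scores = List.of(1.0, 3.0);
        for (Mode mode : Mode.values()) {
            System.out.println(mode + "=" + combine(scores, mode));
        }
    }
}
```

With the made-up scores 1.0 and 3.0, main prints NONE=1.0, AVG=2.0, MAX=3.0 and TOTAL=4.0.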


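Copyright notice: This is an original article by m0_37556444, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.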