Lucene - How to get all child docs in a parent's block given the parent docID

Question

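I'm using Lucene directly (no Solr or ElasticSearch) to index a set of documents that follow a parent-child hierarchy.

I'm using "blocks" to do this: I add all the children, followed by the parent, to the same block by calling: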

writer.addDocuments(childrenAndParentDocList)
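
For the block join to work, the list passed to addDocuments has to contain the children first and the parent last; the whole block then receives consecutive doc IDs. The following is a minimal sketch of that layout in plain Java, with no Lucene dependency. `BlockLayoutSketch`, `appendBlock`, and the use of `java.util.BitSet` to mark parents are hypothetical names chosen for illustration, not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of the doc ID layout produced by indexing blocks.
// Not Lucene code: it only mimics the "children first, parent last" ordering.
final class BlockLayoutSketch {
    final List<String> docs = new ArrayList<>(); // position in the list == doc ID
    final BitSet parentBits = new BitSet();      // one bit set per parent doc ID

    // Appends one block (all children, then the parent) and returns the parent's doc ID.
    int appendBlock(List<String> children, String parent) {
        docs.addAll(children);
        docs.add(parent);
        int parentDocId = docs.size() - 1;
        parentBits.set(parentDocId);
        return parentDocId;
    }

    public static void main(String[] args) {
        BlockLayoutSketch index = new BlockLayoutSketch();
        int p1 = index.appendBlock(Arrays.asList("child-a", "child-b"), "parent-1");
        int p2 = index.appendBlock(Arrays.asList("child-c"), "parent-2");
        System.out.println("parents at doc IDs " + p1 + " and " + p2);
    }
}
```

Because each parent closes its block, the doc IDs between two consecutive parent bits are exactly the children of the later parent.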


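I'm running a free-text search over all parents and children (using ToParentBlockJoinQuery in the child search to link up to the parent docs), which returns a nice set of parent documents that either match the query themselves or have a child that matches it.

The next thing I need to do is get all the children of every parent document I have.

I have seen a method in the Lucene tests that shows how to get the parent document given a child document.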

private Document getParentDoc(IndexReader reader, BitSetProducer parents, int childDocID) throws IOException {
    final List<LeafReaderContext> leaves = reader.leaves();
    final int subIndex = ReaderUtil.subIndex(childDocID, leaves);
    final LeafReaderContext leaf = leaves.get(subIndex);
    final BitSet bits = parents.getBitSet(leaf);
    return leaf.reader().document(bits.nextSetBit(childDocID - leaf.docBase));
}
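
The helper above works because the parent is the first set bit at or after the child's segment-local doc ID. Here is a minimal sketch of just that lookup, using `java.util.BitSet` as a stand-in for Lucene's BitSet; `ParentLookupSketch` and `parentOf` are hypothetical names. One difference to keep in mind: `java.util.BitSet.nextSetBit` returns -1 when no bit is found, which is not what Lucene's own BitSet returns in that case.

```java
import java.util.BitSet;

// Plain-Java model of the child -> parent lookup inside one segment.
final class ParentLookupSketch {
    // Returns the local doc ID of the parent closing the block that contains
    // docId, or -1 if no parent bit is set at or after docId.
    static int parentOf(BitSet parentBits, int docId) {
        return parentBits.nextSetBit(docId);
    }

    public static void main(String[] args) {
        // Modeled layout: 0,1 = children, 2 = parent, 3 = child, 4 = parent
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(4);
        System.out.println("parent of doc 0: " + parentOf(parentBits, 0));
        System.out.println("parent of doc 3: " + parentOf(parentBits, 3));
    }
}
```

Passing a parent's own doc ID returns that same ID, since the search starts at the given index inclusive.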

But I'm not sure how to do the opposite, i.e. how to get all the children of a given parent document.

Any advice would be appreciated.

Answer 1

Ben*_*Ben 5

I ended up using the code below. It seems to work:

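import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.BitSetProducer;
import org.apache.lucene.search.join.QueryBitSetProducer;

private List<Integer> getChildDocIds(IndexSearcher indexSearcher, int parentDocId) throws IOException {
    // The query passed to QueryBitSetProducer must match parent docs only
    // (here: docs whose "child" field is "N")
    BitSetProducer parentsFilter = new QueryBitSetProducer(new TermQuery(new Term("child", "N")));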
    IndexReader indexReader = indexSearcher.getIndexReader();
    List<LeafReaderContext> leaves = indexReader.leaves();
    int subIndex = ReaderUtil.subIndex(parentDocId, leaves);
    LeafReaderContext leaf = leaves.get(subIndex);
    int localParentDocId = parentDocId - leaf.docBase;
    List<Integer> childDocs = new ArrayList<>();
    if (localParentDocId == 0) {
        // First doc in the segment: nothing precedes it, so it has no children
        return childDocs;
    }
    int prevParent = parentsFilter.getBitSet(leaf).prevSetBit(localParentDocId - 1);
    for(int childDocIndex = prevParent + 1; childDocIndex < localParentDocId; childDocIndex++) {
        childDocs.add(leaf.docBase + childDocIndex);
    }
    return childDocs;
}
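
The index arithmetic in this answer can be checked without a Lucene index. The following is a minimal sketch under stated assumptions: `ChildLookupSketch`, `Segment`, and `childrenOf` are hypothetical names, `java.util.BitSet` stands in for Lucene's BitSet, and segments are assumed to be listed in ascending docBase order. It also adds a check the answer does not have: if the given doc ID is not marked as a parent, it returns an empty list.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of the parent -> children lookup across several segments.
final class ChildLookupSketch {
    // One modeled segment: its first global doc ID and its segment-local parent bits.
    static final class Segment {
        final int docBase;
        final BitSet parentBits;

        Segment(int docBase, BitSet parentBits) {
            this.docBase = docBase;
            this.parentBits = parentBits;
        }
    }

    // Returns the global doc IDs of all children of the given global parent doc ID.
    static List<Integer> childrenOf(List<Segment> segments, int parentDocId) {
        // Pick the last segment whose docBase is <= parentDocId
        Segment seg = segments.get(0);
        for (Segment s : segments) {
            if (s.docBase <= parentDocId) {
                seg = s;
            }
        }
        int local = parentDocId - seg.docBase;
        List<Integer> children = new ArrayList<>();
        if (!seg.parentBits.get(local)) {
            return children; // added guard: the doc is not a parent
        }
        // Previous parent in this segment, or -1 if this is the first block
        int prevParent = (local == 0) ? -1 : seg.parentBits.previousSetBit(local - 1);
        for (int doc = prevParent + 1; doc < local; doc++) {
            children.add(seg.docBase + doc); // convert back to a global doc ID
        }
        return children;
    }
}
```

Children never cross a segment boundary, because a block is always written to a single segment; that is why searching backwards only within the parent's own segment is enough.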


Archived:	6 years, 9 months ago
Views:	899
Last activity:	5 years, 6 months ago