Lucene 4.2 StringField

Question

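I am new to Lucene. I have two documents, and I want an exact match against the document field "keyword" (the field may occur multiple times in a document).

The first document contains the keyword "Annotation is cool". The second document contains the keyword "Annotation is cool too". When I search for "Annotation is cool", how do I build the query so that only the first document is found?

I read that "StringField" is not tokenized. But if I change the "keyword" field from "TextField" to "StringField" in the method "addDoc", nothing is found at all.

Here is my code: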

private IndexWriter writer;

public void lucene() throws IOException, ParseException {
    // Build the index
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_42,
            analyzer);
    this.writer = new IndexWriter(index, config);

    // Add documents to the index
    addDoc("Spring", new String[] { "Java", "JSP",
            "Annotation is cool" });
    addDoc("Java", new String[] { "Oracle", "Annotation is cool too" });

    writer.close();

    // Search the index
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);

    BooleanQuery qry = new BooleanQuery();

    qry.add(new TermQuery(new Term("keyword", "\"Annotation is cool\"")), BooleanClause.Occur.MUST);

    System.out.println(qry.toString());

    Query q = new QueryParser(Version.LUCENE_42, "title", analyzer).parse(qry.toString());

    int hitsPerPage = 10;
    TopScoreDocCollector collector = TopScoreDocCollector.create(
            hitsPerPage, true);

    searcher.search(q, collector);

    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document doc = searcher.doc(docId);
        System.out.println((i + 1) + ". \t" + doc.get("title"));
    }

    reader.close();
}

private void addDoc(String title, String[] keywords) throws IOException {
    // Create new document
    Document doc = new Document();

    // Add title
    doc.add(new TextField("title", title, Field.Store.YES));

    // Add keywords
    for (int i = 0; i < keywords.length; i++) {
        doc.add(new TextField("keyword", keywords[i], Field.Store.YES));
    }

    // Add document to index
    this.writer.addDocument(doc);
}


Answer 1

fem*_*gon 6

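The problem is not how you index the field. A StringField is the correct way to index the entire input as a single token. The problem is how you search. I really don't know what you intend to accomplish with this logic: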

BooleanQuery qry = new BooleanQuery();
qry.add(new TermQuery(new Term("keyword", "\"Annotation is cool\"")), BooleanClause.Occur.MUST);
//Great! You have a termQuery added to the parent BooleanQuery which should find your keyword just fine!

Query q = new QueryParser(Version.LUCENE_42, "title", analyzer).parse(qry.toString());
//Now all bets are off.


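Query.toString() is a handy method for debugging, but it is not safe to assume that running its text output through a QueryParser will regenerate the same query. The standard query parser doesn't really have much ability to express multiple words as a single term. I believe the string version you are seeing looks like: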

keyword:"Annotation is cool"


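This will be interpreted as a PhraseQuery. A PhraseQuery looks for three consecutive terms, Annotation, is, and cool, but with the way you indexed the field, you only have the single term "Annotation is cool".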
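
To make the mismatch concrete, here is a toy sketch in plain Java. It is not Lucene code: the class and helper names are made up for illustration, and the phrase check ignores term positions. It only models the idea that an untokenized field stores each value as one term, while a parsed phrase is looked up word by word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these helpers are hypothetical and are not Lucene classes.
final class PhraseVsTermDemo {

    // Untokenized field: each value becomes exactly one indexed term.
    static Set<String> indexUntokenized(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // Rough stand-in for what a parser does with a quoted phrase: split it into words.
    static List<String> phraseTerms(String phrase) {
        return Arrays.asList(phrase.trim().split("\\s+"));
    }

    // A phrase can only match if each of its words exists as a term (positions ignored here).
    static boolean phraseMatches(Set<String> indexedTerms, String phrase) {
        return indexedTerms.containsAll(phraseTerms(phrase));
    }

    // A term lookup is a plain equality check against the indexed terms.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> terms = indexUntokenized("Java", "JSP", "Annotation is cool");
        // The individual words were never indexed, so the phrase lookup fails.
        System.out.println(phraseMatches(terms, "Annotation is cool"));
        // The whole value is one term, so the term lookup succeeds.
        System.out.println(termMatches(terms, "Annotation is cool"));
    }
}
```

In this model the phrase lookup fails for the same reason the parsed query finds nothing: the words "Annotation", "is" and "cool" do not exist as separate terms in the index.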

The solution is to not use this logic:

 Query nuttyQuery = queryParser.parse(perfectlyGoodQuery.toString());
 searcher.search(nuttyQuery);


Instead, just search with the BooleanQuery you already created:

 searcher.search(perfectlyGoodQuery);
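
For what it's worth, here is a small plain-Java sketch of what searching by exact term amounts to for the two documents in the question. It is a hypothetical in-memory stand-in, not Lucene's API, and `ExactKeywordSearch` is a name I made up. One thing it highlights: a term lookup compares the text literally, so, as far as I can tell, the escaped quotes in `new Term("keyword", "\"Annotation is cool\"")` would become part of the term and probably need to be dropped once the field is a StringField.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a hypothetical in-memory stand-in, not Lucene's API.
final class ExactKeywordSearch {

    // title -> keyword values; each value is treated as one untokenized term.
    private final Map<String, List<String>> docs = new LinkedHashMap<>();

    void addDoc(String title, String... keywords) {
        docs.put(title, Arrays.asList(keywords));
    }

    // Return the titles of documents with at least one keyword equal to the term text.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ExactKeywordSearch index = new ExactKeywordSearch();
        index.addDoc("Spring", "Java", "JSP", "Annotation is cool");
        index.addDoc("Java", "Oracle", "Annotation is cool too");

        // Exact term text: only the "Spring" document has this keyword.
        System.out.println(index.search("Annotation is cool"));
        // Literal quote characters are part of the term, so nothing matches.
        System.out.println(index.search("\"Annotation is cool\""));
    }
}
```

The second document's keyword "Annotation is cool too" is a different term, so it is not returned, which is the exact-match behavior the question asks for.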

