Exact phrase search using Lucene.net

Ash*_*Ash 11 lucene search lucene.net

I am unable to search for an exact phrase using Lucene.NET 2.0.0.4.

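For example, I am searching for "scope attribute sets the variable" (including the quotes) but get no matches, even though I have confirmed 100% that the phrase exists in the document.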

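Can anyone suggest where I am going wrong? Is this even supported by Lucene.NET? As usual, the API documentation is not much help, and the few CodeProject articles I have read do not specifically cover this.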

The index is created with the following code:

Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", true);

Analyzer analyzer = new Lucene.Net.Analysis.SimpleAnalyzer();

IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer,true);

//create a document, add in a single field
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();

Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field(
    "content", File.ReadAllText(@"Documents\100.txt"),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.TOKENIZED);

doc.Add(fldContent);

//write the document to the index
indexWriter.AddDocument(doc);

I then search for the phrase using the following:

//state the file location of the index
Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", false);

//create an index searcher that will perform the search
IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);

QueryParser qp = new QueryParser("content", new SimpleAnalyzer());

// txtSearch.Text contains a phrase such as "this is a phrase"
Query q = qp.Parse(txtSearch.Text);


//execute the query
Lucene.Net.Search.Hits hits = searcher.Search(q);

The target document is about 7 MB of plain text.

I have seen this previous question, but I do not want a proximity search, just an exact phrase search.

jos*_*sno 14

Shashikant Kore is correct in his answer: you need to enable term positions.

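However, I would recommend not storing the document's text in the index unless you absolutely need it returned to you in the search results. Setting Store to NO may help reduce the size of your index.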

Lucene.Net.Documents.Field fldContent =
    new Lucene.Net.Documents.Field("content",
        File.ReadAllText(@"Documents\100.txt"),
        Lucene.Net.Documents.Field.Store.NO,
        Lucene.Net.Documents.Field.Index.TOKENIZED,
        Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);


Sha*_*ore 13

You have not enabled term positions. Creating the field as follows should solve your problem.

Lucene.Net.Documents.Field fldContent =
    new Lucene.Net.Documents.Field("content",
        File.ReadAllText(@"Documents\100.txt"),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.TOKENIZED,
        Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);
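
For anyone unsure what "term positions" means here: an exact phrase match needs to know where each term occurs in the text, not just whether it occurs. Below is a minimal, self-contained C# sketch of that idea. It is a toy model and not how Lucene.NET is actually implemented; the names PhraseDemo, Tokenize, BuildPositions and ContainsPhrase are made up for illustration, and the tokenizer only loosely imitates SimpleAnalyzer by splitting on non-letters and lowercasing.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only: this is NOT Lucene.NET code.
public static class PhraseDemo
{
    // Split on non-letter characters and lowercase each token
    // (an assumption that roughly mirrors what SimpleAnalyzer does).
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // Map each term to the list of token positions where it occurs.
    public static Dictionary<string, List<int>> BuildPositions(string text)
    {
        var index = new Dictionary<string, List<int>>();
        List<string> tokens = Tokenize(text);
        for (int i = 0; i < tokens.Count; i++)
        {
            if (!index.TryGetValue(tokens[i], out List<int> positions))
            {
                positions = new List<int>();
                index[tokens[i]] = positions;
            }
            positions.Add(i);
        }
        return index;
    }

    // True only when the phrase terms occur at consecutive positions, in order.
    public static bool ContainsPhrase(Dictionary<string, List<int>> index, string phrase)
    {
        List<string> terms = Tokenize(phrase);
        if (terms.Count == 0) return false;
        if (!index.TryGetValue(terms[0], out List<int> starts)) return false;

        foreach (int start in starts)
        {
            bool match = true;
            for (int k = 1; k < terms.Count; k++)
            {
                if (!index.TryGetValue(terms[k], out List<int> pos) || !pos.Contains(start + k))
                {
                    match = false;
                    break;
                }
            }
            if (match) return true;
        }
        return false;
    }

    public static void Main()
    {
        var index = BuildPositions("The scope attribute sets the variable for the page.");
        // Exact phrase, terms adjacent and in order.
        Console.WriteLine(ContainsPhrase(index, "\"scope attribute sets the variable\""));
        // Same words exist in the text, but not in this order.
        Console.WriteLine(ContainsPhrase(index, "\"attribute scope\""));
    }
}
```

The first call prints True and the second prints False: both words of the second phrase are in the text, but without position data there would be no way to tell that they are in the wrong order.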