nok*_*nal 6 .net c# lucene lucene.net
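Another Lucene.net question from an extreme newbie.
This time I have found an interesting problem with queries that contain a range and use highlighting.
I am writing this from memory, so please forgive any syntax errors.
I have a hypothetical Lucene index: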
---------------------------------------------------------
| date | text |
---------------------------------------------------------
| 1317809124 | a crazy block of text |
---------------------------------------------------------
| 1317809284 | programmers are crazy |
---------------------------------------------------------
** date is a unix timestamp
...and the documents were added to the index like this:
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
doc.Add(new Lucene.Net.Documents.Field("text", "some block of text", Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.Add(new Lucene.Net.Documents.Field("date", "some unix timestamp", Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED));
This is how I query Lucene:
Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(_headlinesDirectory), true);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
Lucene.Net.Search.Query query = parser.Parse(queryPhrase);
Lucene.Net.Search.Hits hits = searcher.Search(query);
// code highlighting
Lucene.Net.Highlight.Formatter formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter("<span style=\"background:yellow;\">","</span>");
Lucene.Net.Highlight.SimpleFragmenter fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(50);
Lucene.Net.Highlight.QueryScorer scorer = new Lucene.Net.Highlight.QueryScorer(query);
Lucene.Net.Highlight.Highlighter highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(fragmenter);
for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);
    Lucene.Net.Analysis.TokenStream stream = analyzer.TokenStream("", new StringReader(doc.Get("text")));
    string highlightedText = highlighter.GetBestFragments(stream, doc.Get("text"), 1, "...");
    Console.WriteLine("--> " + highlightedText);
}
Here is an example of my query:
crazy AND date:[1286273266 TO 32503680000]
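When I run this query, it finds all the results for "crazy", but it does not output any highlighted text.
Once the date range is removed and you query for the term alone: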
crazy
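...highlighting works fine this time.
Am I doing something wrong in my implementation, should I be looking at a newer implementation, or is this a known issue with a possible workaround?
Thanks in advance, Stack Overflow'ers :)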
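- EDIT -
I have implemented L.B's suggestion (awesome!). I still don't know why this works, since I consider Lucene complete voodoo or programming witchcraft, but it does, and I'm happy :).
For completeness, here is the modified code: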
Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(_headlinesDirectory), true);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
// new line here
parser.SetMultiTermRewriteMethod(Lucene.Net.Search.MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Lucene.Net.Search.Query query = parser.Parse(queryPhrase);
// new line here
Lucene.Net.Search.Query query2 = query.Rewrite(searcher.GetIndexReader());
Lucene.Net.Search.Hits hits = searcher.Search(query);
// code highlighting
Lucene.Net.Highlight.Formatter formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter("<span style=\"background:yellow;\">","</span>");
Lucene.Net.Highlight.SimpleFragmenter fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(50);
// changed to use query2
Lucene.Net.Highlight.QueryScorer scorer = new Lucene.Net.Highlight.QueryScorer(query2);
Lucene.Net.Highlight.Highlighter highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(fragmenter);
for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);
    Lucene.Net.Analysis.TokenStream stream = analyzer.TokenStream("", new StringReader(doc.Get("text")));
    string highlightedText = highlighter.GetBestFragments(stream, doc.Get("text"), 1, "...");
    Console.WriteLine("--> " + highlightedText);
}
If you can, please let me know whether I have implemented the suggestion accurately.
First, call the QueryParser method
SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
and then create a new query:
Query newQuery = query.Rewrite(indexReader);
Now you can use "newQuery" for your search.
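As a rough sketch, the two steps can be wrapped in a small helper together with the highlighter setup from the question. It assumes the same Lucene.Net 2.9 build and contrib highlighter (`Lucene.Net.Highlight`) that the question uses, and it only compiles with those assemblies referenced. The class and method names (`RewrittenQueryHighlighting`, `BuildHighlighter`) are hypothetical and not part of Lucene.Net.

```csharp
using Lucene.Net.Highlight;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical helper for illustration; not part of Lucene.Net.
static class RewrittenQueryHighlighting
{
    // Parses queryPhrase, rewrites it against the searcher's index, and
    // returns a highlighter scored on the rewritten query. The rewritten
    // query is passed back so the caller can also search with it.
    public static Highlighter BuildHighlighter(
        QueryParser parser, IndexSearcher searcher, string queryPhrase, out Query rewritten)
    {
        // Step 1 of the answer: set the rewrite method on the parser before parsing.
        parser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        Query parsed = parser.Parse(queryPhrase);

        // Step 2 of the answer: rewrite the parsed query against the index reader.
        rewritten = parsed.Rewrite(searcher.GetIndexReader());

        // Same formatter and fragment size as in the question.
        Formatter formatter = new SimpleHTMLFormatter("<span style=\"background:yellow;\">", "</span>");
        Highlighter highlighter = new Highlighter(formatter, new QueryScorer(rewritten));
        highlighter.SetTextFragmenter(new SimpleFragmenter(50));
        return highlighter;
    }
}
```

A caller would then pass the `rewritten` query to `searcher.Search(...)` and use the returned highlighter in the same loop as in the question.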