Search for terms in the index which are a prefix of the search term or vice versa (!)

Question

I would like for Lucene to find a document containing a term "bahnhofstr" if I search for "bahnhofstrasse", i.e., I don't only want to find documents containing terms of which my search term is a prefix but also documents that contain terms that are themselves a prefix of my search term...

How would I go about this?

Answer 1

fem*_*gon 0

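I think a fuzzy query would be most helpful to you. It scores terms based on their edit distance from your query. If no minimum similarity is specified, it will effectively match every available term. That can hurt performance, but it does accomplish what you are looking for.

A fuzzy query is signalled by the ~ character, for example: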

firstname:bahnhofstr~

Or with a minimum similarity (a number between 0 and 1, where 0 is the loosest setting and imposes no minimum):

firstname:bahnhofstr~0.4

Or, if you are constructing your own queries, use a FuzzyQuery.

This isn't exactly what you specified, but it is the easiest way to get close.
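To see why the fuzzy route only approximates the request, it can help to compute the edit distance by hand. The sketch below is a plain textbook Levenshtein implementation, not Lucene's own code (Lucene uses a different, automaton-based approach internally); the class name EditDistance is made up for illustration.

```java
final class EditDistance {
    private EditDistance() {}

    /** Textbook Levenshtein distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

EditDistance.levenshtein("bahnhofstr", "bahnhofstrasse") returns 4, one edit per missing character. Whether a gap of that size still matches depends on the similarity threshold and on the Lucene version: as far as I know, Lucene 4 and later cap fuzzy matching at two edits, so it is worth checking this against the release you are using.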

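As for exactly what you are looking for, I don't know of a simple Lucene call that accomplishes it. I would probably just split the term into a series of term queries, which you could represent in a query string something like this: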

firstname:b
firstname:ba
firstname:bah
firstname:bahn
firstname:bahnh
firstname:bahnho
firstname:bahnhof
firstname:bahnhofs
firstname:bahnhofst
firstname:bahnhofstr*

By the way, I wouldn't actually generate a query string myself. I would just construct the TermQuery and PrefixQuery objects directly.
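The expansion itself is easy to sketch. The helper below is dependency-free Java that only produces the prefixes and, for illustration, renders them as query-string clauses; the class and method names are hypothetical. In a real application each proper prefix would presumably be wrapped in a TermQuery and the full term in a PrefixQuery, all added as optional clauses of a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixExpansion {
    private PrefixExpansion() {}

    /** Returns every non-empty prefix of the term, shortest first. */
    static List<String> prefixes(String term) {
        List<String> result = new ArrayList<>();
        for (int end = 1; end <= term.length(); end++) {
            result.add(term.substring(0, end));
        }
        return result;
    }

    /**
     * Renders the prefixes as field:value clauses. Shorter prefixes are
     * exact-term clauses; only the full term gets a trailing wildcard.
     */
    static List<String> clauses(String field, String term) {
        List<String> all = prefixes(term);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < all.size(); i++) {
            boolean isFullTerm = i == all.size() - 1;
            result.add(field + ":" + all.get(i) + (isFullTerm ? "*" : ""));
        }
        return result;
    }
}
```

Calling PrefixExpansion.clauses("firstname", "bahnhofstr") yields the ten clauses listed above, from firstname:b up to firstname:bahnhofstr*. A one- or two-character prefix will match a lot of unrelated terms, so a minimum prefix length would be a reasonable addition.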

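Scoring would be a bit skewed, and I would probably boost the longer queries more highly to get better ordering, but that's the method that comes to mind for accomplishing exactly what you are looking for fairly easily. A DisjunctionMaxQuery would help you use something like this alongside other terms and get more reasonable scoring.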
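One way to boost longer prefixes is to make the boost proportional to the prefix length. The linear scheme below is an assumption chosen for simplicity, not something Lucene prescribes, and the LengthBoost class is hypothetical; the ^ suffix is the boost operator of the classic query syntax.

```java
import java.util.Locale;

final class LengthBoost {
    private LengthBoost() {}

    /** Linear boost in (0, 1]: the full term gets 1.0, shorter prefixes less. */
    static double boostFor(int prefixLength, int termLength) {
        if (prefixLength <= 0 || prefixLength > termLength) {
            throw new IllegalArgumentException("prefix length must be in 1..termLength");
        }
        return (double) prefixLength / termLength;
    }

    /** Renders a clause such as firstname:bahn^0.40 for a prefix of the search term. */
    static String boostedClause(String field, String prefix, int termLength) {
        double boost = boostFor(prefix.length(), termLength);
        // Locale.ROOT keeps the decimal separator a dot regardless of system locale.
        return String.format(Locale.ROOT, "%s:%s^%.2f", field, prefix, boost);
    }
}
```

For the ten-character term, LengthBoost.boostedClause("firstname", "bahn", 10) gives firstname:bahn^0.40. When building query objects directly, the same factor would be applied to each clause's boost instead of being written into a string.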

Hopefully a fuzzy query works well for you, though. It seems a much nicer solution.

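Another option, if you have a big need for queries of this nature, might be to tokenize the field into n-grams at index time (see NGramTokenizer), which would allow you to use an NGramPhraseQuery effectively to achieve the results you want.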
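To illustrate why n-grams help here, the sketch below splits a string into fixed-size character n-grams in plain Java. It is only a stand-in for what an n-gram tokenizer emits (NGramTokenizer works with a configurable minimum and maximum gram size); the CharNGrams class is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class CharNGrams {
    private CharNGrams() {}

    /** Returns the distinct character n-grams of the text, in order of first appearance. */
    static Set<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start + n <= text.length(); start++) {
            grams.add(text.substring(start, start + n));
        }
        return grams;
    }
}
```

Every trigram of "bahnhofstr" (bah, ahn, hnh, nho, hof, ofs, fst, str) also occurs in "bahnhofstrasse", so the indexed short form and the longer search term share most of their grams, and that overlap is what an n-gram-based query can match on.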
