JMT*_*ler 6 lucene solr relevance solr-boost
我希望我的搜索结果按照他们正在进行的分数排序,但分数计算不正确.这就是说,不一定是不正确的,但与预期不同,我不知道为什么.我的目标是删除任何改变分数的内容.
如果我执行匹配两个对象的搜索(其中ObjectA的分数高于ObjectB),则首先返回ObjectB.
让我们说,对于这个例子,我的查询是一个单词:"apples".
ObjectA的标题:"苹果是苹果"(2/3条款)
ObjectA的描述:"苹果 - 苹果中有苹果,现在苹果遍布苹果的所有苹果!" (6/18条款)
ObjectB的标题:"苹果很棒"(1/3条款)
ObjectB的描述:"苹果室里有苹果,现在苹果在苹果上都变坏了!" (4/18条款)
标题字段没有提升(或者更确切地说,提升为1),描述字段的提升为0.8.我没有通过solrconfig.xml或通过我正在通过的查询指定文档提升.如果有另一种指定文档提升的方法,那么我有可能错过一个.
在分析explain打印输出之后,看起来ObjectA 正在计算比ObjectB更高的分数,就像我想要的那样,除了一个区别:ObjectB的标题fieldNorm总是高于ObjectA.
以下是explain打印输出.您知道:标题字段是mditem5_tns,描述字段是mditem7_tns:
ObjectB:
1.3327172 = (MATCH) sum of:
1.0352166 = (MATCH) max plus 0.1 times others of:
0.9766194 = (MATCH) weight(mditem5_tns:appl in 0), product of:
0.53929156 = queryWeight(mditem5_tns:appl), product of:
1.8109303 = idf(docFreq=3, maxDocs=9)
0.2977981 = queryNorm
1.8109303 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
1.0 = tf(termFreq(mditem5_tns:appl)=1)
1.8109303 = idf(docFreq=3, maxDocs=9)
1.0 = fieldNorm(field=mditem5_tns, doc=0)
0.58597165 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
0.8 = boost
1.8109303 = idf(docFreq=3, maxDocs=9)
0.2977981 = queryNorm
1.3581977 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
2.0 = tf(termFreq(mditem7_tns:appl)=4)
1.8109303 = idf(docFreq=3, maxDocs=9)
0.375 = fieldNorm(field=mditem7_tns, doc=0)
0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
0.999001 = 1000.0/(1.0*float(1)+1000.0)
1.0 = boost
0.2977981 = queryNorm
ObjectA:
1.2324848 = (MATCH) sum of:
0.93498427 = (MATCH) max plus 0.1 times others of:
0.8632177 = (MATCH) weight(mditem5_tns:appl in 0), product of:
0.53929156 = queryWeight(mditem5_tns:appl), product of:
1.8109303 = idf(docFreq=3, maxDocs=9)
0.2977981 = queryNorm
1.6006513 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
1.4142135 = tf(termFreq(mditem5_tns:appl)=2)
1.8109303 = idf(docFreq=3, maxDocs=9)
0.625 = fieldNorm(field=mditem5_tns, doc=0)
0.7176658 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
0.8 = boost
1.8109303 = idf(docFreq=3, maxDocs=9)
0.2977981 = queryNorm
1.6634457 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
2.4494898 = tf(termFreq(mditem7_tns:appl)=6)
1.8109303 = idf(docFreq=3, maxDocs=9)
0.375 = fieldNorm(field=mditem7_tns, doc=0)
0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
0.999001 = 1000.0/(1.0*float(1)+1000.0)
1.0 = boost
0.2977981 = queryNorm
Run Code Online (Sandbox Code Playgroud)
这个问题是由词干器造成的.它将"apple is apples"扩展为"apple appl are apples appl",从而使该领域更长.由于文件B仅包含由词干提取器扩展的1个术语,因此字段保持比文档A短.
这导致不同的fieldNorms.
| 归档时间: |
|
| 查看次数: |
4637 次 |
| 最近记录: |