Geo*_*dis · 8 · tags: sql, sql-server, full-text-search, containstable, isabout
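I'm trying to figure out exactly how weighted terms work in ISABOUT queries in SQL Server. Here is where I am so far:
Each query returns the following rows:
QUERY 1 (weight 1): initial ranking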
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 249
272619 156
221557 114
QUERY 2 (weight 0.8): ranks increase, initial order preserved
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.8) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 321
272619 201
221557 146
QUERY 3 (weight 0.2): ranks increase, initial order preserved
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 998
272619 877
221557 692
QUERY 4 (weight 0.17): ranks decrease and the best match is now last; for these terms the inverted behavior starts at 0.17
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
272619 960
221557 958
306342 802
QUERY 5 (weight 0.16): ranks increase, the best match is now second
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.16) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
272619 978
306342 935
221557 841
QUERY 6 (weight 0.01): ranks decrease, the best match is last again
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.01) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
221557 105
272619 77
306342 50
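At weight 1 the best match has a rank of 249, and as the weight drops to 0.2 the best match's rank rises to 998. Between 0.2 and 0.17 the rank decreases, and from 0.16 on the results are inverted (the weight values that reproduce this behavior depend on the term and probably on the column searched...).
It seems there is a point where the weight means the opposite, like "exclude this term".
Do you have any explanation for this behavior?
Why does the rank go up when the weight goes down?
Why does the rank go down after a certain point until the results are inverted, and how can you predict this?
I use a custom "word breaker": when a user searches for something, the following query is created: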
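The MSDN article linked in the edit describes ISABOUT as a vector-space query ranked with a Jaccard-style formula. For a single weighted term it reduces to the expression below, assuming the row's plain per-term rank enters as a fraction r in (0, 1] (an assumption, though the numbers in this question are consistent with it), with M = MaxQueryRank = 1000:

```latex
\mathrm{Rank}(w, r) = M\,\frac{w\,r}{w^{2} + r^{2} - w\,r}
                    = \frac{M}{\dfrac{w}{r} + \dfrac{r}{w} - 1}

\mathrm{Rank}(w, r_1) = \mathrm{Rank}(w, r_2),\quad r_1 \neq r_2
\;\Longleftrightarrow\; \frac{w}{r_1}\cdot\frac{w}{r_2} = 1
\;\Longleftrightarrow\; w^{*} = \sqrt{r_1 r_2}
```

Since t + 1/t is smallest at t = 1, a row's rank peaks (at M) when the weight equals its own r and falls off on either side. That would explain why lowering the weight from 1 first raises every rank, and why rows with a smaller r peak at a smaller weight and eventually overtake the others. Solving Rank(1, r) = 249, 156 and 114 gives r ≈ 0.208, 0.137 and 0.103 for rows 306342, 272619 and 221557, so the first swap (306342 vs 272619) is expected near w* = sqrt(0.208 × 0.137) ≈ 0.169, between QUERY 4 and QUERY 5. Above w* the row with the larger r ranks higher; below it the row with the smaller r wins.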
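A Python sketch (run outside SQL Server) that checks an explanation against the numbers above. It assumes the Jaccard-style single-term formula Rank = 1000·w·r / (w² + r² − w·r) that the MSDN article linked in the edit describes for ISABOUT, with r the row's per-term rank as a fraction; the helper names are hypothetical. It back-solves r from QUERY 1 and predicts the other weights. QUERY 4 is left out because its three rank values fit but are attached to different keys than predicted, and QUERY 5 is taken at weight 0.16, as its label says.

```python
import math

MAX_QUERY_RANK = 1000  # CONTAINSTABLE ranks run from 0 to 1000


def isabout_rank(weight, term_rank):
    """Rank of one row for a single-term ISABOUT (assumed Jaccard-style formula).

    term_rank is the row's own rank for the term as a fraction in (0, 1].
    """
    weighted_sum = weight * term_rank
    return MAX_QUERY_RANK * weighted_sum / (weight ** 2 + term_rank ** 2 - weighted_sum)


def term_rank_from_weight_one(rank_at_weight_one):
    """Hypothetical helper: solve isabout_rank(1, r) == rank_at_weight_one for r.

    This is q*r^2 - (1 + q)*r + q = 0 with q = rank / 1000; its roots are r and 1/r,
    so the smaller root is the one in (0, 1].
    """
    q = rank_at_weight_one / MAX_QUERY_RANK
    return ((1 + q) - math.sqrt((1 + q) ** 2 - 4 * q * q)) / (2 * q)


# QUERY 1 (weight 1) ranks from the question.
weight_one_ranks = {306342: 249, 272619: 156, 221557: 114}
term_ranks = {key: term_rank_from_weight_one(r) for key, r in weight_one_ranks.items()}

for w in (1, 0.8, 0.2, 0.16, 0.01):
    predicted = {key: isabout_rank(w, r) for key, r in term_ranks.items()}
    order = sorted(predicted, key=predicted.get, reverse=True)
    print(w, [(key, round(predicted[key])) for key in order])

# Two rows with term ranks r1 and r2 swap places at w = sqrt(r1 * r2).
print(round(math.sqrt(term_ranks[306342] * term_ranks[272619]), 3))
```

For weights 0.8, 0.2, 0.16 and 0.01 the predictions land within a few points of the ranks listed above, and the last line prints the weight (about 0.169) at which 306342 and 272619 trade places. Under this assumption the inversion point for any pair of rows can be predicted as sqrt(r1·r2).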
CONTAINSTABLE(documentParts, title,
'ISABOUT (
"wordA wordB wordC" weight (0.8),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
"wordA*" weight (0.1),
"wordB*" weight (0.1),
"wordC*" weight (0.1)
) ')
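A hypothetical Python helper showing how a custom word breaker could generate this condition from a search phrase. The function name, the whitespace split and the weight tuple are assumptions for illustration, not the author's code. It emits the clauses comma-separated with no comma after the last one.

```python
def build_isabout(phrase, weights=(0.8, 0.6, 0.1)):
    """Build an ISABOUT condition like the one above from a user's search phrase.

    Hypothetical helper: weights are (exact phrase, NEAR chain, each single word).
    """
    # Drop double quotes, which would break the full-text term syntax.
    words = [w.replace('"', "") for w in phrase.split()]
    words = [w for w in words if w]
    if not words:
        raise ValueError("empty search phrase")
    phrase_weight, near_weight, word_weight = weights

    clauses = ['"' + " ".join(words) + f'" weight ({phrase_weight})']
    if len(words) > 1:  # NEAR needs at least two terms
        near_chain = " NEAR ".join(f'"{w}*"' for w in words)
        clauses.append(f"{near_chain} weight ({near_weight})")
    clauses.extend(f'"{w}*" weight ({word_weight})' for w in words)

    # Comma-separated, with no comma after the last clause.
    return "ISABOUT (\n    " + ",\n    ".join(clauses) + "\n)"


print(build_isabout("wordA wordB wordC"))
```

The generated string is meant to be passed to CONTAINSTABLE as an nvarchar parameter (for example a hypothetical @search variable) rather than pasted into the SQL text, so an apostrophe in the user's input cannot break the string literal.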
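Should I expect a large rank for the 0.1-weighted words?
Is the following query equivalent to the one above, and should I expect some odd behavior from the 0.1 weights?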
CONTAINSTABLE(documentParts, title, '
ISABOUT ( "wordA wordB wordC" weight (0.8) )
OR ISABOUT ( "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6) )
OR ISABOUT ( "wordA*" weight (0.1) )
OR ISABOUT ( "wordB*" weight (0.1) )
OR ISABOUT ( "wordC*" weight (0.1) )
')
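Under the same Jaccard-style reading of the MSDN article linked in the edit, a multi-term ISABOUT compares the weight vector with the row's vector of per-term ranks: Rank = 1000·Σw·r / (Σw² + Σr² − Σw·r). The Python sketch below assumes that form and uses made-up per-term ranks; the function name is hypothetical, and it is meant to show the shape of the behavior, not SQL Server's exact numbers.

```python
MAX_QUERY_RANK = 1000  # CONTAINSTABLE ranks run from 0 to 1000


def isabout_rank_multi(weights, term_ranks):
    """Jaccard-style rank of one row for a multi-term ISABOUT (assumed formula).

    weights[k]: ISABOUT weight of term k.
    term_ranks[k]: the row's own rank for term k as a fraction in [0, 1],
    0 when the term does not match.
    """
    if len(weights) != len(term_ranks):
        raise ValueError("need exactly one term rank per weighted term")
    weighted_sum = sum(w * r for w, r in zip(weights, term_ranks))
    denominator = (sum(w * w for w in weights)
                   + sum(r * r for r in term_ranks)
                   - weighted_sum)
    return MAX_QUERY_RANK * weighted_sum / denominator


weights = (0.8, 0.6, 0.1, 0.1, 0.1)  # phrase, NEAR chain, wordA*, wordB*, wordC*
# Hypothetical per-term ranks for three rows.
only_word_a = (0.0, 0.0, 0.1, 0.0, 0.0)
everything_moderate = (0.3, 0.3, 0.2, 0.2, 0.2)
matches_profile = weights  # per-term ranks exactly equal to the weights

for name, ranks in [("only wordA*", only_word_a),
                    ("all terms, moderate", everything_moderate),
                    ("ranks == weights", matches_profile)]:
    print(name, round(isabout_rank_multi(weights, ranks), 1))
```

If this form is right, the weights behave like a target profile rather than plain multipliers: a row reaches the maximum of 1000 only when its per-term ranks equal the weights, and a row matching just one 0.1-weighted word stays low because the 0.8 and 0.6 weights still sit in the denominator.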
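Edit:
I found this article: http://msdn.microsoft.com/en-us/library/ms142524(v=sql.105).aspx, which answers some of my questions but raises new ones!
I search two tables, "documents" and "documentParts", and use UNION ALL to sum the rows and get my result. According to this article that is wrong: the indexed row count feeds into the rank computation, so adding the two RANKs is like adding apples and carrots...
My current solution is to compute a percentage for each CONTAINSTABLE, like this: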
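To see why summing RANK across the two tables mixes scales, here is a Python sketch of a single-term CONTAINSTABLE rank, assuming the linked article's formula has this form: a statistical weight log2((2 + IndexedRowCount) / KeyRowCount) scaled by the row's hit count. Treat the exact formula and the parameter names as assumptions to verify against the article; the point is only that IndexedRowCount and KeyRowCount belong to one full-text index, so identical content can score very differently in two tables.

```python
import math

MAX_QUERY_RANK = 1000  # CONTAINSTABLE ranks run from 0 to 1000


def single_term_rank(hit_count, max_occurrence, indexed_row_count, key_row_count):
    """Single-term CONTAINSTABLE rank in the assumed form from the linked article.

    Assumption: StatisticalWeight = log2((2 + IndexedRowCount) / KeyRowCount) and
    Rank = min(MaxQueryRank, HitCount * 16 * StatisticalWeight / MaxOccurrence).
    """
    statistical_weight = math.log2((2 + indexed_row_count) / key_row_count)
    return min(MAX_QUERY_RANK, hit_count * 16 * statistical_weight / max_occurrence)


# The same row content (3 hits, MaxOccurrence 32, term present in 100 rows),
# scored against a small and a large full-text index.
small = single_term_rank(3, 32, indexed_row_count=1_000, key_row_count=100)
large = single_term_rank(3, 32, indexed_row_count=1_000_000, key_row_count=100)
print(round(small, 1), round(large, 1))
```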
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 249
272619 156
221557 114
...and sum them up...
根据我的经验,当权重加起来为 1 时,我得到了最好的结果。
CONTAINSTABLE(documentParts, content,
'ISABOUT (
"wordA wordB wordC" weight (0.5),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2),
"wordA*" weight (0.1),
"wordB*" weight (0.1),
"wordC*" weight (0.1)
) ')
Since time was running out, I ended up with this, which gives fairly good results:
-- Divide each CONTAINSTABLE's ranks by that result set's average rank so the two
-- tables end up on a comparable scale, weight documents x2, then sum per key.
SELECT [KEY], SUM([RANK]) AS [RANK] FROM (
SELECT [KEY], [RANK] * 1 / AVG(CAST([RANK] AS FLOAT)) OVER () AS [RANK]
FROM CONTAINSTABLE(documentParts, content,
'ISABOUT (
"wordA wordB wordC" weight (0.8),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
"wordA*" weight (0.4),
"wordB*" weight (0.4),
"wordC*" weight (0.4)
) ') c
WHERE c.RANK > 0 -- the windowed AVG is computed over the rows left after this filter
UNION ALL
SELECT [KEY], [RANK] * 2 / AVG(CAST([RANK] AS FLOAT)) OVER () AS [RANK]
FROM CONTAINSTABLE(documents, title,
'ISABOUT (
"wordA wordB wordC" weight (0.8),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
"wordA*" weight (0.4),
"wordB*" weight (0.4),
"wordC*" weight (0.4)
) ') c
WHERE c.RANK > 0
) t
GROUP BY [KEY]
ORDER BY [RANK] DESC
I'll hand this over to the test team and leave it at that...
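What the final query computes can be mirrored in a few lines of Python, which makes it easier to try the normalization on sample rows before running it against both indexes. The function names and the documents rows are hypothetical; the documentParts rows reuse the QUERY 1 ranks from the question.

```python
from collections import defaultdict


def normalize_by_mean(rows, source_weight):
    """One UNION ALL branch: each rank divided by its result set's mean rank, times a weight.

    rows: {key: rank} from one CONTAINSTABLE, already filtered to rank > 0.
    """
    if not rows:
        return {}
    avg = sum(rows.values()) / len(rows)
    return {key: source_weight * rank / avg for key, rank in rows.items()}


def combined_ranking(parts_rows, document_rows):
    """Sum normalized ranks per key (the outer GROUP BY [KEY]) and sort descending."""
    totals = defaultdict(float)
    # Same per-source multipliers as the query: documentParts x1, documents x2.
    for rows, weight in ((parts_rows, 1), (document_rows, 2)):
        for key, score in normalize_by_mean(rows, weight).items():
            totals[key] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


parts = {306342: 249, 272619: 156, 221557: 114}  # QUERY 1 ranks from the question
documents = {272619: 40, 999999: 20}             # hypothetical documents ranks
for key, score in combined_ranking(parts, documents):
    print(key, round(score, 3))
```

Dividing by the mean makes each source's rows average out to its multiplier (1 or 2), so neither table dominates just because its index yields larger raw ranks; it does not, however, equalize how spread out the ranks are within each table.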