Sphinx search: charset table difficulties

DS_*_*per 2 php sphinx character utf-8

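I have been losing my mind over this for two days now...

I want to use the Slovenian alphabet in Sphinx search: all English letters + čžš (just in case).

I searched the web for a suitable charset table, but found squat...

So I decided to build it myself, step by step...

This is my index: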

index classifieds
{
    source          = classifieds_src
    path            = c:\Sphinx\data\classifieds
    docinfo         = extern

    min_infix_len       = 2
    infix_fields        = title,keywords,summary,text
    expand_keywords     = 1
    enable_star     = 1


    charset_type        = utf-8
    charset_table = 0..9, a..z, _, A..Z->a..z,-, U+002C, \
    U+010C->U+010D, U+0106->U+0107, U+0160->U+0161, U+017D->U+017E, \
    U+010D->c,U+0107->c, U+0161->s, U+017E->z, \
    U+010D, U+0107, U+0161, U+017E
}

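I mapped uppercase Č, Ć, Š, Ž to their lowercase counterparts, added mappings from č to c, ć to c, š to s and ž to z, and finally added those four lowercase characters to the table as well...

These are my classifieds: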
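The same folding can also be written without the intermediate lowercase step. The fragment below is only a sketch: it assumes the goal is accent-insensitive matching, so that "miska" and "miška" end up as the same keyword, and it has not been run against this data. Each accented letter, upper or lower case, maps directly to its plain ASCII letter, which a..z already declares:

```
# Sketch only (not the configuration from the post):
# folds Č/č and Ć/ć to c, Š/š to s, Ž/ž to z.
charset_type  = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z, \
    U+010C->c, U+010D->c, U+0106->c, U+0107->c, \
    U+0160->s, U+0161->s, U+017D->z, U+017E->z
```

A table like this only helps if the indexed text really arrives as UTF-8; it cannot repair bytes that were delivered in another encoding.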

t1: HP USB optična miška za prenosnik RH304
t2: Čiška PCplus MO-U033 + F2 (optična, brezžična, PS/2)
t3: Miška Logitech optična Nano M235 siva

DB collation: utf8_general_ci; table collation: utf8_general_ci; title field collation: utf8_general_ci

Test cases:

$testcase = array(
        "miška",
        "mi*ka",
        "?iška",
        "?iška",
        "miska",
        "usb prenosnik",
        "prenosnik miska",
        "miška usb"
);

//api settings:

$this->sphinx->SetArrayResult(true);
$this->sphinx->setLimits(0, 100);
$this->sphinx->setMatchMode(SPH_MATCH_EXTENDED2);
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE, '@weight DESC');
$this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
$this->sphinx->SetFieldWeights(array("title"=>100, "keywords"=>80, "summary"=>60,
"text"=>20, "slug"=>10));

And finally, the test results:

Format: keyword (total/total_found), followed by the words array

miška     (0/0)

Array
(
    [*miška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

mi*ka     (0/0)

Array
(
    [*mi*] => Array
        (
            [docs] => 3
            [hits] => 4
        )

    [mi] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [*2aka*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [2aka] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

?iška     (0/0)

Array
(
    [*?iška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [?iška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

?iška     (0/0)

Array
(
    [*?iška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [?iška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

miska     (0/0)

Array
(
    [*miska*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miska] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

usb prenosnik     (1/1)

Array
(
    [*usb*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [usb] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [*prenosnik*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [prenosnik] => Array
        (
            [docs] => 1
            [hits] => 1
        )

)

prenosnik miska     (0/0)

Array
(
    [*prenosnik*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [prenosnik] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [*miska*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miska] => Array
        (
            [docs] => 0
            [hits] => 0
        )

)

miška usb     (0/0)

Array
(
    [*miška*] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [miška] => Array
        (
            [docs] => 0
            [hits] => 0
        )

    [*usb*] => Array
        (
            [docs] => 1
            [hits] => 1
        )

    [usb] => Array
        (
            [docs] => 1
            [hits] => 1
        )

)

You can clearly see that I only get hits for queries that contain no Slovenian special characters.

Please help, I am desperate.

use*_*291 10

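The problem is that the Sphinx indexer does not use the utf8 character set by default when it reads from the database. Fix it by adding the following to the source section of sphinx.conf: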

sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
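For context, here is roughly where those two lines would sit. This is a minimal sketch, assuming a MySQL source named classifieds_src as referenced by the index in the question; the connection values, database name, table name and sql_query are placeholders and do not come from the original post:

```
source classifieds_src
{
    type     = mysql

    # Placeholder connection settings (hypothetical values)
    sql_host = localhost
    sql_user = sphinx_user
    sql_pass = secret
    sql_db   = classifieds_db

    # Run before the main query so MySQL returns rows as UTF-8
    sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
    sql_query_pre = SET NAMES utf8

    # Placeholder query; column names follow the fields named in the question
    sql_query = SELECT id, title, keywords, summary, text, slug FROM classifieds
}
```

The index has to be rebuilt with indexer afterwards, since the setting only affects documents as they are fetched from the database.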
