Solr搜索失败的某些字符

Tom*_*Tom 12 solr

我有一个Solr集合,它不会返回一些非ASCII字符的结果.我们使用的例子是字符串S11. • “? «Ñaïvétý» ‘¢¥£’ ¶!#%; 搜索整个字符串不会返回任何结果,即使我在索引字段中有一个对象.但是,搜索该字符串的子字符串会返回匹配项.导致Solr没有返回匹配的唯一字符是中间的三个:• “?.该字段被编入索引,text_en但我也尝试过edge_ngram(希望有一点Cargo Cult魔法解决问题).这三个字符有什么特别之处,还是我需要调整Solr索引字段的方法?

我们正在通过django-haystack进行搜索,但问题也出现在Solr管理员中.

以下是两种字段类型定义:

<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory"        
            generateWordParts="1" generateNumberParts="1" catenateWords="0" 
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.EdgeNGramFilterFactory" 
            minGramSize="2" maxGramSize="50" side="front" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" 
            generateWordParts="1" generateNumberParts="1" catenateWords="0" 
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    </analyzer>
</fieldType>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
            ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>
Run Code Online (Sandbox Code Playgroud)

Abh*_*tti 1

你能试试这个吗...

<fieldType name="text_reference" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="front"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="back"/>
       </analyzer>
       <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
    </fieldType>
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述