Solr (Sunspot) query with hyphens and stop words

Pau*_*rey 2 solr ruby-on-rails sunspot sunspot-rails sunspot-solr

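I'm using the Ruby Sunspot gem with Solr 1.4.1.

I have a problem with searches that contain a hyphen.

When I search for "foo bar bla", the expected results are returned.

When a hyphen is included in the search terms, as in "foo - bar bla", no results are returned.

I have added the hyphen to my stop-word list and have tweaked my schema.xml in many ways over the past few days, to no avail.

For those familiar with Sunspot: my minimum match is set to 3, which is equivalent to setting the mm parameter in solrconfig.xml, e.g. 3.

This is what the relevant part of my schema.xml looks like: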

<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="string" class="solr.StrField" tokenized="true" omitNorms="true" sortMissingLast="true">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

Any help or advice would be greatly appreciated.

Thanks,

Dav*_*ber 6

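The hyphen (-) is a Solr operator that excludes results matching the word that follows it. I don't think adding the hyphen to the stop-word list will affect that, because the query parser interprets the operator before the field's analyzer (and its stop filter) ever sees the text. I'd suggest stripping the hyphen before running the query through Solr. My guess is that the hyphen is causing documents that match "bar" to be excluded? Maybe you could try analyzing the results to see whether that is actually the case.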
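A minimal sketch of the "strip the hyphen first" suggestion, in plain Ruby. The helper name `strip_leading_hyphens` is made up for illustration and is not part of Sunspot. It only removes hyphens that start a term (the position where the parser would read them as the exclude operator) and leaves hyphens inside words such as "well-known" alone; whether that is the right trade-off for your data is an assumption you should check.

```ruby
# Hypothetical helper (not a Sunspot API): drop hyphens that begin a term,
# since that is where the query parser treats "-" as "exclude".
def strip_leading_hyphens(query)
  query.to_s
       .gsub(/(\A|\s)-+/, '\1') # remove "-" at string start or after whitespace
       .squeeze(' ')            # collapse the doubled spaces left behind
       .strip
end

# Possible usage with Sunspot; `Post` and `params[:q]` are placeholders:
#
#   q = strip_leading_hyphens(params[:q])
#   Post.search { fulltext q }
```

With this, "foo - bar bla" is sent to Solr as "foo bar bla", which is the query that already returns the expected results.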