What is the difference between text_general and text_en in Solr?

Question


eug*_*ene 5 indexing solr textfield

I found that I can use the text_general field type for different languages by configuring different tokenizers/analyzers.
But text_en also exists.

Why do we need two?

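Suppose we have a sentence in an Asian language that also contains some English words.
Is text_general used for the Asian words in the sentence and text_en for the English words?
How does Solr index and query such a sentence?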

Answer 1

Jes*_*ose 5

text_en uses stemming, so a search for fakes can also match fake, fake's, faking, and so on. On a non-stemmed field, fakes matches only fakes.

Each field type uses a different analyzer "chain". text_en uses a chain of filters that is better suited to indexing English. See the tokenizer and filter entries.

Schema excerpt for text_general:

<!-- A general text field that has reasonable, generic
     cross-language defaults: it tokenizes with StandardTokenizer,
     removes stop words from case-insensitive "stopwords.txt" ... -->

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>


Schema excerpt for text_en:

<!-- A text field with defaults appropriate for English: it
     tokenizes with StandardTokenizer, removes English stop words
     (lang/stopwords_en.txt), down cases, protects words from protwords.txt, and
     finally applies Porter's stemming.  The query time analyzer
     also applies synonyms from synonyms.txt. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
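
Both excerpts above leave out the enclosing analyzer elements. As a rough sketch of how the index-time and query-time chains mentioned in the text_en comment could be separated, assuming a classic schema.xml from the same Solr generation as the excerpts: the name text_en_sketch is hypothetical, and the file names are the ones that appear in the excerpt and its comment. Newer Solr releases may expect different synonym filter classes, so treat this as illustrative rather than as the stock definition.

```xml
<!-- Hypothetical field type for illustration only; not the stock text_en definition -->
<fieldType name="text_en_sketch" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: tokenize, drop English stop words, lowercase, strip possessives, stem -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <!-- Terms listed in protwords.txt are marked as keywords and skipped by the stemmer -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <!-- Query time: same chain, plus synonym expansion from synonyms.txt -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same stemmer runs at index time and at query time, fakes and faking are reduced to the same term on both sides, which is why they match each other.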


Answer 2

aru*_*run 2

Why do we need two?

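So that you can analyze different content differently. Or, if you wish, you can even analyze the same content in different ways (using copyField). This gives you more choice at query time about which field to query.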
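
A minimal sketch of the copyField approach, assuming a classic schema.xml in which the text_general and text_en field types are already defined. The field names title and title_en are hypothetical and chosen only for illustration.

```xml
<!-- Hypothetical field names; text_general and text_en are assumed to be defined elsewhere in the schema -->

<!-- Original content, analyzed without stemming -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- Same content, analyzed with the English stemming chain; not stored because title already holds the text -->
<field name="title_en" type="text_en" indexed="true" stored="false"/>

<!-- Copy the raw input of title into title_en before analysis runs -->
<copyField source="title" dest="title_en"/>
```

With a setup like this, a query against title_en would be expected to match stemmed variants, while a query against title matches only the terms as they were tokenized.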

text_general is used for the Asian words in the sentence and text_en for English words?

No. Each field can have only one fieldType, just as in a database.

If you want to analyze different languages differently within the same field, take a look at SmartChineseAnalyzer as an example.
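
As a hedged sketch of what that could look like in schema.xml: the field type below delegates the whole analysis to Lucene's SmartChineseAnalyzer, which is documented as handling Chinese and mixed Chinese-English text. The names text_zh_sketch and body_zh are hypothetical, and the sketch assumes the Lucene smartcn analyzer jar has been added to Solr's classpath.

```xml
<!-- Hypothetical names; requires the Lucene smartcn analyzer jar on Solr's classpath -->
<fieldType name="text_zh_sketch" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer class replaces the tokenizer/filter chain for both indexing and querying -->
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>

<field name="body_zh" type="text_zh_sketch" indexed="true" stored="true"/>
```

When an analyzer class is given this way, no tokenizer or filter elements are nested inside it, so the one class is responsible for both the Chinese and the English tokens in the field.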

See also http://docs.lucidworks.com/display/LWEUG/Multilingual+Indexing+and+Search
