che*_*kha 0 java lucene filter analyzer
I am trying to apply multiple filters to a TokenStream in my custom analyzer. Here is the code:
public class CustomizeAnalyzer extends Analyzer {
    //code omitted
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);
        filter = new StopFilter(Version.LUCENE_44, filter, stopWords);
        return new TokenStreamComponents(source, new PorterStemFilter(source));
    }
}
However, the LowerCaseFilter is never applied. I followed the documentation here closely. Can someone explain how to make this work?
Many thanks,
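Your problem is in the last line. You build a filter chain and then short-circuit it in the return statement by passing back new PorterStemFilter(source), which is a stem filter sitting directly on the tokenizer rather than on the filters earlier in the chain. The LowerCaseFilter and StopFilter you created are therefore never part of the returned stream. It should be: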
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);
    filter = new StopFilter(Version.LUCENE_44, filter, stopWords);
    filter = new PorterStemFilter(filter);
    return new TokenStreamComponents(source, filter);
}
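To see why wrapping source instead of filter drops the earlier steps, here is a minimal sketch of the same wrapping pattern in plain Java, with no Lucene dependency. Everything in it is a hypothetical stand-in and not Lucene API: FilterChainDemo, tokenize, map, and toyStem are names invented for illustration, and toyStem only strips a trailing lowercase "s" (it is not the Porter algorithm).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class FilterChainDemo {

    // Hypothetical stand-in for a tokenizer: splits on runs of non-letters.
    public static Iterator<String> tokenize(String text) {
        return Arrays.asList(text.split("[^A-Za-z]+")).iterator();
    }

    // Hypothetical stand-in for a token filter: wraps an upstream iterator
    // and transforms every token pulled through it.
    public static Iterator<String> map(Iterator<String> input, Function<String, String> fn) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return fn.apply(input.next()); }
        };
    }

    // Toy stemmer (NOT Porter): removes one trailing lowercase "s".
    public static String toyStem(String token) {
        return token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
    }

    // Pulls every token out of the final stage of the chain.
    public static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    // Mirrors the question: the last stage wraps the tokenizer directly,
    // so the lowercase stage is created but never pulled from.
    public static List<String> broken(String text) {
        Iterator<String> source = tokenize(text);
        Iterator<String> filter = map(source, String::toLowerCase); // orphaned
        return drain(map(source, FilterChainDemo::toyStem));
    }

    // Mirrors the answer: each stage wraps the previous stage.
    public static List<String> fixed(String text) {
        Iterator<String> source = tokenize(text);
        Iterator<String> filter = map(source, String::toLowerCase);
        filter = map(filter, FilterChainDemo::toyStem);
        return drain(filter);
    }

    public static void main(String[] args) {
        System.out.println(broken("Cats chase DOGS")); // prints [Cat, chase, DOGS]
        System.out.println(fixed("Cats chase DOGS"));  // prints [cat, chase, dog]
    }
}
```

In the broken variant the lowercase stage exists as an object, but nothing ever reads from it, which matches the symptom in the question. Only the stage that is handed back (here to drain, in Lucene to TokenStreamComponents) determines what the consumer sees, so each new filter has to wrap the previous one.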