Jit*_*hin 5 lucene search solr tokenize
I created a custom token filter that concatenates all the tokens in the stream into a single token. This is my incrementToken() function:
public boolean incrementToken() throws IOException {
    if (finished) {
        logger.debug("Finished");
        return false;
    }
    logger.debug("Starting");
    StringBuilder buffer = new StringBuilder();
    int length = 0;
    while (input.incrementToken()) {
        if (0 == length) {
            buffer.append(termAtt);
            length += termAtt.length();
        } else {
            buffer.append(" ").append(termAtt);
            length += termAtt.length() + 1;
        }
    }
    termAtt.setEmpty().append(buffer);
    //offsetAtt.setOffset(0, length);
    finished = true;
    return true;
}
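I added the new filter to the end of both the index and the query analysis chains for the field and tested it at http://localhost:8983/solr/admin/analysis.jsp. The filter seems to work there: it concatenates the tokens in the stream. But when I reindex my documents, only the first document gets indexed.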
This is what my filter chain looks like:
<analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="words.txt" />
    <filter class="org.custom.solr.analysis.ConcatFilterFactory" />
</analyzer>
<analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="words.txt" />
    <filter class="org.custom.solr.analysis.ConcatFilterFactory" />
</analyzer>
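Without ConcatFilterFactory, all words are indexed correctly, but with ConcatFilterFactory only the first document is indexed. What am I doing wrong? Please help me understand the problem.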
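The symptom can be reproduced without Solr. The sketch below is a hypothetical, Lucene-free model of the filter above (the class BuggyConcatModel and its setInput/analyze helpers are invented for illustration and are not Lucene API). It keeps the same finished flag and reuses one filter instance for every document, the way a reused analysis chain would. The second document produces no token at all, which matches "only the first document gets indexed".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, Lucene-free model of the question's concatenating filter.
// One instance is reused for every document, as a reused analysis chain would be.
class BuggyConcatModel {
    private Iterator<String> input;                             // stands in for the upstream TokenStream
    private final StringBuilder termAtt = new StringBuilder();  // stands in for CharTermAttribute
    private boolean finished = false;                           // same flag as the original filter

    // Hypothetical helper: hands the next document's tokens to the reused instance.
    void setInput(List<String> tokens) {
        input = tokens.iterator();
    }

    boolean incrementToken() {
        if (finished) {
            return false; // BUG: finished is never cleared, so later documents emit nothing
        }
        StringBuilder buffer = new StringBuilder();
        while (input.hasNext()) {
            if (buffer.length() > 0) {
                buffer.append(' '); // single space between tokens, as in the original
            }
            buffer.append(input.next());
        }
        termAtt.setLength(0);
        termAtt.append(buffer);
        finished = true;
        return true;
    }

    // Collects every token the filter emits for one document.
    static List<String> analyze(BuggyConcatModel filter, List<String> tokens) {
        filter.setInput(tokens);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.termAtt.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        BuggyConcatModel filter = new BuggyConcatModel();
        System.out.println(analyze(filter, Arrays.asList("red", "apple")));  // prints [red apple]
        System.out.println(analyze(filter, Arrays.asList("green", "pear"))); // prints []
    }
}
```

The analysis page analyzes one input at a time, so the stale flag never shows up there; it only bites when the same instance sees a second document.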
Update:
I finally figured out the problem. Resetting the flag before returning false fixes it:
if (finished) {
    logger.debug("Finished");
    finished = false;
    return false;
}
It looks like the same filter instance is being reused for every document. That makes sense.
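A somewhat more robust alternative to clearing the flag on the way out is to clear it in reset(). In Lucene, a consumer calls reset() on a token stream before consuming each new input, so a filter that keeps per-document state usually overrides reset(), calls super.reset() (which resets the upstream stream) and then clears its own fields. The sketch below models that contract without Lucene; ResettableConcatModel, setInput and analyze are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, Lucene-free model: per-document state is cleared in reset().
// In a real Lucene TokenFilter the corresponding override would look like:
//   @Override
//   public void reset() throws IOException {
//       super.reset();     // resets the upstream stream
//       finished = false;  // clears this filter's per-document state
//   }
class ResettableConcatModel {
    private List<String> source = new ArrayList<>();            // tokens of the current document
    private Iterator<String> input;                             // position in the upstream tokens
    private final StringBuilder termAtt = new StringBuilder();  // stands in for CharTermAttribute
    private boolean finished = false;

    // Hypothetical helper: hands the next document's tokens to the reused instance.
    void setInput(List<String> tokens) {
        source = tokens;
    }

    // Models TokenStream.reset(): the consumer calls it before each new document.
    void reset() {
        input = source.iterator(); // stands in for super.reset()
        finished = false;          // clear per-document state
    }

    boolean incrementToken() {
        if (finished) {
            return false; // stays false until the next reset()
        }
        StringBuilder buffer = new StringBuilder();
        while (input.hasNext()) {
            if (buffer.length() > 0) {
                buffer.append(' ');
            }
            buffer.append(input.next());
        }
        termAtt.setLength(0);
        termAtt.append(buffer);
        finished = true;
        return true;
    }

    // Consumes one document the way an indexer would: reset, then incrementToken until false.
    static List<String> analyze(ResettableConcatModel filter, List<String> tokens) {
        filter.setInput(tokens);
        filter.reset();
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.termAtt.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        ResettableConcatModel filter = new ResettableConcatModel();
        System.out.println(analyze(filter, Arrays.asList("red", "apple")));  // prints [red apple]
        System.out.println(analyze(filter, Arrays.asList("green", "pear"))); // prints [green pear]
    }
}
```

Clearing the flag in reset() also covers a consumer that stops reading before incrementToken() returns false, a case where the flag-on-exit fix above would leave stale state behind.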
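You should write a unit test for your filter; it would fail even though the analysis page seems to work. Apparently you forgot to add this line before returning false: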
finished = false;
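The unit test suggested above only needs to push two documents through one filter instance. Here is a minimal plain-Java sketch of such a test, written against a hypothetical Lucene-free model (ConcatModel, setInput, analyze and ConcatModelReuseTest are invented names) that applies the fix of clearing the flag just before returning false. If the finished = false; line is removed, the second check fails. A real test would build the actual analysis chain instead; Lucene's own test framework (BaseTokenStreamTestCase) provides helpers for checking token stream output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, Lucene-free model of the filter with the fix applied.
class ConcatModel {
    private Iterator<String> input;
    private final StringBuilder termAtt = new StringBuilder();
    private boolean finished = false;

    // Hypothetical helper: hands the next document's tokens to the reused instance.
    void setInput(List<String> tokens) {
        input = tokens.iterator();
    }

    boolean incrementToken() {
        if (finished) {
            finished = false; // the fix: ready for the next document
            return false;
        }
        StringBuilder buffer = new StringBuilder();
        while (input.hasNext()) {
            if (buffer.length() > 0) {
                buffer.append(' ');
            }
            buffer.append(input.next());
        }
        termAtt.setLength(0);
        termAtt.append(buffer);
        finished = true;
        return true;
    }

    static List<String> analyze(ConcatModel filter, List<String> tokens) {
        filter.setInput(tokens);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.termAtt.toString());
        }
        return out;
    }
}

// Unit-test-style check: the same instance must handle two documents in a row.
class ConcatModelReuseTest {
    static void run() {
        ConcatModel filter = new ConcatModel();
        expect(Arrays.asList("red apple"), ConcatModel.analyze(filter, Arrays.asList("red", "apple")));
        // Without the finished = false; line, this second document yields [] and the check fails.
        expect(Arrays.asList("green pear"), ConcatModel.analyze(filter, Arrays.asList("green", "pear")));
    }

    private static void expect(List<String> expected, List<String> actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        run();
        System.out.println("reuse test passed");
    }
}
```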