Bad*_*adr 6 java lucene analyzer
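I ran into a problem with Lucene term vector offsets. When I analyze a field with my custom analyzer, the term vector offsets are invalid, but with the standard analyzer they are fine. Here is my analyzer code: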
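import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;

// Lucene 3.x API; AttachmentNameTokenizer is the asker's own Tokenizer subclass (source not shown)
public class AttachmentNameAnalyzer extends Analyzer {
    private boolean stemmTokens;
    private String name; // Snowball stemmer name passed to SnowballFilter

    public AttachmentNameAnalyzer(boolean stemmTokens, String name) {
        super();
        this.stemmTokens = stemmTokens;
        this.name = name;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Builds a fresh tokenizer chain on every call
        TokenStream stream = new AttachmentNameTokenizer(reader);
        if (stemmTokens)
            stream = new SnowballFilter(stream, name);
        return stream;
    }

    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        // Try to reuse the stream cached earlier for this thread
        TokenStream stream = (TokenStream) getPreviousTokenStream();
        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens)
                stream = new SnowballFilter(stream, name);
            setPreviousTokenStream(stream);
        } else if (stream instanceof Tokenizer) {
            // Point the cached tokenizer at the new input
            // (only reached when no SnowballFilter wraps the tokenizer)
            ((Tokenizer) stream).reset(reader);
        }
        return stream;
    }
}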
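What is wrong? Any help is appreciated.
The problem was in the analyzer code I posted above: the token stream actually needs to be reset for every new text entry that gets tokenized.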
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    TokenStream stream = (TokenStream) getPreviousTokenStream();
    if (stream == null) {
        stream = new AttachmentNameTokenizer(reader);
        if (stemmTokens)
            stream = new SnowballFilter(stream, name);
        setPreviousTokenStream(stream); // <--- problem was here
    } else if (stream instanceof Tokenizer) {
        ((Tokenizer) stream).reset(reader);
    }
    return stream;
}
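Each time I set the previous token stream, the next text field (which has to be tokenized on its own) always started at the end offset of the last token stream, which made the term vector offsets wrong for the new stream. With that call removed, it now works correctly: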
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    // setPreviousTokenStream() is no longer called, so nothing is cached:
    // getPreviousTokenStream() returns null and a fresh tokenizer chain is built per call
    TokenStream stream = (TokenStream) getPreviousTokenStream();
    if (stream == null) {
        stream = new AttachmentNameTokenizer(reader);
        if (stemmTokens)
            stream = new SnowballFilter(stream, name);
    } else if (stream instanceof Tokenizer) {
        ((Tokenizer) stream).reset(reader);
    }
    return stream;
}
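To make the offset symptom concrete, here is a minimal self-contained sketch that does not use Lucene at all. `ToyTokenizer` is a hypothetical whitespace tokenizer that tracks a running character offset; the source of `AttachmentNameTokenizer` isn't shown, so it is only an assumption that it keeps a similar counter. Reusing the tokenizer while swapping only the input lets the second text's offsets continue from where the first text ended; clearing the counter on reset makes them start at 0 again.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy tokenizer (NOT Lucene, NOT the asker's AttachmentNameTokenizer).
 * Splits on whitespace and reports [start, end) character offsets
 * using a running counter, as a hand-written tokenizer might.
 */
final class ToyTokenizer {
    private Reader input;
    private int offset; // position of the next character to be read

    ToyTokenizer(Reader input) {
        this.input = input;
    }

    /** Buggy reuse: swaps the input but keeps the old offset counter. */
    void resetInputOnly(Reader newInput) {
        this.input = newInput;
    }

    /** Correct reuse: swaps the input AND clears the per-text offset counter. */
    void reset(Reader newInput) {
        this.input = newInput;
        this.offset = 0;
    }

    /** Consumes the current input and returns one {start, end} pair per token. */
    List<int[]> offsets() {
        List<int[]> result = new ArrayList<>();
        int start = -1; // -1 means "not inside a token"
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (start >= 0) {
                        result.add(new int[] {start, offset});
                        start = -1;
                    }
                } else if (start < 0) {
                    start = offset;
                }
                offset++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (start >= 0) {
            result.add(new int[] {start, offset}); // token running to end of input
        }
        return result;
    }

    private static void print(List<int[]> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int[] tok : tokens) {
            sb.append('[').append(tok[0]).append(',').append(tok[1]).append("] ");
        }
        System.out.println(sb.toString().trim());
    }

    public static void main(String[] args) {
        ToyTokenizer t = new ToyTokenizer(new StringReader("foo bar"));
        print(t.offsets());                        // prints "[0,3] [4,7]"
        t.resetInputOnly(new StringReader("baz"));
        print(t.offsets());                        // prints "[7,10]" (stale offsets)
        t.reset(new StringReader("baz"));
        print(t.offsets());                        // prints "[0,3]"
    }
}
```

Dropping `setPreviousTokenStream`, as in the fix above, sidesteps the problem because every field value gets a freshly constructed tokenizer with a fresh counter; the trade-off is that the stream is no longer reused between calls.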
Views: 3195