I am trying to build a custom NER model using Apache OpenNLP 1.7. Based on the available documentation, I wrote the following code:
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
public class PersonClassifierTrainer {
        static String modelFile = "/opt/NLP/data/en-ner-customperson.bin";
        public static void main(String[] args) throws IOException {
            Charset charset = Charset.forName("UTF-8");
            // Highlighted line: the IDE flags this constructor call.
            ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("/opt/NLP/data/person.train"), charset);
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
            TokenNameFinderModel model;
            TokenNameFinderFactory nameFinderFactory = null;
            try {
                model = NameFinderME.train("en", "person", sampleStream, TrainingParameters.defaultParams(),
                        nameFinderFactory);
            } finally {
                sampleStream.close();
            }
            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null)
                    modelOut.close();
            }
        }
    }
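For the highlighted line above, the IDE suggests: Cast argument 'file' to 'InputStreamFactory'.
I was forced to apply that cast, because otherwise the line shows an error.
Now when I run my code, I get the following error: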
java.io.FileInputStream cannot be cast to opennlp.tools.util.InputStreamFactory
Am I missing something here?
Edit 1: The person.train file contains this data:
<START:person> Hardik <END> is a software Professional.<START:person> Hardik works at company<END> and <START:person> is part of development team<END>. <START:person> Hardik<END> lives in New York
<START:person> Hardik<END> loves R statistical software
<START:person> Hardik<END> is a student at ISB
<START:person> Hardik<END> loves nature
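Edit 2: I am now getting a NullPointerException. Any help?
You need an instance of InputStreamFactory to supply your InputStream: PlainTextByLineStream expects an InputStreamFactory, and FileInputStream does not implement that interface, so the cast fails at runtime. Also, the TokenNameFinderFactory must not be null.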
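import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
public class PersonClassifierTrainer {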
    static String modelFile = "/opt/NLP/data/en-ner-customperson.bin";
    public static void main(String[] args) throws IOException {
        InputStreamFactory isf = new InputStreamFactory() {
            public InputStream createInputStream() throws IOException {
                return new FileInputStream("/opt/NLP/data/person.train");
            }
        };
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(isf, charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
        TokenNameFinderModel model;
        TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
        try {
            model = NameFinderME.train("en", "person", sampleStream, TrainingParameters.defaultParams(),
                    nameFinderFactory);
        } finally {
            sampleStream.close();
        }
        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null)
                modelOut.close();
        }
    }
}
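To illustrate why a factory is asked for rather than an already opened stream, here is a minimal sketch that uses only the JDK. The `StreamFactory` interface and the `countLines` helper are hypothetical names made up for this example. `StreamFactory` only mirrors the single-method shape of `opennlp.tools.util.InputStreamFactory`; it is not the OpenNLP type. The point is that a consumer holding a factory can open a fresh stream whenever it needs to read the data again, which a single `FileInputStream` cannot offer once it has been read.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class StreamFactorySketch {

    // Hypothetical stand-in with the same shape as
    // opennlp.tools.util.InputStreamFactory (one createInputStream() method).
    interface StreamFactory {
        InputStream createInputStream() throws IOException;
    }

    // Opens a fresh stream from the factory on every call and counts its lines.
    static int countLines(StreamFactory factory) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(factory.createInputStream(), StandardCharsets.UTF_8))) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory data stands in for the training file in this sketch.
        byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);

        // A single-method interface can be implemented with a lambda (Java 8+).
        StreamFactory factory = () -> new ByteArrayInputStream(data);

        // The same factory can be consumed twice, because each call
        // to createInputStream() returns a new stream.
        System.out.println(countLines(factory));
        System.out.println(countLines(factory));
    }
}
```

Assuming Java 8 or later, the anonymous class in the answer above could likewise be written as a lambda, since the OpenNLP interface also declares just one method.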