Stanford NLP tokenizer

Question


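How can I use the Stanford parser to tokenize a string inside a Java class?

I can only find examples where DocumentPreprocessor and PTBTokenizer read their text from an external file.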

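  import java.io.FileReader;
  import java.util.List;
  import edu.stanford.nlp.ling.CoreLabel;
  import edu.stanford.nlp.ling.HasWord;
  import edu.stanford.nlp.process.CoreLabelTokenFactory;
  import edu.stanford.nlp.process.DocumentPreprocessor;
  import edu.stanford.nlp.process.PTBTokenizer;

  // option #1: By sentence
  DocumentPreprocessor dp = new DocumentPreprocessor("hello.txt");
  for (List<HasWord> sentence : dp) {
    System.out.println(sentence);
  }

  // option #2: By token
  PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader("hello.txt"),
      new CoreLabelTokenFactory(), "");
  while (ptbt.hasNext()) {
    CoreLabel label = ptbt.next();
    System.out.println(label);
  }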


Thanks.

Answer 1

Cap*_*liC 6

The PTBTokenizer constructor accepts a java.io.Reader, so you can wrap the string in a StringReader to tokenize text that is already in memory.
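A minimal sketch of that suggestion, assuming Stanford CoreNLP (or the Stanford Parser jar) is on the classpath. The class name StringTokenizerSketch and the helper tokenize are hypothetical names chosen for illustration; the constructor call is the same one the question uses, with the FileReader swapped for a StringReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

public class StringTokenizerSketch {

    // Hypothetical helper: tokenizes an in-memory string instead of a file.
    static List<String> tokenize(String text) {
        // StringReader is a java.io.Reader, so it fits the constructor
        // that the question calls with a FileReader.
        PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(
                new StringReader(text), new CoreLabelTokenFactory(), "");
        List<String> tokens = new ArrayList<>();
        while (ptbt.hasNext()) {
            // value() returns the token text stored on the CoreLabel.
            tokens.add(ptbt.next().value());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print one token per line for a sample sentence.
        for (String token : tokenize("Hello, world!")) {
            System.out.println(token);
        }
    }
}
```

Nothing is read from disk here, so the FileReader and its checked exceptions are no longer needed.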

Never mind, this gave me the tokens: List<CoreLabel> rawWords = tokenizerFactory.getTokenizer(new StringReader(sentence)).tokenize(); System.out.println(rawWords.get(0).value()); (4 upvotes)
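The comment does not show how tokenizerFactory was created. One plausible way, offered here as an assumption rather than the commenter's actual code, is PTBTokenizer.factory with a CoreLabelTokenFactory. The class name FactoryTokenizerSketch and the helper rawWords are hypothetical.

```java
import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.TokenizerFactory;

public class FactoryTokenizerSketch {

    // Assumption: the factory is built once and reused for every sentence.
    private static final TokenizerFactory<CoreLabel> TOKENIZER_FACTORY =
            PTBTokenizer.factory(new CoreLabelTokenFactory(), "");

    // Hypothetical helper mirroring the one-liner in the comment.
    static List<CoreLabel> rawWords(String sentence) {
        // getTokenizer takes a Reader; tokenize() collects all tokens into a list.
        return TOKENIZER_FACTORY.getTokenizer(new StringReader(sentence)).tokenize();
    }

    public static void main(String[] args) {
        List<CoreLabel> words = rawWords("Stanford tokenizers accept any Reader.");
        // Print the text of the first token, as the comment does.
        System.out.println(words.get(0).value());
    }
}
```

Compared with calling the PTBTokenizer constructor directly, the factory form is convenient when many strings are tokenized with the same options, because only the Reader changes per call.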
