Stanford NLP tokenizer

Nav*_*een 6 tokenize stanford-nlp

How can I use the Stanford parser to tokenize a string inside a Java class?

I can only find examples of DocumentPreprocessor and PTBTokenizer that read their text from an external file.

  // option #1: By sentence
  DocumentPreprocessor dp = new DocumentPreprocessor("hello.txt");
  for (List<HasWord> sentence : dp) {
    System.out.println(sentence);
  }

  // option #2: By token
  PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader("hello.txt"),
          new CoreLabelTokenFactory(), "");
  while (ptbt.hasNext()) {
    CoreLabel label = ptbt.next();
    System.out.println(label);
  }

Thanks.

Cap*_*liC 6

The PTBTokenizer constructor accepts a java.io.Reader, so you can wrap your text in a StringReader and tokenize it directly.
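A minimal sketch of that idea, assuming Stanford CoreNLP is on the classpath. The class name StringTokenizerSketch and the helper tokenize are made up for illustration; the PTBTokenizer, CoreLabelTokenFactory and StringReader calls are the same ones used in the question, with only the FileReader swapped out.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StringTokenizerSketch {

    // Hypothetical helper: tokenizes an in-memory string and returns the token texts.
    static List<String> tokenize(String text) {
        // StringReader is a java.io.Reader, so it can replace the FileReader
        // from the question without any other change to the constructor call.
        PTBTokenizer<CoreLabel> tokenizer =
                new PTBTokenizer<>(new StringReader(text), new CoreLabelTokenFactory(), "");

        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasNext()) {
            // word() returns the surface form of the token as a String.
            tokens.add(tokenizer.next().word());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello world.")) {
            System.out.println(token);
        }
    }
}
```

If you need the full CoreLabel objects (character offsets and so on) rather than plain strings, collect tokenizer.next() itself instead of calling word() on it.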

  • Never mind, this gave me the tokens: List<CoreLabel> rawWords = tokenizerFactory.getTokenizer(new StringReader(sentence)).tokenize(); System.out.println(rawWords.get(0).value()); (4 upvotes)