Mallet error in R regex: java.lang.NoSuchMethodException: No suitable method for the given parameters

jxn*_*jxn 3 regex r rjava mallet topic-modeling

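I have been following a tutorial on how to build topic models in R using mallet. My text file has one sentence per line. It looks like this and contains about 50 sentences.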

Thank you again and have a good day :).
This is an apple.
This is awesome!
LOL!
i need 2.
.
.
. 

Here is my code:

Sys.setenv(NOAWT=TRUE)

#setup the workspace
# Set working directory
dir<-"/Users/jxn"
Dir <- "~/Desktop/Chat/malletR/text" # adjust to suit
require(mallet)
documents1 <- mallet.read.dir(Dir)
View(documents1)
stoplist1<-mallet.read.dir("~/Desktop/Chat/malletR/stoplists")
View(stoplist1)
mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp ="\\p{L}[\\p{L}\\p{P}]+\\p{L}")

Everything works except the last line of the code:

mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp ="\\p{L}[\\p{L}\\p{P}]+\\p{L}")

I keep getting this error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.lang.NoSuchMethodException: No suitable method for the given parameters

According to the package documentation, this is how the function should be called:

mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                    token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

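I believe it has something to do with the token.regexp argument, because
documents1 <- mallet.read.dir(Dir) works fine, which I take to mean that the first 3 arguments supplied to mallet.instances are correct.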

Here is a link to the git repo of the tutorial I followed: https://github.com/shawngraham/R/blob/master/topicmodel.R

Any help would be greatly appreciated.

Thanks, J

lit*_*ger 6

I suspect the problem lies in your text file. I ran into the same error and resolved it by wrapping the arguments in as.character(), as follows:

mallet.instances <- mallet.import(as.character(documents$id), as.character(documents$text), "en.txt", FALSE, token.regexp="\\p{L}[\\p{L}\\p{P}]+\\p{L}")
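
A possible explanation for why this helps, offered as an assumption rather than a confirmed diagnosis: if the id or text column is not a plain character vector (a factor, for example), the values may not be handed to Java as strings, so no matching method is found. The sketch below uses only base R and a hypothetical stand-in data frame (the name documents and its contents are made up for illustration; it is not the output of mallet.read.dir) to show how to check the column types and coerce them before calling mallet.import:

```r
# Hypothetical stand-in for the documents data frame (an assumption for
# illustration). stringsAsFactors = TRUE forces the columns to be factors,
# which was the data.frame() default before R 4.0.
documents <- data.frame(
  id   = c("doc1.txt", "doc2.txt"),
  text = c("This is an apple.", "This is awesome!"),
  stringsAsFactors = TRUE
)

# Inspect the column types: a factor is not a character vector.
class(documents$text)
is.character(documents$text)

# Coerce both columns to plain character vectors.
ids   <- as.character(documents$id)
texts <- as.character(documents$text)

# Fail early if the coercion did not yield character vectors.
stopifnot(is.character(ids), is.character(texts))
```

Running class() on your own documents1$id and documents1$text is a quick way to see whether this applies to your data; if both are already character vectors, the cause is probably elsewhere.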