R, tm: "transformation drops documents" error

Question

Jul*_*lie 7 r extract keyword extraction tm

I want to build a network based on the weights of keywords in a text. When I run the code that uses tm_map, I get an error:

library(tm)
library(NLP)
library(openNLP)

text = c('.......')
corp <- Corpus(VectorSource(text))
corp <- tm_map(corp, stripWhitespace)

Warning message:
In tm_map.SimpleCorpus(corp, stripWhitespace) :
transformation drops documents

corp <- tm_map(corp, tolower)

Warning message:
In tm_map.SimpleCorpus(corp, tolower) : transformation drops documents

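This code worked two months ago. Now I am trying it on new data and it no longer works. Could someone tell me where I went wrong? Thanks. I even tried the command below, but it does not work either.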

corp <- tm_map(corp, content_transformer(stripWhitespace))

Answer 1

phi*_*ver 9

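The code should still work. You are getting a warning, not an error. The warning only appears when you use Corpus instead of VCorpus together with a corpus built from a VectorSource.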

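The reason is a check in the underlying code that compares the number of names of the corpus content with the length of the corpus content. When the text is read in as a vector, there are no document names, so the warning pops up. It is only a warning; no documents are dropped.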
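One way to confirm that nothing is lost is to compare the number of documents before and after the transformation. The following is a minimal sketch, assuming the tm package is installed; the sample strings and the variable names (`n_before`, `n_after`, `corp_clean`) are made up for illustration, and whether the warning fires at all may depend on the installed tm version.

```r
library(tm)

# Illustrative sample data (not from the question)
text <- c("First  document  with   extra spaces", "SECOND document")
corp <- Corpus(VectorSource(text))

n_before <- length(corp)

# suppressWarnings() only hides the message; it does not change the result
corp_clean <- suppressWarnings(tm_map(corp, stripWhitespace))

n_after <- length(corp_clean)

# Same number of documents before and after the transformation
print(c(before = n_before, after = n_after))
```

If both counts match, the transformation kept every document and the warning can be treated as noise.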

See the difference between the two examples:

library(tm)

text <- c("this is my text with some other text and some more")

# warning when combining Corpus and VectorSource
text_corpus <- Corpus(VectorSource(text))

# warning appears running following line
tm_map(text_corpus, content_transformer(tolower))
<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1
Warning message:
In tm_map.SimpleCorpus(text_corpus, content_transformer(tolower)) :
  transformation drops documents

# Using VCorpus
text_corpus <- VCorpus(VectorSource(text))

# warning doesn't appear
tm_map(text_corpus, content_transformer(tolower))
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 1

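Applied to the steps from the question, the VCorpus approach might look like the sketch below. It assumes the tm package is installed, and the sample strings and the `cleaned` variable are invented for illustration. Note that with a VCorpus, base functions such as tolower need to be wrapped in content_transformer() so that the documents keep their class, while stripWhitespace is a tm transformation and can be passed directly.

```r
library(tm)

# Illustrative sample data (not from the question)
text <- c("Some   TEXT with   irregular spacing", "Another   Document")

# VCorpus instead of Corpus, as suggested in the answer
corp <- VCorpus(VectorSource(text))

# stripWhitespace is a tm transformation; tolower is a base function
# and is wrapped in content_transformer()
corp <- tm_map(corp, stripWhitespace)
corp <- tm_map(corp, content_transformer(tolower))

# Pull the cleaned text back out as a plain character vector
cleaned <- unname(vapply(content(corp), as.character, character(1)))
print(cleaned)
```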
Archived:	7 years, 8 months ago
Views:	7512
Last activity:	7 years, 8 months ago