Related questions and solutions (0)

DocumentTermMatrix error on Corpus argument

I have the following code:

# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

news_corpus <- Corpus(VectorSource(news_raw$text)) # a column of strings.

corpus_clean <- tm_map(news_corpus, tolower)
corpus_clean <- tm_map(corpus_clean, removeNumbers)
corpus_clean <- tm_map(corpus_clean, removeWords, stopwords('english'))
corpus_clean <- tm_map(corpus_clean, removePunctuation)
corpus_clean <- tm_map(corpus_clean, stripWhitespace)
corpus_clean <- tm_map(corpus_clean, trim)

news_dtm <- DocumentTermMatrix(corpus_clean) # errors here

When I call DocumentTermMatrix(), it gives me this error:

Error: inherits(doc, "TextDocument") is not TRUE

Why am I getting this error? Aren't my rows text documents?

Here is the output when inspecting corpus_clean:

[[153]]
[1] obama holds technical school model us

[[154]]
[1] oil boom produces jobs bonanza archaeologists

[[155]] …
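
The inspected output above shows bare character vectors rather than document objects, which is consistent with the error. A minimal sketch of one common fix, assuming a recent version of tm where content_transformer() is available: plain character functions such as tolower and the user-defined trim are wrapped so that tm_map() keeps each element a TextDocument. The texts vector is hypothetical stand-in data for news_raw$text, and VCorpus is used here instead of Corpus as an assumption to make the document class explicit.

```r
library(tm)

# Hypothetical stand-in for news_raw$text.
texts <- c("  Obama holds up technical school as a model for US 2014! ",
           "Oil boom produces jobs bonanza for archaeologists.")

# Returns string without leading or trailing whitespace.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

corpus <- VCorpus(VectorSource(texts))

# Plain character functions (tolower, trim) are wrapped in content_transformer()
# so each document stays a TextDocument; tm's own transformations are used directly.
corpus_clean <- tm_map(corpus, content_transformer(tolower))
corpus_clean <- tm_map(corpus_clean, removeNumbers)
corpus_clean <- tm_map(corpus_clean, removeWords, stopwords("english"))
corpus_clean <- tm_map(corpus_clean, removePunctuation)
corpus_clean <- tm_map(corpus_clean, stripWhitespace)
corpus_clean <- tm_map(corpus_clean, content_transformer(trim))

# Every element is still a TextDocument, so this call should no longer fail.
news_dtm <- DocumentTermMatrix(corpus_clean)
```

Checking inherits(corpus_clean[[1]], "TextDocument") after each step is a quick way to find which transformation strips the document class.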

r corpus tm

56
votes
3
answers
50k
views

R-Project: no applicable method for 'meta' applied to an object of class "character"

I am trying to run this code (Ubuntu 12.04, R 3.1.1):

# Load requisite packages
library(tm)
library(ggplot2)
library(lsa)

# Place Enron email snippets into a single vector.
text <- c(
  "To Mr. Ken Lay, I’m writing to urge you to donate the millions of dollars you made from selling Enron stock before the company declared bankruptcy.",
  "while you netted well over a $100 million, many of Enron's employees were financially devastated when the company declared bankruptcy and their retirement plans were wiped out",
  "you sold $101 million worth …
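
The code above is cut off, so the failing call cannot be identified from it. The error in the title is R's standard S3 dispatch failure: a generic was called on an object whose class has no matching method. A base-R sketch of that mechanism follows; get_meta is a hypothetical generic standing in for tm's meta(), and the toy "TextDocument" object is an illustration only, not tm's actual implementation.

```r
# Hypothetical generic standing in for tm's meta(); not the tm implementation.
get_meta <- function(x, ...) UseMethod("get_meta")

# A method exists only for objects of class "TextDocument".
get_meta.TextDocument <- function(x, ...) x$meta

# Toy document carrying the class the method expects.
doc <- structure(list(content = "you sold stock", meta = list(id = "1")),
                 class = "TextDocument")
doc_id <- get_meta(doc)$id

# A bare character vector has no matching method, so dispatch fails;
# the error message is captured instead of stopping the script.
msg <- tryCatch(get_meta("you sold stock"),
                error = function(e) conditionMessage(e))
print(msg)
```

If the same thing is happening in the tm pipeline, some step has probably turned the corpus elements into plain character vectors before a function that calls meta() runs.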

r text-mining tm

32
votes
2
answers
40k
views

Tag statistics

r ×2

tm ×2

corpus ×1

text-mining ×1