I made a sample data frame, and I am trying to create a wordcloud from the Project column.
Hours<-c(2,3,4,2,1,1,3)
Project<-c("a","b","b","a","c","c","c")
Period<-c("2014-11-22","2014-11-23","2014-11-24","2014-11-22", "2014-11-23", "2014-11-23", "2014-11-24")
cd=data.frame(Project,Hours,Period)
Here is my code:
cd$Project<-as.character(cd$Project)
wordcloud(cd$Project,min.freq=1)
But I get the following error:
Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
In addition: Warning messages:
1: In max(freq) : no non-missing arguments to max; returning -Inf
2: In max(freq) : no non-missing arguments to max; returning -Inf
What am I doing wrong?
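I think you are missing the freq argument. You need a column that indicates how often each project occurs, so I used count() from the dplyr package to transform the data.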
library(dplyr)
library(wordcloud)
cd <- data.frame(Hours = c(2, 3, 4, 2, 1, 1, 3),
                 Project = c("a", "b", "b", "a", "c", "c", "c"),
                 Period = c("2014-11-22", "2014-11-23", "2014-11-24",
                            "2014-11-22", "2014-11-23", "2014-11-23",
                            "2014-11-24"),
                 stringsAsFactors = FALSE)
cd2 <- count(cd, Project)
#   Project n
# 1       a 2
# 2       b 2
# 3       c 3
wordcloud(words = cd2$Project, freq = cd2$n, min.freq = 1)

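When you pass a character vector, the function builds a corpus and a term-document matrix for you behind the scenes. The problem is that, by default, the TermDocumentMatrix function from the tm package only keeps words that are at least three characters long (the stop-word removal step would also drop values such as "a"). So if you changed your sample to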
Project<-c("aaa","bbb","bbb","aaa","ccc","ccc","ccc")
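it would work just fine. There does not appear to be a way to change the control options that wordcloud passes to TermDocumentMatrix. If you want to compute the frequencies yourself in the same way the default wordcloud function does, you can do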
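library(tm)
library(wordcloud)
corpus <- Corpus(VectorSource(cd$Project))
corpus <- tm_map(corpus, removePunctuation)
# Stop-word removal is left commented out because it would drop "a"
# corpus <- tm_map(corpus, function(x) removeWords(x, stopwords()))
# wordLengths = c(1, Inf) keeps single-character terms
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
freq <- slam::row_sums(tdm)
words <- names(freq)
wordcloud(words, freq, min.freq = 1)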
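To make the role of the word-length bounds concrete, here is a minimal base-R sketch that imitates such a filter. It is not tm's actual implementation: count_words, min_len and max_len are hypothetical names made up for this illustration. It shows how single-letter words leave an empty frequency vector, and how max() on an empty vector yields -Inf, which matches the warnings in the question.

```r
# Base-R imitation of a word-length filter; NOT tm's actual code.
words <- c("a", "b", "b", "a", "c", "c", "c")

# Hypothetical helper: count only the words whose length lies in [min_len, max_len].
count_words <- function(words, min_len = 3, max_len = Inf) {
  n <- nchar(words)
  kept <- words[n >= min_len & n <= max_len]
  table(kept)
}

# With a lower bound of 3, every single-letter word is dropped.
freq_default <- count_words(words)
length(freq_default)

# max() of an empty vector returns -Inf (with a warning, suppressed here).
empty_max <- suppressWarnings(max(as.integer(freq_default)))
empty_max

# Lowering the bound to 1 keeps all words, as wordLengths = c(1, Inf) is meant to.
freq_all <- count_words(words, min_len = 1)
freq_all
```

A size computed from -Inf is a plausible source of the "invalid 'cex' value" error, although the sketch does not call wordcloud itself.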
However, for simple cases you can just compute the frequencies with table():
tbl <- table(cd$Project)
wordcloud(names(tbl), tbl, min.freq=1)
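If a plain data frame is more convenient than a table object, for example to sort or filter the words before plotting, the counts can be unpacked as in the sketch below. The names freq_df, word and freq are chosen for illustration only, and the wordcloud call is left commented out so the snippet runs without the package installed.

```r
Project <- c("a", "b", "b", "a", "c", "c", "c")
tbl <- table(Project)

# Turn the table into a data frame with one row per distinct word.
freq_df <- as.data.frame(tbl, stringsAsFactors = FALSE)
names(freq_df) <- c("word", "freq")

# Sort so that the most frequent word comes first.
freq_df <- freq_df[order(-freq_df$freq), ]
freq_df

# Assumes the wordcloud package is installed:
# wordcloud::wordcloud(freq_df$word, freq_df$freq, min.freq = 1)
```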