Posts by sgt*_*per

How to increase the size of the plotting area for word clouds in R

I am trying to replicate the example here:

http://onertipaday.blogspot.com/2011/07/word-cloud-in-r.html

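I need help figuring out how to increase the plotting area of the word cloud. Changing the height and width parameters in png("wordcloud_packages.png", width = 1280, height = 800) only changes the height and width of the canvas; the plotted area itself stays small.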

require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
u = "http://cran.r-project.org/web/packages/available_packages_by_date.html"
t = readHTMLTable(u)[[1]]
ap.corpus <- Corpus(DataframeSource(data.frame(as.character(t[,3]))))
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, tolower)
ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, stopwords("english")))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_packages.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=3,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()

r tag-cloud text-mining word-cloud

Score: 13 | Answers: 1 | Views: about 20,000

How to find similar sentences/phrases in R?

For example, I have billions of short phrases, and I want to cluster the ones that are similar.

strings.to.cluster <- c("Best Toyota dealer in bay area. Drive out with a new car today",
                        "Largest Selection of Furniture. Stock updated everyday" , 
                        " Unique selection of Handcrafted Jewelry",
                        "Free Shipping for orders above $60. Offer Expires soon",
                        "XXXX is where smart men buy anniversary gifts",
                        "2012 Camrys on Sale. 0% APR for select customers",
                        "Closing Sale on office desks. All Items must go" 
                         )

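Assume this vector has hundreds of thousands of rows. Is there a package in R to cluster these phrases by meaning? Or can someone suggest a way to rank phrases by how "similar" they are in meaning to a given phrase?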
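One possible baseline for the ranking part is TF-IDF weighting with cosine similarity. The sketch below is in Python rather than R and uses only the standard library; the function names (tokenize, tfidf_vectors, cosine, rank_similar) are made up for illustration and do not come from any package. It measures shared vocabulary, not meaning, so treat it as a starting point under that assumption rather than an answer to the "by meaning" requirement.

```python
import math
import re
from collections import Counter

# Sample phrases copied from the question.
strings_to_cluster = [
    "Best Toyota dealer in bay area. Drive out with a new car today",
    "Largest Selection of Furniture. Stock updated everyday",
    " Unique selection of Handcrafted Jewelry",
    "Free Shipping for orders above $60. Offer Expires soon",
    "XXXX is where smart men buy anniversary gifts",
    "2012 Camrys on Sale. 0% APR for select customers",
    "Closing Sale on office desks. All Items must go",
]


def tokenize(text):
    """Lowercase a phrase and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(phrases):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per phrase."""
    docs = [Counter(tokenize(p)) for p in phrases]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF, so a term found in every phrase keeps a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_index, phrases):
    """Rank every other phrase by similarity to phrases[query_index]."""
    vectors = tfidf_vectors(phrases)
    query = vectors[query_index]
    scores = [
        (cosine(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    for score, i in rank_similar(1, strings_to_cluster):
        print(f"{score:.3f}  {strings_to_cluster[i].strip()}")
```

With the sample data, the furniture phrase ranks the jewelry phrase first because both contain "selection of", while the Toyota and Camrys phrases share no tokens at all and score 0. That gap is exactly what a meaning-based approach would have to close. Comparing all pairs this way is also quadratic in the number of phrases, so it would not scale to the sizes mentioned in the question without some form of indexing or approximation.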

statistics nlp r

Score: 7 | Answers: 1 | Views: 5,033

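python pandas installation problem

I have tried installing:

  1. from source (running python setup.py install in the extracted tarball directory)
  2. using pip
  3. using easy_install

but nothing seems to work. I have downloaded and upgraded Xcode and installed the command line tools.

I cloned the pandas GitHub repository: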

cd ../pandas
python setup.py install 
running install
running bdist_egg
running egg_info
writing requirements to pandas.egg-info/requires.txt
writing pandas.egg-info/PKG-INFO
writing top-level names to pandas.egg-info/top_level.txt
writing dependency_links to pandas.egg-info/dependency_links.txt
reading manifest file 'pandas.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'setupegg.py'
no previously-included directories found matching 'doc/build'
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
warning: no …
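When several install methods all appear to fail, one thing worth ruling out is that the package landed under a different Python interpreter than the one being run (for example a Homebrew Python next to the system one). The post does not say this is the cause here, so the check below is only a diagnostic sketch; describe_install is a hypothetical helper name, not part of pandas or pip.

```python
import importlib.util
import sys


def describe_install(package="pandas"):
    """Report which interpreter is running and whether it can locate `package`."""
    # find_spec returns None when a top-level package cannot be found.
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "python_version": sys.version.split()[0],
        "package": package,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

Running this with the same python command used for setup.py shows which interpreter is active and whether it can see pandas. If the interpreter path differs from the one pip or easy_install belongs to, that mismatch would be the first thing to investigate.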

python homebrew easy-install pandas

Score: 2 | Answers: 1 | Views: about 10,000