I use devtools, and since updating to R 3.3.1 I get this message every time I install something from GitHub.
Skipping install of 'PACKAGE' from a github remote, the SHA1 (123456) has not changed since last install.
Use `force = TRUE` to force installation
Has anyone else run into this?
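The message is informational: the SHA1 recorded at the last install matches the remote, so the install is skipped. If a reinstall is really wanted, the `force = TRUE` argument named in the message can be passed to `devtools::install_github()`. A minimal sketch follows; the helper name `reinstall_from_github` and the repository string "user/PACKAGE" are placeholders made up for illustration.

```r
# Hypothetical helper: reinstall a GitHub package even if its SHA1 is unchanged.
# `repo` is a "user/repository" string; "user/PACKAGE" below is a placeholder.
reinstall_from_github <- function(repo) {
  devtools::install_github(repo, force = TRUE)
}

# Not run here, because it needs devtools and network access:
# reinstall_from_github("user/PACKAGE")
```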
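
I want/need to create a matrix of 1s and 0s that holds information about common terms. I have created a matrix of common-term counts between columns (e.g. with rows like 1,4,2), but I don't know how to disaggregate it.
Here is a toy, reproducible example. Steps (1)-(4) work; step (5) is what I can't do at the moment.
(1) I have this (fictional) dataset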
vec1 <- c("apple","pear","apple and pear")
vec2 <- c("apple and pear","banana","orange")
vec3 <- c("orange and pear","banana","apple")
my.data.frame <- as.data.frame(cbind(vec1,vec2,vec3))
            vec1           vec2            vec3
1          apple apple and pear orange and pear
2           pear         banana          banana
3 apple and pear         orange           apple
(2) I extract the variable names and their contents
vectors.list <- as.vector(colnames(my.data.frame))
list.of.fruits <- unique(as.vector(unlist(my.data.frame)))
(3) I write a function that counts the common terms (adapted from this post: How to count common words and store the result in a matrix?)
common.fruits <- function(vList) {
  v <- lapply(vList, tolower)
  do.call(rbind, lapply(v, function(x) {
    do.call(c, lapply(v, function(y) length(intersect(x, y))))
  }))
}
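The pairwise counts that a function like this returns can also be read off a 0/1 membership matrix, which may be close to what step (5) is after. The sketch below is only one reading of the question: it assumes a "term" is a whole cell value (as in list.of.fruits) and that the wanted matrix has one row per term and one column per vector. The names `columns`, `terms`, `indicator` and `shared` are made up for this sketch.

```r
# Same toy data as in step (1), kept in a named list so the block is self-contained.
vec1 <- c("apple", "pear", "apple and pear")
vec2 <- c("apple and pear", "banana", "orange")
vec3 <- c("orange and pear", "banana", "apple")
columns <- list(vec1 = vec1, vec2 = vec2, vec3 = vec3)

# Assumption: a "term" is a whole cell value, not a single word inside it.
terms <- unique(unlist(columns))

# One row per term, one column per vector: 1 if the term occurs in that vector.
indicator <- sapply(columns, function(v) as.integer(terms %in% v))
rownames(indicator) <- terms
indicator

# The cross-product of the indicator matrix counts the terms shared by each
# pair of vectors, i.e. the aggregated matrix the 0/1 matrix breaks down.
shared <- crossprod(indicator)
shared
```

Going from `indicator` to `shared` is a sum over terms, so the 0/1 matrix keeps the information about which terms are shared that the count matrix discards.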
(4) I use get and lapply to do some (I guess) efficient computation
compare <- lapply(vectors.list,get)
common.terms.matrix <- common.fruits(compare) …

Following a previous post, Large SpMat object with RcppArmadillo, I decided to use Rcpp to compute a large matrix (~600,000 rows x 11 strings)
I have installed Rcpp and RcppArmadillo
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppArmadillo_0.7.500.0.0 Rcpp_0.12.7 cluster_2.0.4 skmeans_0.2-8
[5] ggdendro_0.1-20 ggplot2_2.1.0 lsa_0.73.1 SnowballC_0.5.1
[9] data.table_1.9.6 jsonlite_1.1 purrr_0.2.2 stringi_1.1.2
[13] dplyr_0.5.0 plyr_1.8.4
loaded via a namespace (and not attached):
[1] assertthat_0.1 slam_0.1-38 MASS_7.3-45 chron_2.3-47 grid_3.3.1 R6_2.2.0 gtable_0.2.0
[8] DBI_0.5-1 magrittr_1.5 scales_0.4.0 …

I want to count the zeros in a data frame.
To count the NAs I am using
mtcars %>% group_by(cyl) %>% summarise_each(funs(sum(is.na(.))))
which returns
# A tibble: 3 × 11
cyl mpg disp hp drat wt qsec vs am gear carb
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 4 0 0 0 0 0 0 0 0 0 0
2 6 0 0 0 0 0 0 0 0 0 0
3 8 0 0 0 0 0 0 0 0 0 0
How can I do something like
mtcars %>% group_by(cyl) %>% summarise_each(funs(sum(identical(.,0))))
to get the same kind of result, but counting zeros instead of …
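One reason the `identical(., 0)` attempt cannot give per-group counts is that `identical()` compares the whole column with the single value 0 and returns one `FALSE`, whereas `. == 0` compares element by element. A minimal sketch of the element-wise version follows, written in base R with `aggregate()` rather than the dplyr pipeline from the question so that it runs without extra packages; `zero_counts` is a name made up here.

```r
# Count zeros per column within each cyl group.
# `x == 0` is element-wise, so sum() counts the TRUE values in each group.
zero_counts <- aggregate(. ~ cyl, data = mtcars,
                         FUN = function(x) sum(x == 0))
zero_counts
```

The same comparison, `sum(. == 0)`, should slot into the pipeline from the question in place of `sum(is.na(.))`.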

I want to remove all rows whose last value is zero and all columns whose last value is zero.
Here is a dummy (reproducible) example of my dataset:
library(dplyr)
x = c("apples" ,1,0,1,2)
y = c("bananas",0,0,0,0)
z = c("apples" ,2,0,4,6)
t = c("rowsum" ,3,0,5,8)
my_table = rbind(x,y,z,t)
colnames(my_table) = c("product","day1","day2","day3","colsum")
my_table = as.tbl(as.data.frame(my_table)) %>%
  mutate(day1 = as.integer(as.character(day1)),
         day2 = as.integer(as.character(day2)),
         day3 = as.integer(as.character(day3)),
         colsum = as.integer(as.character(colsum)))
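For a table shaped like this one, one possible way to drop the rows and columns whose last value is zero is sketched below in base R (a swapped-in approach, not the dplyr style used in the question). It rebuilds the same toy data as a plain data frame so the block stands alone, and it assumes that "last value" means the `colsum` entry for rows and the `rowsum` entry for columns. The names `last_row`, `last_col`, `keep_rows`, `keep_cols` and `cleaned` are made up for this sketch.

```r
# Same toy data as above, built directly as a data frame.
my_table <- data.frame(
  product = c("apples", "bananas", "apples", "rowsum"),
  day1    = c(1L, 0L, 2L, 3L),
  day2    = c(0L, 0L, 0L, 0L),
  day3    = c(1L, 0L, 4L, 5L),
  colsum  = c(2L, 0L, 6L, 8L),
  stringsAsFactors = FALSE
)

last_row <- my_table[nrow(my_table), ]   # the "rowsum" row
last_col <- my_table[[ncol(my_table)]]   # the "colsum" column

# Keep a column unless it is numeric and its last value is zero.
keep_cols <- !sapply(last_row, function(v) is.numeric(v) && v == 0)
# Keep a row unless its last value is zero.
keep_rows <- last_col != 0

cleaned <- my_table[keep_rows, keep_cols, drop = FALSE]
cleaned
```

On the toy data this drops the bananas row and the day2 column and leaves everything else untouched.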
The dummy example prints as follows:
> my_table
# A tibble: 4 × 5
product day1 day2 day3 colsum
<fctr> <int> <int> <int> <int>
1 apples 1 0 1 2
2 bananas 0 0 0 0
3 apples 2 0 4 6
4 rowsum 3 …

I am trying to fix a small problem. The column I am interested in contains strings with whitespace at the beginning or end, and with double spaces.
Before asking, I looked at these posts
Here is a reproducible example of what I am doing
library(dplyr)
mtcars2 = tbl_df(mtcars) %>%
  mutate(name = rownames(mtcars)) %>%
  mutate(name = gsub("^ *|(?<= ) | *$", "", name, perl = TRUE)) %>%
  mutate(name = gsub("^\\s+|\\s+$", "", name)) %>%
  mutate(name = iconv(name, from = "", to = "ASCII//TRANSLIT", sub = ""))
head(mtcars2, 3)
The result is
# A tibble: 3 × 12
mpg cyl disp hp drat wt qsec vs am gear carb name
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> …
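The two cleaning steps in the example (collapsing repeated spaces, then trimming the ends) can also be written as one small base R helper. This is a sketch, not the asker's code: `collapse_spaces` is a name made up here, and it assumes the unwanted characters are ordinary ASCII whitespace. Other characters, such as non-breaking spaces, may not be matched by `\\s` and would need separate handling.

```r
# Hypothetical helper: collapse runs of whitespace to one space,
# then strip leading and trailing whitespace with trimws().
collapse_spaces <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

collapse_spaces(c("  Mazda   RX4 ", "Datsun  710"))
```

Collapsing first and trimming second means at most one space is left at either end for `trimws()` to remove.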