I have the following data frame:
data.frame(a = c(1,2,3),b = c(1,2,3))
a b
1 1 1
2 2 2
3 3 3
I want to turn it into:
a b
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 3 3
7 1 1
8 2 2
9 3 3
or, more generally, repeat it N times. Is there a simple function for this in R? Thanks!
mds*_*ner 110
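Edit: updated with a better, modern R answer.
You can use replicate() and then rbind the results back together. The row names are automatically renumbered to run from 1 to nrow.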
d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))
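A minimal sketch wrapping the replicate() approach in a reusable helper. The name repeat_df and the n = 0 handling are assumptions added for illustration, not part of the answer: with n = 0, replicate() returns an empty list and do.call(rbind, list()) returns NULL, so the sketch returns a zero-row data frame with the original columns instead.

```r
# Hypothetical helper (not from the answer): stack n copies of data frame d
repeat_df <- function(d, n) {
  stopifnot(is.data.frame(d), length(n) == 1, n >= 0)
  # do.call(rbind, list()) would return NULL, so keep the columns for n = 0
  if (n == 0) return(d[0, , drop = FALSE])
  # replicate(..., simplify = FALSE) yields a list of n identical data frames
  do.call(rbind, replicate(n, d, simplify = FALSE))
}

d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
r <- repeat_df(d, 3)
print(r)
rownames(r)
```

With the default integer row names, the stacked result is renumbered 1 to 9, matching the answer's description.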
A more traditional approach is to use indexing, but here the row names change in a less tidy (though more informative) way:
d[rep(seq_len(nrow(d)), n), ]
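To show what that row-name change looks like, a short sketch: `[.data.frame` makes duplicated row names unique by appending .1, .2 and so on, so the part before the dot still identifies the source row. Recovering the source row with sub() is an illustration added here and assumes the original data frame had the default integer row names.

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3
# Index vector 1 2 3 1 2 3 1 2 3
idx <- rep(seq_len(nrow(d)), n)
r <- d[idx, ]
# Duplicated row names are made unique: "1" "2" "3" "1.1" "2.1" "3.1" "1.2" "2.2" "3.2"
print(rownames(r))
# The part before the dot identifies the row of d each copy came from
src_row <- as.integer(sub("\\.[0-9]+$", "", rownames(r)))
# If plain 1..nrow row names are preferred, reset them
r_reset <- r
rownames(r_reset) <- NULL
```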
Here are some improvements on the above. The first two use purrr functional programming. Idiomatic purrr:
purrr::map_dfr(seq_len(3), ~d)
And less idiomatic purrr (same result, though more awkward):
purrr::map_dfr(seq_len(3), function(x) d)
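Both purrr calls map a function that ignores its argument and always returns d, then row-bind the results (map_dfr() relies on dplyr::bind_rows() for that step). A minimal base-R sketch of the same map-then-bind pattern, assuming no extra packages are available, uses lapply() and a single do.call(rbind, ...):

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3
# Map: the index i is ignored; every list element is a copy of d
copies <- lapply(seq_len(n), function(i) d)
# Bind: one rbind() call over all list elements
r_lapply <- do.call(rbind, copies)
# replicate(n, d, simplify = FALSE) builds the same unnamed list, so the results match
r_replicate <- do.call(rbind, replicate(n, d, simplify = FALSE))
identical(r_lapply, r_replicate)
```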
Finally, using dplyr with indexing instead of applying over a list:
library(dplyr)
d %>% slice(rep(row_number(), 3))
Max*_*nis 28
For data.frame objects, this solution is several times faster than those of @mdsummer and @wojciech-sobala:
d[rep(seq_len(nrow(d)), n), ]
For data.table objects, @mdsummer's approach is slightly faster than converting to a data.frame and then applying the above. For large n this may flip.
Full code:
packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark")
lapply(packages, require, character.only = TRUE)
Repeat1 <- function(d, n) {
return(do.call("rbind", replicate(n, d, simplify = FALSE)))
}
Repeat2 <- function(d, n) {
return(Reduce(rbind, list(d)[rep(1L, times=n)]))
}
Repeat3 <- function(d, n) {
if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])
return(d[rep(seq_len(nrow(d)), n), ])
}
Repeat3.dt.convert <- function(d, n) {
if ("data.table" %in% class(d)) d <- as.data.frame(d)
return(d[rep(seq_len(nrow(d)), n), ])
}
# Try with data.frames
mtcars1 <- Repeat1(mtcars, 3)
mtcars2 <- Repeat2(mtcars, 3)
mtcars3 <- Repeat3(mtcars, 3)
checkEquals(mtcars1, mtcars2)
# Only difference is row.names having ".k" suffix instead of "k" from 1 & 2,
# so compare values only (check.attributes is passed on to all.equal)
checkEquals(mtcars1, mtcars3, check.attributes = FALSE)
# Works with data.tables too
mtcars.dt <- data.table(mtcars)
mtcars.dt1 <- Repeat1(mtcars.dt, 3)
mtcars.dt2 <- Repeat2(mtcars.dt, 3)
mtcars.dt3 <- Repeat3(mtcars.dt, 3)
# No row.names mismatch since data.tables don't have row.names
checkEquals(mtcars.dt1, mtcars.dt2)
checkEquals(mtcars.dt1, mtcars.dt3)
# Time test
res <- microbenchmark(Repeat1(mtcars, 10),
Repeat2(mtcars, 10),
Repeat3(mtcars, 10),
Repeat1(mtcars.dt, 10),
Repeat2(mtcars.dt, 10),
Repeat3(mtcars.dt, 10),
Repeat3.dt.convert(mtcars.dt, 10))
print(res)
ggsave("repeat_microbenchmark.png", autoplot(res))
Sti*_*ibu 14
The dplyr package provides bind_rows(), which directly combines all data frames in a list, so there is no need to use do.call() together with rbind():
df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
library(dplyr)
bind_rows(replicate(3, df, simplify = FALSE))
For large numbers of repetitions, bind_rows() is also much faster than rbind():
library(microbenchmark)
microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)),
bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)),
times = 20)
## Unit: milliseconds
## expr min lq mean median uq max neval cld
## rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112 20 b
## bind_rows 1.765956 1.818087 1.881697 1.86207 1.898839 2.321621 20 a
Jaa*_*aap 12
With the data.table package, you can use the special symbol .I together with rep():
library(data.table)
df <- data.frame(a = c(1,2,3), b = c(1,2,3))
dt <- as.data.table(df)
n <- 3
dt[rep(dt[, .I], n)]
This gives:
   a b
1: 1 1
2: 2 2
3: 3 3
4: 1 1
5: 2 2
6: 3 3
7: 1 1
8: 2 2
9: 3 3
d <- data.frame(a = c(1,2,3),b = c(1,2,3))
r <- Reduce(rbind, list(d)[rep(1L, times=3L)])
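This one-liner builds a list holding three copies of d (list(d) has length 1, and indexing it with rep(1L, times = 3L) repeats that single element) and then folds rbind() over the list with Reduce(). A short sketch that makes the intermediate steps visible; the variable names are illustrative only:

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
# A length-3 list whose elements are all d
copies <- list(d)[rep(1L, times = 3L)]
length(copies)
# Reduce() folds from the left: rbind(rbind(d, d), d)
r_reduce <- Reduce(rbind, copies)
r_nested <- rbind(rbind(d, d), d)
identical(r_reduce, r_nested)
# A single rbind() call over all copies gives the same data
r_docall <- do.call(rbind, copies)
all.equal(r_reduce, r_docall)
```

Reduce() makes n - 1 separate rbind() calls, each copying a progressively larger data frame, whereas do.call(rbind, copies) binds all copies in one call, so for large n the single call is generally the cheaper of the two.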