I have a list of data frames that looks like this:
ls[[1]]
[[1]]
month year oracle
1 2004 356.0000
2 2004 390.0000
3 2004 394.4286
4 2004 391.8571
ls[[2]]
[[2]]
month year microsoft
1 2004 339.0000
2 2004 357.7143
3 2004 347.1429
4 2004 333.2857
How can I create a single data frame like the one below?
month year oracle microsoft
1 2004 356.0000 339.0000
2 2004 390.0000 357.7143
3 2004 394.4286 347.1429
4 2004 391.8571 333.2857
We can also use Reduce:
Reduce(function(...) merge(..., by = c('month', 'year')), ls)
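To make the call above concrete, here is a minimal, self-contained sketch that rebuilds the two data frames from the question and folds them together. The variable names (oracle, microsoft, dfs, combined) are illustrative assumptions, not part of the original posts; dfs is used instead of ls only to avoid masking base::ls().

```r
# Rebuild the two data frames shown in the question (values copied from there).
oracle <- data.frame(month = 1:4, year = 2004L,
                     oracle = c(356, 390, 394.4286, 391.8571))
microsoft <- data.frame(month = 1:4, year = 2004L,
                        microsoft = c(339, 357.7143, 347.1429, 333.2857))

# Hypothetical list name, chosen to avoid masking base::ls().
dfs <- list(oracle, microsoft)

# Reduce() calls merge() on the first two elements, then merges that result
# with the next element, and so on until the list is exhausted.
combined <- Reduce(function(x, y) merge(x, y, by = c("month", "year")), dfs)
print(combined)
```

Because both data frames cover the same months here, the default inner join keeps all four rows.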
Using @Jaap's example data, if the month/year values are not the same in every data frame, use the all=TRUE option of merge.
Reduce(function(...) merge(..., by = c('month', 'year'), all=TRUE), ls)
# month year oracle microsoft google
#1 1 2004 356.0000 NA NA
#2 2 2004 390.0000 339.0000 NA
#3 3 2004 394.4286 357.7143 390.0000
#4 4 2004 391.8571 347.1429 391.8571
#5 5 2004 NA 333.2857 357.7143
#6 6 2004 NA NA 333.2857
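The Reduce/merge code from @akrun's answer works well if the values of the month and year columns are the same in every data frame. However, when they are not the same (see the example data at the end of this answer),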
Reduce(function(...) merge(..., by = c('month', 'year')), ls)
will return only the rows that are common to every data frame:
month year oracle microsoft google
1 3 2004 394.4286 357.7143 390.0000
2 4 2004 391.8571 347.1429 391.8571
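The reason only months 3 and 4 survive can be seen by writing out the fold by hand. The sketch below is an illustration under assumptions: it rebuilds the example data with data.frame(), and the names a, b, g, dfs, keys, step1, step2, folded and common_months are hypothetical.

```r
# Example data from the end of this answer, rebuilt with data.frame().
a <- data.frame(month = 1:4, year = 2004L,
                oracle = c(356, 390, 394.4286, 391.8571))
b <- data.frame(month = 2:5, year = 2004L,
                microsoft = c(339, 357.7143, 347.1429, 333.2857))
g <- data.frame(month = 3:6, year = 2004L,
                google = c(390, 391.8571, 357.7143, 333.2857))
dfs <- list(a, b, g)  # hypothetical name, avoids masking base::ls()

keys <- c("month", "year")

# What Reduce() does, written out step by step (a left-to-right fold).
step1 <- merge(a, b, by = keys)      # inner join: keeps months 2, 3, 4
step2 <- merge(step1, g, by = keys)  # inner join again: keeps months 3, 4

# The same fold expressed with Reduce().
folded <- Reduce(function(x, y) merge(x, y, by = keys), dfs)

# Months present in every data frame; these are the rows that survive.
common_months <- Reduce(intersect, lapply(dfs, `[[`, "month"))
print(folded)
print(common_months)
```

Each inner join can only shrink the set of keys, so the final result is limited to the intersection of the keys across all data frames.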
In that case, when you want to include all rows/observations, you can either use all=TRUE (as shown by @akrun) or use full_join from the dplyr package as an alternative:
library(dplyr)
Reduce(function(...) full_join(..., by = c('month', 'year')), ls)
# or just:
Reduce(full_join, ls)
This results in:
month year oracle microsoft google
1 1 2004 356.0000 NA NA
2 2 2004 390.0000 339.0000 NA
3 3 2004 394.4286 357.7143 390.0000
4 4 2004 391.8571 347.1429 391.8571
5 5 2004 NA 333.2857 357.7143
6 6 2004 NA NA 333.2857
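The "or just" shortcut works because full_join, when given no by argument, joins on the columns the two inputs have in common. Base merge has the same default (by = intersect(names(x), names(y))), so a dplyr-free version of the shortcut can be sketched as follows. The data is the example data from this answer rebuilt with data.frame(); the names a, b, g, dfs, full_explicit and full_default are hypothetical.

```r
# Example data from this answer, rebuilt with data.frame().
a <- data.frame(month = 1:4, year = 2004L,
                oracle = c(356, 390, 394.4286, 391.8571))
b <- data.frame(month = 2:5, year = 2004L,
                microsoft = c(339, 357.7143, 347.1429, 333.2857))
g <- data.frame(month = 3:6, year = 2004L,
                google = c(390, 391.8571, 357.7143, 333.2857))
dfs <- list(a, b, g)  # hypothetical name, avoids masking base::ls()

# Full outer join with the key columns spelled out.
full_explicit <- Reduce(
  function(x, y) merge(x, y, by = c("month", "year"), all = TRUE), dfs)

# Same join relying on merge()'s default key: the shared column names,
# which are "month" and "year" at every step of the fold.
full_default <- Reduce(function(x, y) merge(x, y, all = TRUE), dfs)

print(full_default)
```

Relying on the default is shorter, but it silently changes behaviour if the data frames ever share another column name, so spelling out by is the safer choice for anything beyond a quick check.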
Used data:
ls <- list(structure(list(month = 1:4, year = c(2004L, 2004L, 2004L, 2004L), oracle = c(356, 390, 394.4286, 391.8571)), .Names = c("month", "year", "oracle"), class = "data.frame", row.names = c(NA, -4L)),
structure(list(month = 2:5, year = c(2004L, 2004L, 2004L, 2004L), microsoft = c(339, 357.7143, 347.1429, 333.2857)), .Names = c("month", "year", "microsoft"), class = "data.frame", row.names = c(NA,-4L)),
structure(list(month = 3:6, year = c(2004L, 2004L, 2004L, 2004L), google = c(390, 391.8571, 357.7143, 333.2857)), .Names = c("month", "year", "google"), class = "data.frame", row.names = c(NA,-4L)))