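Both Uwe's and GKi's answers are correct. GKi received the bounty because Uwe's answer arrived late, but Uwe's solution runs roughly 15 times faster.
I have two datasets containing scores for different patients at multiple measurement time points, as shown below: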
df1 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient3"),
"Days" = c(0,25,235,353,100,538),
"Score" = c(NA,2,3,4,5,6),
stringsAsFactors = FALSE)
df2 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient2","patient3"),
"Days" = c(0,25,248,353,100,150,503),
"Score" = c(1,10,3,4,5,7,6),
stringsAsFactors = FALSE)
> df1
ID Days Score
1 patient1 0 NA
2 patient1 25 2
3 patient1 235 3
4 patient1 353 4
5 patient2 100 5
6 patient3 538 6
> df2
ID Days Score
1 patient1 0 1
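2 patient1 …
I am trying to merge data from two different files. In each file, some data is associated with some IDs. I want to "combine" the two files so that every ID is printed to a new file and the data from both files is matched to the correct ID. Example: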
cat file_1
1.01 data_a
1.02 data_b
1.03 data_c
1.04 data_d
1.05 data_e
1.06 data_f
cat file_2
1.01 data_aa
1.03 data_cc
1.05 data_ee
1.09 data_ii
The desired result is:
cat files_combined
1.01 data_a data_aa
1.02 data_b
1.03 data_c data_cc
1.04 data_d
1.05 data_e data_ee
1.06 data_f
1.09 data_ii
I know how to do this the long, slow way by looping over each ID. Some pseudocode as an example:
awk -F\\t '{print $1}' file_1 > files_combined
awk -F\\t '{print $1}' file_2 >> files_combined
sort -u -n files_combined > tmp && mv tmp …
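One way to avoid the per-ID loop is to read both files in a single awk pass and collect the data for each ID in an array. The following is only a sketch under stated assumptions: each line holds one ID and one data field separated by whitespace (tabs or spaces), the data fields contain no whitespace themselves, and IDs are numeric so that `sort -n` orders them. The function name `combine_by_id` is hypothetical and not part of the original question.

```shell
#!/bin/sh
# combine_by_id FILE_A FILE_B
# Hypothetical helper: prints one line per ID found in either file,
# followed by the data fields recorded for that ID, in file order.
combine_by_id() {
    awk '
        NF == 0 { next }                        # skip blank lines
        {
            if ($1 in row) {
                row[$1] = row[$1] OFS $2        # ID already seen: append data
            } else {
                row[$1] = $1 OFS $2             # first time this ID appears
                order[++n] = $1                 # remember insertion order
            }
        }
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    ' "$1" "$2" | LC_ALL=C sort -n              # order lines by numeric ID
}

# Recreate the sample inputs from the question.
printf '%s\n' '1.01 data_a' '1.02 data_b' '1.03 data_c' \
              '1.04 data_d' '1.05 data_e' '1.06 data_f' > file_1
printf '%s\n' '1.01 data_aa' '1.03 data_cc' '1.05 data_ee' \
              '1.09 data_ii' > file_2

combine_by_id file_1 file_2 > files_combined
cat files_combined
```

With the sample inputs this reproduces the desired files_combined shown above. Because every line is read exactly once, the work grows with the total number of lines rather than with the number of IDs times the file size, which is what makes the loop-based approach slow.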