将数据帧的互补行与 R 合并

Xav*_*ent 6 row r dataframe

我有这样一个数据框

0     weekday day month year hour basal bolus carb period.h
1    Tuesday  01    03 2016  0.0  0.25    NA   NA        0
2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10
4    Tuesday  01    03 2016 12.0  0.30    NA   NA       12
5    Tuesday  01    03 2016 17.0  0.50    NA   NA       17
6    Tuesday  01    03 2016 17.6    NA    NA   33       17
7    Tuesday  01    03 2016 17.6    NA  1.35   NA       17
8    Tuesday  01    03 2016 18.6    NA    NA   44       18
9    Tuesday  01    03 2016 18.6    NA  1.80   NA       18
10   Tuesday  01    03 2016 18.9    NA    NA   17       18
11   Tuesday  01    03 2016 18.9    NA  0.70   NA       18
12   Tuesday  01    03 2016 22.0  0.40    NA   NA       22
13 Wednesday  02    03 2016  0.0  0.25    NA   NA        0
14 Wednesday  02    03 2016  9.7    NA    NA   39        9
15 Wednesday  02    03 2016  9.7    NA  2.65   NA        9
16 Wednesday  02    03 2016 11.2    NA    NA   13       11
17 Wednesday  02    03 2016 11.2    NA  0.30   NA       11
18 Wednesday  02    03 2016 12.0  0.30    NA   NA       12
19 Wednesday  02    03 2016 12.0    NA    NA   16       12
20 Wednesday  02    03 2016 12.0    NA  0.65   NA       12
Run Code Online (Sandbox Code Playgroud)

如果您查看第 2 行和第 3 行,您会注意到它们完全对应于同一天和时间:仅对于第 2 行,“carb”不是 NA,而“bolus”不是 NA(这些是关于糖尿病)。

我想将这些行合并为一个:

2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10
Run Code Online (Sandbox Code Playgroud)

->

2    Tuesday  01    03 2016 10.9    NA    4.15   67       10
Run Code Online (Sandbox Code Playgroud)

我当然可以在每一行上做一个残酷的双循环,但我正在寻找一种更聪明、更快的方法。

Psi*_*dom 3

您可以在此处按公共标识符列对数据框进行分组weekday, day, month, year, hour, period.h,然后排序并从要合并的其余列中取出第一个元素,sort()默认情况下,函数将删除NA要排序的向量中的 s ,因此您最终会得到每组内每列的非 NA 元素;如果列中的所有元素均为NAsort(col)[1]则返回 NA:

library(dplyr)
df %>% 
       group_by(weekday, day, month, year, hour, period.h) %>% 
       summarise_all(funs(sort(.)[1]))

#      weekday   day month  year  hour period.h basal bolus  carb
#       <fctr> <int> <int> <int> <dbl>    <int> <dbl> <dbl> <int>
# 1    Tuesday     1     3  2016   0.0        0  0.25    NA    NA
# 2    Tuesday     1     3  2016  10.9       10    NA  4.15    67
# 3    Tuesday     1     3  2016  12.0       12  0.30    NA    NA
# 4    Tuesday     1     3  2016  17.0       17  0.50    NA    NA
# 5    Tuesday     1     3  2016  17.6       17    NA  1.35    33
# 6    Tuesday     1     3  2016  18.6       18    NA  1.80    44
# 7    Tuesday     1     3  2016  18.9       18    NA  0.70    17
# 8    Tuesday     1     3  2016  22.0       22  0.40    NA    NA
# 9  Wednesday     2     3  2016   0.0        0  0.25    NA    NA
# 10 Wednesday     2     3  2016   9.7        9    NA  2.65    39
# 11 Wednesday     2     3  2016  11.2       11    NA  0.30    13
# 12 Wednesday     2     3  2016  12.0       12  0.30  0.65    16
Run Code Online (Sandbox Code Playgroud)

也许sort()这里使用的更合适的函数是na.omit()

df %>% group_by(weekday, day, month, year, hour, period.h) %>% 
       summarise_all(funs(na.omit(.)[1]))
Run Code Online (Sandbox Code Playgroud)