I have the following data.frame df:
df = data.frame(col1 = c('a','a','a','a','a','b','b','c','d'),
                col2 = c('a','a','a','b','b','b','b','a','a'),
                height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                col3 = 1:9)
#  col1 col2 height1 height2 col3
#1    a    a      NA    31.0    1
#2    a    a      32    31.5    2
#3    a    a      NA      NA    3
#4    a    b      NA      NA    4
#5    a    b      NA    11.0    5
#6    b    b      NA    12.0    6
#7    b    b      NA    13.0    7
#8    c    a      25      NA    8
#9    d    a      NA      NA    9
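For each pair of values of col1, col2, I would like to build a column height containing the following value:
- If there are only NA values in height1 and height2, return NA.
- If there is a value in height1, take this value. (For a given pair col1, col2, there is at most one non-NA value in column height1.)
- If there are only NA values in height1 and some non-NA values in height2, take the first value in height2.
I also need to keep the corresponding value from column col3.
The new data.frame new.df would look like: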
#  col1 col2 height col3
#1    a    a     32    2
#2    a    b     11    5
#3    b    b     12    6
#4    c    a     25    8
#5    d    a     NA    9
I would prefer a data.frame approach, as concise as possible, but I have not been able to find one!
Using dplyr:
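library(dplyr)

df %>%
  mutate(
    # rank rows: 1 = height1 present, 2 = only height2 present, 3 = both NA
    order = ifelse(!is.na(height1), 1, ifelse(!is.na(height2), 2, 3)),
    # take height1 when present, otherwise height2 (which may itself be NA)
    height = ifelse(!is.na(height1), height1, height2),
    # remember the original row position so ties resolve to the first row
    row = row_number()
  ) %>%
  arrange(col1, col2, order, row) %>%
  # keep the first row per (col1, col2); .keep_all = TRUE retains height and col3
  distinct(col1, col2, .keep_all = TRUE) %>%
  select(col1, col2, height, col3)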
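Since the question asks for a plain data.frame approach, the same idea can probably be written in base R as well. This is only a sketch under the rules stated in the question: the helper columns `height` and `priority` and the variable names `ord` and `sorted` are my own, not part of the original post. It ranks each row (height1 present, only height2 present, both NA), sorts within each (col1, col2) pair by that rank and then by original row position, and keeps the first row of each pair.

```r
df <- data.frame(col1 = c('a','a','a','a','a','b','b','c','d'),
                 col2 = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3 = 1:9)

# Row-wise height: height1 when present, otherwise height2 (possibly NA)
df$height <- ifelse(!is.na(df$height1), df$height1, df$height2)

# Row priority (assumed helper column): 1 = height1 present,
# 2 = only height2 present, 3 = both NA
df$priority <- ifelse(!is.na(df$height1), 1,
                      ifelse(!is.na(df$height2), 2, 3))

# Sort by pair, then priority, then original row position so that
# "first value in height2" means first in the original order
ord <- order(df$col1, df$col2, df$priority, seq_len(nrow(df)))
sorted <- df[ord, ]

# Keep the first row of each (col1, col2) pair
new.df <- sorted[!duplicated(sorted[c("col1", "col2")]),
                 c("col1", "col2", "height", "col3")]
rownames(new.df) <- NULL
```

Adding the row index as the last sort key makes the tie-breaking explicit, so the result does not depend on how the sort treats equal keys.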