Data.frame filtering

Question

I have the following data.frame df:

df = data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                col2    = c('a','a','a','b','b','b','b','a','a'),
                height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                col3    = 1:9)

#  col1 col2 height1 height2 col3
#1    a    a      NA    31.0    1
#2    a    a      32    31.5    2
#3    a    a      NA      NA    3
#4    a    b      NA      NA    4
#5    a    b      NA    11.0    5
#6    b    b      NA    12.0    6
#7    b    b      NA    13.0    7
#8    c    a      25      NA    8
#9    d    a      NA      NA    9

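For each combination of col1 and col2, I want to build a height column that takes the following values:

If there are only NAs in height1 and height2, return NA.
If there is a value in height1, take that value. (For any given col1, col2 pair, height1 contains at most one non-NA value.)
If height1 contains only NAs and height2 has some non-NA values, take the first value in height2.

I also need to keep the corresponding value of col3.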
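
The three rules can be sketched for a single (col1, col2) group in base R. The helper name pick_row is hypothetical, and keeping the first row when both columns are all NA is an assumption: the question does not say which col3 to keep in that case when a group has several rows.

```r
# Hypothetical helper: index of the row to keep within one (col1, col2) group.
# h1 and h2 are the group's height1 and height2 values, in their original order.
pick_row <- function(h1, h2) {
  if (any(!is.na(h1))) return(which(!is.na(h1))[1])  # a height1 value exists: take it
  if (any(!is.na(h2))) return(which(!is.na(h2))[1])  # otherwise the first non-NA height2
  1L                                                 # only NAs: keep the first row (assumption)
}

pick_row(c(NA, 32, NA), c(31, 31.5, NA))  # group (a, a): 2
pick_row(c(NA, NA), c(NA, 11))            # group (a, b): 2
pick_row(NA, NA)                          # group (d, a): 1
```

The height for the group is then height1 at that index if it is not NA, and height2 at that index otherwise.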

The new data.frame new.df should look like:

#  col1 col2 height col3
#1    a    a     32    2
#2    a    b     11    5
#3    b    b     12    6
#4    c    a     25    8
#5    d    a     NA    9

I would prefer a data.frame approach, as concise as possible, but I realize I cannot find a way to do it!

Answer 1

ber*_*ant 2

Using dplyr:

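library(dplyr)

df %>%
  mutate(
    # Row priority within a group: 1 = has height1, 2 = only height2, 3 = neither
    order = ifelse(!is.na(height1), 1, ifelse(!is.na(height2), 2, 3)),
    # Prefer height1, fall back to height2 (NA when both are missing)
    height = ifelse(!is.na(height1), height1, height2)
  ) %>%
  # Ties on order are broken by the smallest height, which here is also the first height2
  arrange(col1, col2, order, height) %>%
  # .keep_all = TRUE keeps height and col3; current dplyr drops them otherwise
  distinct(col1, col2, .keep_all = TRUE) %>%
  select(col1, col2, height, col3)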
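
Since the question asks for a plain data.frame approach, the same sort-then-deduplicate idea can also be sketched in base R. This is a sketch rather than the answerer's method: the names prio, tmp, o and sorted are introduced here for illustration. It relies on order() leaving unresolved ties in their original order, so within a group the first non-NA height2 is kept rather than the smallest one.

```r
df <- data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                 col2    = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3    = 1:9)

# Work on a copy so df itself is left unchanged
tmp <- df
# Prefer height1, fall back to height2 (NA when both are missing)
tmp$height <- ifelse(is.na(df$height1), df$height2, df$height1)
# Row priority: 1 = has height1, 2 = only height2, 3 = neither
prio <- ifelse(!is.na(df$height1), 1L, ifelse(!is.na(df$height2), 2L, 3L))

# Sort by group, then priority; ties keep their original row order
o <- order(df$col1, df$col2, prio)
sorted <- tmp[o, c("col1", "col2", "height", "col3")]

# Keep the first row of each (col1, col2) group
new.df <- sorted[!duplicated(sorted[c("col1", "col2")]), ]
rownames(new.df) <- NULL

new.df$height  # 32 11 12 25 NA
new.df$col3    # 2 5 6 8 9
```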
