How do I create a new column based on multiple conditions across multiple columns?

Ank*_*kie 7 if-statement r calculated-columns multiple-conditions dataframe

I am trying to add a new column to a data frame based on several conditions on its other columns. I have the following data:

> commute <- c("walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry")
> kids <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes")
> distance <- c(1, 12, 5, 25, 7, 2, "", 8, 19, 7, "", 4, 16, 12, 7)
> 
> df = data.frame(commute, kids, distance)
> df
   commute kids distance
1     walk  Yes        1
2     bike  Yes       12
3   subway   No        5
4    drive   No       25
5    ferry  Yes        7
6     walk  Yes        2
7     bike   No         
8   subway   No        8
9    drive  Yes       19
10   ferry  Yes        7
11    walk   No         
12    bike   No        4
13  subway  Yes       16
14   drive   No       12
15   ferry  Yes        7

If the following three conditions are all met:

commute = walk OR bike OR subway OR ferry
AND
kids = Yes
AND
distance is less than 10

then I want a new column named get.flyer to equal "Yes". The final data frame should look like this:

   commute kids distance get.flyer
1     walk  Yes        1       Yes
2     bike  Yes       12       Yes
3   subway   No        5          
4    drive   No       25          
5    ferry  Yes        7       Yes
6     walk  Yes        2       Yes
7     bike   No                   
8   subway   No        8          
9    drive  Yes       19          
10   ferry  Yes        7       Yes
11    walk   No                   
12    bike   No        4          
13  subway  Yes       16       Yes
14   drive   No       12          
15   ferry  Yes        7       Yes

akr*_*run 10

We can use %in% to compare a column against several values at once, and & to check that all of the conditions are TRUE.

library(dplyr)
df %>%
     mutate(get.flyer = c("", "Yes")[(commute %in% c("walk", "bike", "subway", "ferry") & 
           as.character(kids) == "Yes" & 
           as.numeric(as.character(distance)) < 10)+1] )
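The `c("", "Yes")[... + 1]` idiom above can look opaque at first. As a minimal sketch (the `cond` vector here is made up for illustration and is not part of the question's data), adding 1 to a logical vector turns FALSE into index 1 and TRUE into index 2, which then select "" or "Yes":

```r
# Hypothetical condition vector, for illustration only
cond <- c(TRUE, FALSE, NA, TRUE)

# Arithmetic coerces logicals: FALSE + 1 is 1, TRUE + 1 is 2, NA + 1 stays NA
idx <- cond + 1
idx    # 2 1 NA 2

# Index 1 picks "", index 2 picks "Yes", and an NA index yields NA
flag <- c("", "Yes")[idx]
flag   # "Yes" "" NA "Yes"
```

Note that an NA in the condition produces an NA in the result rather than "".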

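It is better to create the data.frame with stringsAsFactors=FALSE, because the default is TRUE (this was the default before R 4.0.0; since then it is FALSE). If we check str(df), we find that all of the columns are of class factor. Also, if there are missing values, NA can be used instead of "" to avoid converting the class of a numeric column to something else.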
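A small sketch of why "" is a problem, using two made-up vectors (`with_blank` and `with_na` are illustrative names, not from the question): `c()` coerces all elements to a common type, so a single "" turns the numbers into strings, while NA leaves them numeric.

```r
# Mixing numbers with "" makes c() coerce every element to character
with_blank <- c(1, 12, "", 8)
class(with_blank)   # "character"

# Using NA for the missing value keeps the vector numeric
with_na <- c(1, 12, NA, 8)
class(with_na)      # "numeric"

# Numeric comparison works element-wise and propagates NA
with_na < 10        # TRUE FALSE NA TRUE
```

Comparing `with_blank < 10` would not raise an error, but the comparison would be made on strings rather than numbers, so the result is not a meaningful distance check.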

If we rewrite the creation of 'df' accordingly:

distance <- c(1, 12, 5, 25, 7, 2, NA, 8, 19, 7, NA, 4, 16, 12, 7)
df1 <- data.frame(commute, kids, distance, stringsAsFactors=FALSE)

the code above can be simplified to:

df1 %>%
    mutate(get.flyer = c("", "Yes")[(commute %in% c("walk", "bike", "subway", "ferry") &
        kids == "Yes" &
        distance < 10)+1] )

Some people prefer ifelse because it is easier to read:

df1 %>% 
   mutate(get.flyer = ifelse(commute %in% c("walk", "bike", "subway", "ferry") & 
                kids == "Yes" &
                distance < 10, 
                          "Yes", ""))

This can also be done easily with base R:

df1$get.flyer <- with(df1, ifelse(commute %in% c("walk", "bike", "subway", "ferry") & 
              kids == "Yes" & 
              distance < 10, 
                       "Yes", ""))
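One detail worth checking with either ifelse variant is what happens when distance is NA. The sketch below uses a few made-up rows (not the question's data) and an optional `!is.na()` guard, which is an addition for illustration rather than part of the answer above:

```r
# Hypothetical rows, chosen to show each NA case
commute  <- c("walk", "bike", "drive", "ferry")
kids     <- c("Yes", "No", "Yes", "Yes")
distance <- c(NA, NA, 3, 7)

cond <- commute %in% c("walk", "bike", "subway", "ferry") &
  kids == "Yes" &
  distance < 10
cond   # NA FALSE FALSE TRUE  (FALSE & NA is FALSE, TRUE & NA is NA)

# ifelse() passes an NA condition through as NA
plain <- ifelse(cond, "Yes", "")
plain     # NA "" "" "Yes"

# Guarding with !is.na() turns the missing case into ""
guarded <- ifelse(!is.na(distance) & cond, "Yes", "")
guarded   # "" "" "" "Yes"
```

In the question's data the two rows with a missing distance also have kids equal to "No", so the condition is already FALSE there and no NA appears in get.flyer.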


Tw *_*Nus 7

@akrun has already pointed out the solution. I would like to present it in a more "wrapped-up" way.

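You can use the ifelse statement to create a column based on one (or more) conditions. But first you have to change the "encoding" of the missing values in the distance column. You used "" to indicate a missing value, but this converts the whole column to strings and rules out a numeric comparison (distance < 10 is not possible). The R way to indicate a missing value is NA, so your definition of the distance column should be: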

distance <- c(1, 12, 5, 25, 7, 2, NA, 8, 19, 7, NA, 4, 16, 12, 7)

Then, once df has been rebuilt with this distance vector, the ifelse statement looks like this:

df$get.flyer <- ifelse(
    ( 
        (df$commute %in% c("walk", "bike", "subway", "ferry")) &
        (df$kids == "Yes")                                     &
        (df$distance < 10)
    ),
    1,  # if condition is met, put 1
    0   # else put 0
)

Optional: consider encoding the other columns differently:

  • You could use TRUE and FALSE instead of "Yes" and "No" for the kids variable
  • You could use a factor for commute
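A minimal sketch of those two optional re-encodings, on a made-up three-row data frame (`df2` is an illustrative name, not from the question). With kids stored as a logical, the flag itself can also be kept as TRUE/FALSE:

```r
# Hypothetical three-row data frame, for illustration only
df2 <- data.frame(
  commute  = c("walk", "drive", "ferry"),
  kids     = c("Yes", "No", "Yes"),
  distance = c(1, 25, 7),
  stringsAsFactors = FALSE
)

# Encode kids as logical and commute as a factor
df2$kids    <- df2$kids == "Yes"
df2$commute <- factor(df2$commute)

# %in% also works on a factor (values are matched as strings);
# a logical column can be used in the condition directly
df2$get.flyer <- df2$commute %in% c("walk", "bike", "subway", "ferry") &
  df2$kids &
  df2$distance < 10
df2$get.flyer   # TRUE FALSE TRUE
```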