Nested ifelse statement

bal*_*our 53 if-statement nested r sas

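I'm still learning how to translate SAS code into R, and I'm getting warnings. I need to understand where I'm making mistakes. What I want to do is create a variable that summarizes and distinguishes 3 statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables:

  • nationality: idnat (french, foreigner),

If idnat is french, then:

  • birthplace: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp into a new variable called idnat2:

  • status: k (mainland, overseas, foreigner)

All these variables are of character type.

Expected result in column idnat2: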

   idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Here is the SAS code I want to translate into R:

if idnat = "french" then do;
   if idbp in ("overseas","colony") then idnat2 = "overseas";
   else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;

Here is my attempt in R:

if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}

I receive this warning:

Warning message:
In if (idnat=="french") { :
  the condition has length > 1 and only the first element will be used

I was advised to use a "nested ifelse" instead because it is easier, but I get more warnings:

idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
            else (idnat2 <- "foreigner")

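According to the warning message, the length is greater than 1, so only what is between the first brackets will be taken into account. Sorry, but I don't understand what this length has to do with anything here. Does anybody know where I'm going wrong?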

Tom*_*eif 99

If you have used any spreadsheet application, you will know its basic if() function, with the syntax:

if(<condition>, <yes>, <no>)

The syntax of ifelse() in R is exactly the same:

ifelse(<condition>, <yes>, <no>)

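The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of formulas in a spreadsheet application and in R, for an example where we want to test whether a > b and return 1 if so and 0 otherwise.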

In a spreadsheet:

  A  B C
1 3  1 =if(A1 > B1, 1, 0)
2 2  2 =if(A2 > B2, 1, 0)
3 1  3 =if(A3 > B3, 1, 0)

In R:

> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0
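
To make "vectorized" a bit more concrete, here is a minimal sketch with made-up inputs (the vectors below are not from the original post). It shows that the result has one element per element of the condition, and that an NA in the condition comes back as NA rather than as either branch:

```r
# Hypothetical inputs chosen for illustration only
a <- c(3, 2, NA, 1)
b <- c(1, 2, 3, 3)

# a > b is evaluated element by element: TRUE, FALSE, NA, FALSE
res <- ifelse(a > b, "greater", "not greater")
print(res)

# One result per input element; the NA comparison stays NA
print(length(res))
```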

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>, 
       ifelse(<condition>, <yes>, <no>), 
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>, 
       ifelse(<condition>, <yes>, 
              ifelse(<condition>, <yes>, <no>)
             )
       )
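
As a concrete instance of the first pattern (a second ifelse() in the <no> position), here is a small sketch. The vector x and the labels are assumptions made up for illustration:

```r
# Hypothetical numeric input
x <- c(-2, 0, 5)

# The outer test handles negatives; everything else falls through
# to the inner ifelse(), which separates zero from positive values
sign_label <- ifelse(x < 0, "negative",
                     ifelse(x == 0, "zero", "positive"))
print(sign_label)
```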

To calculate the column idnat2 you can use:

df <- read.table(header=TRUE, text="
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign"
)

with(df, 
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
     )
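
One way to convince yourself that the nested call reproduces the expected column is to store the result and compare it. This is only a sketch: the data frame is rebuilt by hand here, and the column name expected is an assumption introduced for the check:

```r
# Same four rows as in the question; 'expected' is a hypothetical helper column
df <- data.frame(
  idnat    = c("french", "french", "french", "foreign"),
  idbp     = c("mainland", "colony", "overseas", "foreign"),
  expected = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Outer test: nationality. Inner test (only relevant for french rows): birthplace
df$idnat2 <- with(df,
  ifelse(idnat == "french",
         ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
         "foreign"))

# TRUE if every row matches the expected status
print(identical(df$idnat2, df$expected))
```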

R Documentation

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

> # What is first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is result of vectorized function - equality of all elements in idnat and 
> # string "french" is tested.
> # Vector of logical values is returned (has the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french") { :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of the comparison is TRUE, and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There is really logic in it, you have to get used to it
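
If a single TRUE/FALSE really is what you want from a vector comparison, the reduction has to be written out explicitly, for example with any() or all(). A minimal sketch, assuming the same four idnat values as above (note that from R 4.2.0 onwards a condition of length > 1 in if() is an error rather than a warning):

```r
idnat <- c("french", "french", "french", "foreign")

# The comparison is vectorized: one logical value per element
cond <- idnat == "french"
print(length(cond))

# if() needs exactly one logical value, so collapse the vector first
any_french <- if (any(cond)) "some rows are french" else "no french rows"
all_french <- if (all(cond)) "every row is french" else "not every row is french"

print(any_french)
print(all_french)
```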

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) {
  if(x=="french") {
    "french"
  } else{
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
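
The same idea can be extended to the full recoding from the question. The sketch below is one possible way to do it, not the answer's own code: recode_one is a hypothetical scalar helper that mirrors the SAS logic, and mapply() calls it once per row:

```r
# Hypothetical helper: works on ONE nationality and ONE birthplace at a time,
# so plain if/else is fine inside it
recode_one <- function(nat, bp) {
  if (nat == "french") {
    if (bp %in% c("overseas", "colony")) "overseas" else "mainland"
  } else {
    "foreign"
  }
}

idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# mapply() walks both vectors in parallel and simplifies to a character vector
idnat2 <- mapply(recode_one, idnat, idbp, USE.NAMES = FALSE)
print(idnat2)
```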

If you are familiar with SQL, you can also use a CASE statement via the sqldf package.

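  • This explanation is really good, it finally helped me understand how nested `ifelse()` works. Thanks! (2 upvotes)
  • The best explanation of nested ifelse I have seen anywhere. (2 upvotes)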

Tho*_*mas 10

Try something like the following:

# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))
cbind(idnat,idbp,out) # check result

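Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized, meaning they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logical values (i.e., if(c("french","foreigner")=="french") does not work), so R gives you the warning you are receiving.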

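By contrast, ifelse is vectorized, so it can take your vectors (that is, your input variables) and test the logical condition on each of their elements, as you are used to in SAS. Another way to approach this would be to build a loop using if and else statements (as you have started to do here), but the vectorized ifelse approach will be more efficient and generally involve less code.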
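
For comparison, the loop alternative mentioned above can be sketched as follows. The small input vectors are made up for illustration; the point is only that the explicit loop and the vectorized call give the same result:

```r
# Hypothetical sample; idbp is NA for the foreigner, as in the sample data above
idnat <- c("french", "french", "french", "foreigner")
idbp  <- c("mainland", "colony", "overseas", NA)

# Loop version: if/else sees exactly one element per iteration
loop_out <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] == "french") {
    if (idbp[i] %in% c("overseas", "colony")) {
      loop_out[i] <- "overseas"
    } else {
      loop_out[i] <- "mainland"
    }
  } else {
    loop_out[i] <- "foreigner"
  }
}

# Vectorized version: the same logic in a single expression
vec_out <- ifelse(idnat == "french",
                  ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
                  "foreigner")

print(identical(loop_out, vec_out))
```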


Uwe*_*Uwe 8

If the data set contains many rows, it might be more efficient to join with a lookup table using data.table instead of nesting ifelse().

Given the lookup table below

lookup
     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

and a sample data set

library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]
      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland

we can then do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]
      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
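
If data.table is not available, the lookup idea can also be sketched in base R with match(). This swaps in a different technique from the update join above and makes no claim about relative speed; the data frame dat and the "|" key separator are assumptions for illustration:

```r
# Lookup table with the same content as the one shown above
lookup <- data.frame(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Hypothetical data to recode
dat <- data.frame(
  idnat = c("french", "foreign", "french", "french"),
  idbp  = c("colony", "foreign", "mainland", "overseas"),
  stringsAsFactors = FALSE
)

# Build a combined key per row, find each key's position in the lookup table,
# then pull the matching status; unmatched combinations would become NA
key_dat    <- paste(dat$idnat, dat$idbp, sep = "|")
key_lookup <- paste(lookup$idnat, lookup$idbp, sep = "|")
dat$idnat2 <- lookup$idnat2[match(key_dat, key_lookup)]

print(dat)
```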


Sve*_*ein 7

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas")
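
A quick sketch of why this works on the question's data: idbp already holds the desired label for every row except "colony", so only that one value needs to change. This relies on the assumption that foreigners have idbp equal to "foreign", as in the example table:

```r
# The four birthplace values from the question's example table
idbp <- c("mainland", "colony", "overseas", "foreign")

# Copy idbp, overwriting only the positions where it equals "colony"
idnat2 <- replace(idbp, idbp == "colony", "overseas")
print(idnat2)
```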


mpa*_*nco 6

Using the SQL CASE statement with the sqldf package, or case_when with dplyr:

Data

df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", 
"french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 
2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", 
"idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE 
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas' 
          ELSE idbp 
        END AS idnat2
       FROM df")

dplyr

library(dplyr)
df %>% 
mutate(idnat2 = case_when(idbp == 'mainland' ~ "mainland", 
                          idbp %in% c("colony", "overseas") ~ "overseas", 
                         TRUE ~ "foreign"))

Output

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign