ifelse pattern matching in R

che*_*ree 5 regex if-statement r

I want to fill a new column with one of two values, depending on whether a pattern matches.

Here is my data frame:

df <- structure(list(loc_01 = c("apis", "indu", "isro", "miss", "non_apis", 
"non_indu", "non_isro", "non_miss", "non_piro", "non_sacn", "non_slbe", 
"non_voya", "piro", "sacn", "slbe", "voya"), loc01_land = c(165730500, 
62101800, 540687600, 161140500, 1694590200, 1459707300, 1025051400, 
1419866100, 2037064500, 2204629200, 1918840500, 886299300, 264726000, 
321003900, 241292700, 530532000)), class = "data.frame", row.names = c(NA, 
-16L), .Names = c("loc_01", "loc01_land"))

It looks like this:

     loc_01 loc01_land
1      apis  165730500
2      indu   62101800
3      isro  540687600
4      miss  161140500
5  non_apis 1694590200
6  non_indu 1459707300
7  non_isro 1025051400
8  non_miss 1419866100
9  non_piro 2037064500
10 non_sacn 2204629200
11 non_slbe 1918840500
12 non_voya  886299300
13     piro  264726000
14     sacn  321003900
15     slbe  241292700
16     voya  530532000

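I want to add a column named 'loc01' to df. If loc_01 contains "non", it should be 'outside'; if it does not contain "non", it should be 'inside'. Here is my ifelse statement, but I am missing something, because it only returns the false value.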

df$loc01 <- ifelse(df$loc_01=="non",'outside','inside')

The resulting df:

     loc_01 loc01_land  loc01
1      apis  165730500 inside
2      indu   62101800 inside
3      isro  540687600 inside
4      miss  161140500 inside
5  non_apis 1694590200 inside
6  non_indu 1459707300 inside
7  non_isro 1025051400 inside
8  non_miss 1419866100 inside
9  non_piro 2037064500 inside
10 non_sacn 2204629200 inside
11 non_slbe 1918840500 inside
12 non_voya  886299300 inside
13     piro  264726000 inside
14     sacn  321003900 inside
15     slbe  241292700 inside
16     voya  530532000 inside

Thanks -al

dig*_*All 22

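To check whether a string contains a certain substring, you can't use ==, because it performs an exact match (i.e. it returns TRUE only if the string is exactly "non").
Instead, you can use a pattern-matching function such as grepl (part of the grep family of functions):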

df$loc01 <- ifelse(grepl("non",df$loc_01),'outside','inside')

Result:

> df
     loc_01 loc01_land   loc01
1      apis  165730500  inside
2      indu   62101800  inside
3      isro  540687600  inside
4      miss  161140500  inside
5  non_apis 1694590200 outside
6  non_indu 1459707300 outside
7  non_isro 1025051400 outside
8  non_miss 1419866100 outside
9  non_piro 2037064500 outside
10 non_sacn 2204629200 outside
11 non_slbe 1918840500 outside
12 non_voya  886299300 outside
13     piro  264726000  inside
14     sacn  321003900  inside
15     slbe  241292700  inside
16     voya  530532000  inside
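As a minimal sketch of the difference between the two approaches, the snippet below compares == and grepl on a small hypothetical vector `x` (the values "non" and "canon" are made up for illustration and are not in the question's data). It also shows that the unanchored pattern "non" matches anywhere in the string, so anchoring it as "^non_" may be safer if a location code could ever contain "non" in the middle.

```r
# Hypothetical sample values; "non" and "canon" are invented for illustration.
x <- c("apis", "non_apis", "non", "canon")

# == tests whole-string equality, so only the exact string "non" is TRUE.
exact <- x == "non"
exact
# [1] FALSE FALSE  TRUE FALSE

# grepl() tests whether the pattern occurs anywhere in each string.
anywhere <- grepl("non", x)
anywhere
# [1] FALSE  TRUE  TRUE  TRUE

# Anchoring with ^ restricts the match to strings that start with "non_".
prefix <- grepl("^non_", x)
prefix
# [1] FALSE  TRUE FALSE FALSE

labels <- ifelse(prefix, "outside", "inside")
labels
# [1] "inside"  "outside" "inside"  "inside"
```

For the data in the question both patterns give the same labels, since every value containing "non" also starts with "non_".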


cod*_*b1e 7

You only need one line of code:

library(dplyr)
library(stringr)


df %>% 
  mutate(loc01 = if_else(str_starts(loc_01, "non_"), "outside", "inside"))

To use a more complex regular expression pattern, you can use str_detect instead of str_starts:

df %>% 
  mutate(loc01 = if_else(str_detect(loc_01, "^(non_)"), "outside", "inside"))

Output:

   loc_01   loc01_land loc01  
   <chr>         <dbl> <chr>  
 1 apis      165730500 inside 
 2 indu       62101800 inside 
 3 isro      540687600 inside 
 4 miss      161140500 inside 
 5 non_apis 1694590200 outside
 6 non_indu 1459707300 outside
 7 non_isro 1025051400 outside
 8 non_miss 1419866100 outside
 9 non_piro 2037064500 outside
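If you would rather not load extra packages, the same prefix test can be sketched in base R with startsWith(), which does a literal (non-regex) prefix comparison and is available from R 3.3.0 onward. The data frame below is a four-row subset of the question's data, rebuilt here only so the snippet runs on its own.

```r
# Four-row subset of the question's data frame, rebuilt so this block is self-contained.
df <- data.frame(
  loc_01 = c("apis", "indu", "non_apis", "non_indu"),
  loc01_land = c(165730500, 62101800, 1694590200, 1459707300),
  stringsAsFactors = FALSE
)

# startsWith() compares a literal prefix; no regular expression is involved.
# Assigning to df$loc01 stores the new column in df.
df$loc01 <- ifelse(startsWith(df$loc_01, "non_"), "outside", "inside")
df
```

Note that a dplyr pipeline such as df %>% mutate(...) returns a new data frame rather than modifying df in place, so assign the result (for example df <- df %>% mutate(...)) if you want to keep the new column.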