che*_*ree 5 regex if-statement r
I want to populate a new column with one of two values depending on whether a pattern matches.
Here is my data frame:
df <- structure(list(loc_01 = c("apis", "indu", "isro", "miss", "non_apis",
"non_indu", "non_isro", "non_miss", "non_piro", "non_sacn", "non_slbe",
"non_voya", "piro", "sacn", "slbe", "voya"), loc01_land = c(165730500,
62101800, 540687600, 161140500, 1694590200, 1459707300, 1025051400,
1419866100, 2037064500, 2204629200, 1918840500, 886299300, 264726000,
321003900, 241292700, 530532000)), class = "data.frame", row.names = c(NA,
-16L), .Names = c("loc_01", "loc01_land"))
It looks like this:
loc_01 loc01_land
1 apis 165730500
2 indu 62101800
3 isro 540687600
4 miss 161140500
5 non_apis 1694590200
6 non_indu 1459707300
7 non_isro 1025051400
8 non_miss 1419866100
9 non_piro 2037064500
10 non_sacn 2204629200
11 non_slbe 1918840500
12 non_voya 886299300
13 piro 264726000
14 sacn 321003900
15 slbe 241292700
16 voya 530532000
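I want to add a column named 'loc01' to df. If loc_01 contains non, it should return 'outside'; if it does not contain non, it should return 'inside'. Here is my ifelse statement, but I am missing something, because it only returns the false value.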
df$loc01 <- ifelse(df$loc_01=="non",'outside','inside')
The resulting df:
loc_01 loc01_land loc01
1 apis 165730500 inside
2 indu 62101800 inside
3 isro 540687600 inside
4 miss 161140500 inside
5 non_apis 1694590200 inside
6 non_indu 1459707300 inside
7 non_isro 1025051400 inside
8 non_miss 1419866100 inside
9 non_piro 2037064500 inside
10 non_sacn 2204629200 inside
11 non_slbe 1918840500 inside
12 non_voya 886299300 inside
13 piro 264726000 inside
14 sacn 321003900 inside
15 slbe 241292700 inside
16 voya 530532000 inside
Thanks -al
dig*_*All 22
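To check whether a string contains a given substring, you cannot use ==, because it performs an exact match (i.e. it returns TRUE only when the string is exactly "non").
Instead, you can use, for example, the grepl function (part of the grep family of functions), which performs pattern matching: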
df$loc01 <- ifelse(grepl("non",df$loc_01),'outside','inside')
Result:
> df
loc_01 loc01_land loc01
1 apis 165730500 inside
2 indu 62101800 inside
3 isro 540687600 inside
4 miss 161140500 inside
5 non_apis 1694590200 outside
6 non_indu 1459707300 outside
7 non_isro 1025051400 outside
8 non_miss 1419866100 outside
9 non_piro 2037064500 outside
10 non_sacn 2204629200 outside
11 non_slbe 1918840500 outside
12 non_voya 886299300 outside
13 piro 264726000 inside
14 sacn 321003900 inside
15 slbe 241292700 inside
16 voya 530532000 inside
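
To make the difference between exact and substring matching concrete, here is a minimal sketch. The vector `x` and the variable names `exact`, `contains` and `label` are hypothetical and chosen only for illustration; `fixed = TRUE` is an optional extra, not part of the answer above, that makes grepl treat "non" as a literal string rather than a regular expression.

```r
# Hypothetical sample values, not the question's data frame
x <- c("apis", "non_apis", "non")

# == compares each whole string with "non" (exact match)
exact <- x == "non"

# grepl() tests whether "non" occurs anywhere in each string;
# fixed = TRUE treats the pattern as a literal substring
contains <- grepl("non", x, fixed = TRUE)

# Map the logical vector to the two labels
label <- ifelse(contains, "outside", "inside")

print(data.frame(x, exact, contains, label))
```

Only the third element equals "non" exactly, while both the second and third contain it, which is why the == version in the question labelled every row of df as 'inside'.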
With dplyr and stringr, a single mutate() call is enough:
library(dplyr)
library(stringr)
df %>%
mutate(loc01 = if_else(str_starts(loc_01, "non_"), "outside", "inside"))
To use a more complex regular expression pattern, you can use str_detect instead of str_starts:
df %>%
mutate(loc01 = if_else(str_detect(loc_01, "^(non_)"), "outside", "inside"))
Output:
loc_01 loc01_land loc01
<chr> <dbl> <chr>
1 apis 165730500 inside
2 indu 62101800 inside
3 isro 540687600 inside
4 miss 161140500 inside
5 non_apis 1694590200 outside
6 non_indu 1459707300 outside
7 non_isro 1025051400 outside
8 non_miss 1419866100 outside
9 non_piro 2037064500 outside
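
Matching the prefix "non_" is stricter than matching "non" anywhere in the string. The sketch below shows the difference using base R only, as an assumed equivalent of the stringr calls above: startsWith plays the role of str_starts, and grepl with an anchored pattern plays the role of str_detect. The vector `x`, including the value "canon_x", is hypothetical and does not appear in the question's data.

```r
# Hypothetical values; "canon_x" contains "non" but does not start with "non_"
x <- c("non_apis", "canon_x", "apis")

# Substring match: TRUE wherever "non" occurs anywhere in the string
contains_non <- grepl("non", x, fixed = TRUE)

# Prefix match on a literal string (base R counterpart of str_starts)
starts_non <- startsWith(x, "non_")

# Prefix match via an anchored regular expression (counterpart of str_detect)
anchored <- grepl("^non_", x)

print(data.frame(x, contains_non, starts_non, anchored))
```

For the data in the question both approaches give the same labels, since every value containing "non" also starts with "non_"; the prefix form only matters if other values could contain "non" elsewhere.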