aja*_*jax 6 python string r string-matching
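I am trying to match the strings in a column of one data frame against the strings in a column of another data frame and map the corresponding values. The two data frames have different numbers of rows.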
df1 = data.frame(name = c("(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium", "Antistreptolysin O Titer", "Blood group O"), lonic_code = c("27816-8-O", "27816-8-B", "1869-7", "33914-3"))
df2 = data.frame(Testcomponents = c("creatinine", "blood", "potassium"))
Expected output
Test Components lonic_code
creatinine 27816-8-O
blood 1869-7
potassium NA
regex_right_join from the fuzzyjoin package may come in handy in this case.
library(fuzzyjoin)
library(dplyr)
df1 %>%
  mutate(name = as.character(name)) %>%
  regex_right_join(df2 %>%
                     mutate(Testcomponents = as.character(Testcomponents)),
                   by = c(name = "Testcomponents"), ignore_case = TRUE) %>%
  select(Testcomponents, lonic_code)
The output is:
Testcomponents lonic_code
1 creatinine 27816-8-O
2 blood 33914-3
3 potassium <NA>
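
If installing fuzzyjoin is not an option, roughly the same lookup can be sketched in base R with grepl. This is only an illustration, not the approach from the answer above: the helper first_match_code is a hypothetical name, it matches each component as a literal, case-insensitive substring rather than as a regular expression, and it keeps only the first matching row, whereas regex_right_join returns one row per match.

```r
# Sample data from the question, stored as plain character columns
df1 <- data.frame(
  name = c("(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium",
           "Antistreptolysin O Titer", "Blood group O"),
  lonic_code = c("27816-8-O", "27816-8-B", "1869-7", "33914-3"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Testcomponents = c("creatinine", "blood", "potassium"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: return the code of the first test name that contains
# the component (case-insensitive, literal substring), or NA if none does.
first_match_code <- function(component, test_names, codes) {
  hit <- which(grepl(tolower(component), tolower(test_names), fixed = TRUE))
  if (length(hit) == 0) NA_character_ else codes[hit[1]]
}

result <- data.frame(
  Testcomponents = df2$Testcomponents,
  lonic_code = vapply(df2$Testcomponents, first_match_code, character(1),
                      test_names = df1$name, codes = df1$lonic_code,
                      USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(result)
```

Keeping only the first hit is a simplification that works for this sample; if a component can legitimately match several test names, a join that returns every match is the safer choice.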
Sample data:
df1 <- structure(list(name = structure(1:4, .Label = c("(CKMB)Creatinine Kinase Muscle & Brain",
"24 Hours Urine for Sodium", "Antistreptolysin O Titer", "Blood group O"
), class = "factor"), lonic_code = structure(c(3L, 2L, 1L, 4L
), .Label = c("1869-7", "27816-8-B", "27816-8-O", "33914-3"), class = "factor")), .Names = c("name",
"lonic_code"), row.names = c(NA, -4L), class = "data.frame")
df2 <- structure(list(Testcomponents = structure(c(2L, 1L, 3L), .Label = c("blood",
"creatinine", "potassium"), class = "factor")), .Names = "Testcomponents", row.names = c(NA,
-3L), class = "data.frame")