use*_*714 17 regex string r match partial
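I have a data frame with 2 columns, GL and GLDESC, and want to add a third column called KIND based on some data inside the GLDESC column.
The data frame is as follows: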
      GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables
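For each row of the data table:
If GLDESC contains the word Payroll anywhere in the string, then I want KIND to be Payroll
If GLDESC contains the word Gas anywhere in the string, then I want KIND to be Materials
In all other cases I want KIND to be Other
I looked for similar examples on stackoverflow but could not find any. I also looked in R for Dummies at switch, grep, apply and regular expressions to try to match only part of the GLDESC column and then fill the KIND column with the kind of account, but was unable to make it work.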
ale*_*laz 21
Since you have only two conditions, you can use a nested ifelse:
#random data; it wasn't easy to copy-paste yours
DF <- data.frame(GL = sample(10),
                 GLDESC = paste(sample(letters, 10),
                                c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12",
                                  "asdfg", "GAS--2", "fghfgh", "qweee"),
                                sample(letters, 10), sep = " "))
DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials",
           ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))
DF
# GL GLDESC KIND
#1 8 e gas l Materials
#2 1 c payroll12 y Payroll
#3 10 m GaSer v Materials
#4 6 t asdf n Other
#5 2 w qweaa t Other
#6 4 r PayROll-12 q Payroll
#7 9 n asdfg a Other
#8 5 d GAS--2 w Materials
#9 7 s fghfgh e Other
#10 3 g qweee k Other
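For reference, here is a minimal sketch of the same nested ifelse applied to the data from the question. The data frame name `GL_DF` is hypothetical, and the rows were retyped by hand from the printout in the question, so treat them as an assumption rather than the asker's actual object.

```r
# Hypothetical reconstruction of the question's data (retyped from the printout)
GL_DF <- data.frame(
  GL = c(515100, 515900, 532300, 539991, 551000, 551100, 551300),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# Same nested ifelse as in the answer: "gas" is tested first, then "payroll",
# and anything that matches neither falls through to "Other"
GL_DF$KIND <- ifelse(grepl("gas", GL_DF$GLDESC, ignore.case = TRUE), "Materials",
              ifelse(grepl("payroll", GL_DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))
GL_DF$KIND
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

Note that grepl matches substrings, so a pattern like "gas" would also hit a description such as "Gasket"; if that matters for your accounts, a word-boundary pattern such as "\\bgas\\b" may be a safer choice.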
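EDIT 3 Oct 2016 (..after receiving more attention than expected)
A possible way to handle more patterns is to iterate over all of them and, whenever there are matches, progressively reduce the number of elements still to be compared: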
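ff = function(x, patterns, replacements = patterns, fill = NA, ...)
{
    stopifnot(length(patterns) == length(replacements))

    ans = rep_len(as.character(fill), length(x))   # start with everything set to 'fill'
    empty = seq_along(x)                           # indices of elements not matched yet

    for(i in seq_along(patterns)) {
        greps = grepl(patterns[[i]], x[empty], ...)   # test only the unmatched elements
        ans[empty[greps]] = replacements[[i]]         # first matching pattern wins
        empty = empty[!greps]                         # drop the matched ones from later passes
    }

    return(ans)
}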
ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
# [1] "Materials" "Payroll" "Materials" "Other" "Other" "Payroll" "Other" "Materials" "Other" "Other"
ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
   c("pat1a|pat1b", "pat2", "pat3"),
   c("1", "2", "3"), fill = "empty")
#[1] "1" "1" "3" "empty"
ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
   c("pat2", "pat1a|pat1b", "pat3"),
   c("2", "1", "3"), fill = "empty")
#[1] "2" "1" "3" "empty"
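As a rough illustration, this is how ff might be used on the descriptions from the question. The function body is repeated from the answer only so that the block runs on its own; the vector name `gldesc` is hypothetical and its values were retyped from the question's printout.

```r
# ff as defined in the answer, repeated here only for self-containment
ff <- function(x, patterns, replacements = patterns, fill = NA, ...) {
  stopifnot(length(patterns) == length(replacements))
  ans <- rep_len(as.character(fill), length(x))
  empty <- seq_along(x)
  for (i in seq_along(patterns)) {
    greps <- grepl(patterns[[i]], x[empty], ...)
    ans[empty[greps]] <- replacements[[i]]
    empty <- empty[!greps]
  }
  ans
}

# Hypothetical vector holding the GLDESC values from the question
gldesc <- c("Payroll-Indir Salary Labor",
            "Payroll-Indir Compensated Absences",
            "Bulk Gas",
            "Area Charge In",
            "Repairs & Maint-Spare Parts",
            "Supplies-Operating",
            "Consumables")

# Patterns are tried in the order given; unmatched rows get "Other"
kind <- ff(gldesc, c("payroll", "gas"), c("Payroll", "Materials"),
           fill = "Other", ignore.case = TRUE)
kind
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

Because an element is removed from the pool as soon as it matches, the order of `patterns` decides the result whenever a description could match more than one of them, as the two pat1a/pat2 calls above show.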