I have a dataset stored as a data.table DT, which looks like this:
print(DT)
             category industry
1:     administration    admin
2: nurse practitioner    truck
3:           trucking    truck
4:     administration    admin
5:        warehousing    nurse
6:        warehousing    admin
7:           trucking    truck
8: nurse practitioner    nurse
9: nurse practitioner    truck
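I want to reduce the table to only the rows where the industry matches the category. My general approach is to use grepl() to match the regex string '^{{INDUSTRY}}[a-z ]+$' against each row of DT$category, with the corresponding row of DT$industry substituted for {{INDUSTRY}} in the regex string using infuse(). I'm having trouble finding a slick data.table solution that loops over the table correctly and does the within-row comparison, so I've resorted to a for loop to get the job done: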
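library(data.table)
library(infuser)  # provides infuse()

template <- "^{{IND}}[a-z ]+$"
DT[, match := FALSE]
for (i in seq_len(nrow(DT))) {
  ind <- DT[i]$industry
  categ <- DT[i]$category
  # Fill this row's industry into the template, then test this row's category
  if (grepl(infuse(template, IND = ind), categ)) {
    DT[i, match := TRUE]
  }
}
DT <- DT[match == TRUE]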
print(DT)
category industry …
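For comparison, here is a minimal sketch of one loop-free way the same row-wise test might be written. It is not a confirmed answer to the question. The table is rebuilt by hand from the printout above. The `pattern` and `is_match` columns and the `matched` variable are hypothetical names introduced for illustration. `sprintf()` stands in for `infuse()` so the snippet only needs data.table. It also assumes the industry values contain no regex metacharacters, since they are pasted into the pattern unescaped.

```r
library(data.table)

# Table rebuilt from the printed output in the question
DT <- data.table(
  category = c("administration", "nurse practitioner", "trucking",
               "administration", "warehousing", "warehousing",
               "trucking", "nurse practitioner", "nurse practitioner"),
  industry = c("admin", "truck", "truck", "admin", "nurse",
               "admin", "truck", "nurse", "truck")
)

# One regex per row: the row's industry followed by one or more
# lowercase letters or spaces (hypothetical helper column `pattern`)
DT[, pattern := sprintf("^%s[a-z ]+$", industry)]

# grepl() is not vectorised over its pattern argument, so mapply() pairs
# each pattern with the category from the same row
DT[, is_match := mapply(grepl, pattern, category, USE.NAMES = FALSE)]

# Keep only the matching rows and drop the helper columns
matched <- DT[is_match == TRUE, .(category, industry)]
print(matched)
```

`mapply()` still makes one `grepl()` call per row, so this is mainly tidier than the for loop rather than fundamentally faster. Grouping by `industry` and calling `grepl()` once per distinct industry may be worth trying on larger tables.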