I'm trying to develop a function that lets me add new entries to a data frame and then checks whether they contain certain words.
df <- data.frame(keyword=c("He drives a Honda", "He goes to Ohio State"),
car=c(1,0), school=c(0,1))
df
keyword car school
He drives a Honda 1 0
He goes to Ohio State 0 1
In this data frame, car and school are binary columns: the value is 1 if a word from the car/school vector is part of the keyword, and 0 if none of those words appears in it.
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
I'd like to use a function to add new keywords to the data frame while checking each keyword for the specific values in the car and school vectors.
main <- function(keyword){
n = strsplit(as.character(keyword), " ")[[1]]
for( i in keyword ){
if( any(n==car) ){
df$car <- c(1)
}
if( any(n==school )){
df$school <- c(1)
}
}
}
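This function is unfinished and produces the following warning. It seems to arise because the school vector has length 3, which does not divide evenly into the number of words in the keyword.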
> main("He likes Ford and goes to Ohio State")
Warning message:
In n == school :
longer object length is not a multiple of shorter object length
I'm also not sure how to add the 0/1 values to df. For the keyword "He likes Ford and goes to Ohio State", I should get a 1 in both the car and school columns.
keyword car school
He drives a Honda 1 0
He goes to Ohio State 0 1
He likes Ford and goes to Ohio State 1 1
Please help. It seems the ifelse() function would be useful for this task, but I haven't been able to implement it correctly.
had*_*ley 10
I think the simplest approach is to use a compound regular expression:
library(stringr)
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
car_match <- str_c(car, collapse = "|")
school_match <- str_c(school, collapse = "|")
df <- data.frame(keyword=c("He drives a Honda",
"He goes to Ohio State",
"He likes Ford and goes to Ohio State"))
main <- function(df) {
df$car <- str_detect(df$keyword, car_match)
df$school <- str_detect(df$keyword, school_match)
df
}
main(df)
There were a few small problems, but they are easy to fix with a couple of %in% calls. You also need a special logical test for "Ohio State", which trips up strsplit because of the space.
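The question also asks how to append a new keyword with 0/1 values rather than TRUE/FALSE. A minimal base R sketch of that, assuming the same compound-pattern idea: `add_keyword()` is a hypothetical helper name that is not part of the answer above, and `grepl()` with `paste()` stands in for `str_detect()` and `str_c()` so that no package is required.

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

# Build one alternation pattern per category, e.g. "Honda|Chevy|Toyota|Ford".
car_match <- paste(car, collapse = "|")
school_match <- paste(school, collapse = "|")

# add_keyword() is a hypothetical helper (an assumption, not from the answer).
# It appends one row whose car/school flags are 0/1 integers.
add_keyword <- function(df, keyword) {
  new_row <- data.frame(
    keyword = keyword,
    car = as.integer(grepl(car_match, keyword)),
    school = as.integer(grepl(school_match, keyword)),
    stringsAsFactors = FALSE
  )
  rbind(df, new_row)
}

df <- data.frame(keyword = c("He drives a Honda", "He goes to Ohio State"),
                 car = c(1, 0), school = c(0, 1),
                 stringsAsFactors = FALSE)
df <- add_keyword(df, "He likes Ford and goes to Ohio State")
df
```

Because the terms are joined into a regular expression, any term containing special characters would need escaping, and a term also matches inside longer words (for example "Ford" inside "Fordham").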
df <- data.frame(keyword=c("He drives a Honda",
"He goes to Ohio State",
"He likes Ford and goes to Ohio State"),
car=0, school=0)
main <- function(df) {
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Missouri")
for (i in seq_len(nrow(df))) {
Words = strsplit(as.character(df[i, 'keyword']), " ")[[1]]
if(any(Words %in% car)) df[i, 'car'] <- 1
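if(any(Words == 'Ohio')) {
# na.rm = TRUE guards against "Ohio" being the last word, where the next index yields NA
if(any(Words[which(Words == 'Ohio') + 1] == 'State', na.rm = TRUE)) df[i, 'school'] <- 1
}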
if(any(Words %in% school)) df[i, 'school'] <- 1
}
return(df)
}
main(df)
keyword car school
1 He drives a Honda 1 0
2 He goes to Ohio State 0 1
3 He likes Ford and goes to Ohio State 1 1
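To illustrate why `%in%` avoids the warning from the question, and one possible way to drop the "Ohio State" special case, here is a short sketch. `has_any()` is a hypothetical helper introduced only for illustration; it searches the whole keyword for each term as a fixed string instead of splitting on spaces.

```r
words <- strsplit("He likes Ford and goes to Ohio State", " ")[[1]]
school <- c("Michigan", "Ohio State", "Missouri")

# `==` compares element by element and recycles the shorter vector, so
# 8 words against 3 schools triggers the "longer object length" warning.
# `%in%` instead asks, for each word, whether it occurs anywhere in `school`.
word_hits <- words %in% school  # all FALSE: "Ohio" and "State" are separate words

# has_any() is a hypothetical helper (an assumption, not from the answer).
# Fixed-string search on the unsplit keyword handles multi-word terms.
has_any <- function(keyword, terms) {
  any(vapply(terms, function(term) grepl(term, keyword, fixed = TRUE),
             logical(1)))
}

school_flag <- as.integer(has_any("He likes Ford and goes to Ohio State", school))
no_school_flag <- as.integer(has_any("He drives a Honda", school))
```

With this approach the loop body would not need to treat "Ohio" and "State" separately, at the cost of matching terms that appear inside longer words.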