Check a condition and add items to a data frame

ATM*_*hew 3 r dataframe

I'm trying to develop a function that lets me enter new elements into a data frame and then checks whether they contain certain words.

df <- data.frame(keyword=c("He drives a Honda", "He goes to Ohio State"), 
        car=c(1,0), school=c(0,1))
df

               keyword car school
     He drives a Honda   1      0
 He goes to Ohio State   0      1

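In this data frame, car and school are binary columns: the value is 1 if a word from the car/school vector is part of the keyword, and 0 if no such word appears in the keyword.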

car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

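I want to use a function to enter new keywords into the data frame while checking each keyword against the values in the car and school vectors.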

main <- function(keyword){
    n = strsplit(as.character(keyword), " ")[[1]]
    for( i in keyword ){
       if( any(n==car) ){
          df$car <- c(1)
       }
       if( any(n==school )){
          df$school <- c(1)
       }
    }
}

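This function is incomplete and produces the warning below. It seems to arise because the length of the school vector (3) does not line up with the number of words in the keyword.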

> main("He likes Ford and goes to Ohio State")            
Warning message:
In n == school :
  longer object length is not a multiple of shorter object length
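The warning comes from how `==` works: it compares vectors position by position and recycles the shorter one, whereas `%in%` tests membership. A minimal sketch reusing the question's vectors (the variable `n` mirrors the split keyword inside `main`):

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

# Split the keyword into words, as the question's main() does
n <- strsplit("He likes Ford and goes to Ohio State", " ")[[1]]
length(n)  # 8 words: a multiple of length(car), but not of length(school)

# `==` pairs elements by position and recycles the shorter vector.
# "Ford" is word 3 and lines up with "Toyota", so it is silently missed.
any(n == car)

# `%in%` asks, for each word, whether it occurs anywhere in `car`
any(n %in% car)

# Neither operator can find "Ohio State" here, because strsplit()
# has already broken it into "Ohio" and "State"
in_school <- n %in% school
any(in_school)
```

Note that `n == car` raises no warning at all (8 is a multiple of 4) yet still gives the wrong answer, so the missing warning is not a sign that the comparison is correct.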

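I'm also not sure how to add the 0/1 values to df. For the keyword "He likes Ford and goes to Ohio State", I should get a 1 in both the car and school columns.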

                             keyword car school
                   He drives a Honda   1      0
               He goes to Ohio State   0      1
He likes Ford and goes to Ohio State   1      1

Please help. It seems the ifelse() function would be useful for this task, but I haven't been able to implement it correctly.

had*_*ley 10

I think the simplest approach is to use a compound regular expression:

library(stringr)

car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

car_match <- str_c(car, collapse = "|")
school_match <- str_c(school, collapse = "|")


df <- data.frame(keyword=c("He drives a Honda", 
  "He goes to Ohio State", 
  "He likes Ford and goes to Ohio State"))

main <- function(df) {
  df$car <- str_detect(df$keyword, car_match)
  df$school <- str_detect(df$keyword, school_match)
  df
}
main(df)
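The function above returns logical columns and works on a data frame that already holds every keyword. To get the 0/1 columns from the question and append one keyword at a time, something along these lines may work. It is a base-R sketch that uses `grepl()` in place of `str_detect()`; the helper name `add_keyword` is made up for illustration.

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

# Same compound patterns as above, built with base R
car_match <- paste(car, collapse = "|")
school_match <- paste(school, collapse = "|")

df <- data.frame(keyword = c("He drives a Honda", "He goes to Ohio State"),
                 car = c(1, 0), school = c(0, 1),
                 stringsAsFactors = FALSE)

# Hypothetical helper: flag one new keyword and append it as a row
add_keyword <- function(df, keyword) {
  new_row <- data.frame(
    keyword = keyword,
    car = as.numeric(grepl(car_match, keyword)),       # TRUE/FALSE -> 1/0
    school = as.numeric(grepl(school_match, keyword)),
    stringsAsFactors = FALSE
  )
  rbind(df, new_row)
}

# The result must be assigned back; the function only changes its own copy
df <- add_keyword(df, "He likes Ford and goes to Ohio State")
df
```

Because R functions operate on a copy of their arguments, assignments such as `df$car <- 1` inside a function do not change the caller's `df` unless the modified data frame is returned and reassigned.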


wkm*_*or1 5

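There are a few small issues, but they are easily fixed with a couple of %in% calls. You also need a special logical expression to handle "Ohio State", which strsplit trips over because of the space.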

df <- data.frame(keyword=c("He drives a Honda", 
  "He goes to Ohio State", 
  "He likes Ford and goes to Ohio State"),
  car=0, school=0)

main <- function(df) {
  car <- c("Honda", "Chevy", "Toyota", "Ford")
  school <- c("Michigan", "Missouri")
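  for (i in seq_len(nrow(df))) {
    # Split the keyword into single words
    Words = strsplit(as.character(df[i, 'keyword']), " ")[[1]]
    if(any(Words %in% car)) df[i, 'car'] <- 1
    # "Ohio State" spans two words, so look for "Ohio" followed by "State";
    # na.rm guards against "Ohio" being the last word
    if(any(Words[which(Words == 'Ohio') + 1] == 'State', na.rm = TRUE)) df[i, 'school'] <- 1
    if(any(Words %in% school)) df[i, 'school'] <- 1
  }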
  return(df)
}

main(df)

                               keyword car school
1                    He drives a Honda   1      0
2                He goes to Ohio State   0      1
3 He likes Ford and goes to Ohio State   1      1
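The explicit loop and the "Ohio State" special case can be avoided by matching each term against the whole keyword string rather than against split words. A sketch, assuming whole-word matches are wanted (so "Ford" should not match "Fordham") and that the terms contain no regex metacharacters; the helper `has_any` is a hypothetical name:

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

# Hypothetical helper: 1 if any term occurs as a whole word or phrase, else 0.
# Assumes the terms contain no regex metacharacters.
has_any <- function(text, terms) {
  pattern <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
  as.integer(grepl(pattern, text, perl = TRUE))
}

df <- data.frame(keyword = c("He drives a Honda",
                             "He goes to Ohio State",
                             "He likes Ford and goes to Ohio State",
                             "He studies at Fordham"),
                 stringsAsFactors = FALSE)

# grepl() is vectorised over the keywords, so no loop is needed
df$car <- has_any(df$keyword, car)
df$school <- has_any(df$keyword, school)
df
```

The fourth keyword is an extra row added here only to show the word-boundary behaviour; it gets 0 in both columns.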