Merging two data frames in R on a key column whose values differ only in case

Pra*_* KL 7 r

I have two data frames, but the problem is that the values in the merge "by" column differ in case between the two.

For example: sn1capx1e0001 vs. SN1CAPX1E0001.

authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA,
                   "Venables & Smith"))
m1 <- merge(authors, books, by.x = "surname", by.y = "name")

This gives only one row:

  surname nationality deceased                     title other.author
1  McNeil   Australia       no Interactive Data Analysis         <NA>

So I want to merge them case-insensitively, but I can't get merge() or a join to do that.

I've seen that regular expressions could be used in a loop to match the values.
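A minimal sketch of that loop-and-regex idea, assuming the surnames contain no regular-expression metacharacters (true for this data; otherwise they would need escaping first). Each surname is anchored with `^` and `$` so only whole-value matches count, and `grepl(..., ignore.case = TRUE)` does the case-insensitive comparison. The names `author_idx`, `book_idx` and `m_loop` are illustrative only.

```r
authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))

# Collect matching (author row, book row) index pairs.
author_idx <- integer(0)
book_idx <- integer(0)
for (i in seq_len(nrow(authors))) {
  # Anchor the surname so "Ripley" does not match e.g. "ripleys".
  pattern <- paste0("^", authors$surname[i], "$")
  hits <- which(grepl(pattern, books$name, ignore.case = TRUE))
  author_idx <- c(author_idx, rep(i, length(hits)))
  book_idx <- c(book_idx, hits)
}

# Bind the matched rows side by side; row.names = NULL resets row names.
m_loop <- data.frame(authors[author_idx, ], books[book_idx, ], row.names = NULL)
m_loop
```

Unlike merge(), this keeps both the `surname` and `name` columns and returns rows in the order of `authors`. It runs one regex search per author, so normalizing the case of both columns and calling merge() is usually the simpler route.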

Pra*_* KL 5

I found this to be simple: convert both key columns to upper case with toupper().

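books$name <- toupper(books$name)
authors$surname <- toupper(authors$surname)
m1 <- merge(authors, books, by.x = "surname", by.y = "name")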

Simple as that.
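Overwriting the columns with toupper() loses the original spelling (e.g. "McNeil" becomes "MCNEIL"). A minimal sketch that instead merges on temporary upper-cased copies of the keys, so both original columns survive; `merge_ci` and the `.key` column name are hypothetical, chosen for illustration, and the sketch assumes neither data frame already has a column called `.key`.

```r
authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))

# Hypothetical helper: case-insensitive merge on one key column per side.
merge_ci <- function(x, y, by.x, by.y) {
  # Add upper-cased copies of the keys; the original columns are untouched.
  x$.key <- toupper(x[[by.x]])
  y$.key <- toupper(y[[by.y]])
  out <- merge(x, y, by = ".key")
  # Drop the temporary join column from the result.
  out$.key <- NULL
  out
}

m_ci <- merge_ci(authors, books, by.x = "surname", by.y = "name")
m_ci
```

The result keeps `surname` as "McNeil" and `name` as the lower-case book keys, and is sorted by the upper-cased key because merge() sorts by default.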


Lyn*_*akr 3

Why not convert both columns to the same case before merging?

library(stringr)

authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA,
                   "Venables & Smith"))

authors$surname <- str_to_title(authors$surname)
books$name <- str_to_title(books$name)

m1 <- merge(authors, books, by.x = "surname", by.y = "name")

This gives:

   surname nationality deceased                         title other.author
1   Mcneil   Australia       no     Interactive Data Analysis         <NA>
2   Ripley          UK       no         Stochastic Simulation         <NA>
3  Tierney          US       no                     LISP-STAT         <NA>
4    Tukey          US      yes     Exploratory Data Analysis         <NA>
5 Venables   Australia       no Modern Applied Statistics ...       Ripley
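Two things are visible in this output: str_to_title() rewrites "McNeil" as "Mcneil", and the books "tipley" and "R Core" are silently dropped because merge() performs an inner join by default. A small base-R sketch for listing book keys that still have no case-insensitive match; it uses tolower() as the normalizer (an assumption standing in for str_to_title(), which avoids the stringr dependency), and `unmatched` is an illustrative name.

```r
authors <- data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))

books <- data.frame(
  name = I(c("tukey", "venables", "tierney",
             "tipley", "ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"))

# Compare lower-cased keys, but report the book names in their original spelling.
unmatched <- books$name[!tolower(books$name) %in% tolower(authors$surname)]
unmatched
```

Here the misspelled "tipley" and "R Core" (which has no author row) are the two keys left over, which is why the merged result has five rows rather than seven.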