Dynamically determine whether a data frame column exists and mutate it if it does

Question

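I have code that pulls and processes data from a database based on a client name. Some clients have data that lacks certain columns, e.g. last_name or first_name. For clients that do not use last_name or first_name, I don't care. For clients that do use either of these fields, I need to mutate() those columns with toupper() so that I can join on the standardized fields later in the ETL process.

Right now I'm using a series of if() statements and some helper functions to look at the names of the data frame and then mutate the columns if they exist. I'm using if() statements because ifelse() is vectorized and doesn't work well with whole data frames.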

library(dplyr)
set.seed(256)

b <- data.frame(id = sample(1:100, 5, FALSE), 
                col_name = sample(1000:9999, 5, FALSE), 
                another_col = sample(1000:9999, 5, FALSE))

d <- data.frame(id = sample(1:100, 5, FALSE), 
                col_name = sample(1000:9999, 5, FALSE), 
                last_name = sample(letters, 5, FALSE))

mutate_first_last <- function(df){

  mutate_first_name <- function(df){
    df %>%
      mutate(first_name = first_name %>% toupper())
  }

  mutate_last_name <- function(df){
    df %>%
      mutate(last_name = last_name %>% toupper())
  }


  n <- c("first_name", "last_name") %in% names(df)

  if (n[1] & n[2]) return(df %>% mutate_first_name() %>% mutate_last_name())
  if (n[1] & !n[2]) return(df %>% mutate_first_name())
  if (!n[1] & n[2]) return(df %>% mutate_last_name())
  if (!n[1] & !n[2]) return(df)

}

This gives me the output I expect:

> b %>% mutate_first_last()
  id col_name another_col
1 48     8318        6207
2 39     7155        7170
3 16     4486        4321
4 55     2521        8024
5 15     1412        4875
> d %>% mutate_first_last()
  id col_name last_name
1 64     7438         A
2 43     4551         Q
3 48     7401         K
4 78     3682         Z
5 87     2554         J

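But is this the best way to handle this kind of task, i.e. dynamically checking whether a column name exists in a data frame and mutating it if it does? It seems odd to need several if() statements in this function. Is there a more streamlined way to process this data?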

Answer 1

Shr*_*ree 7

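You can use mutate_at() with one_of(), both from dplyr. This mutates a column only if its name matches one of c("first_name", "last_name"). If a name has no match, it produces a simple warning, which you can ignore or suppress.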

library(dplyr)

d %>%
  mutate_at(vars(one_of(c("first_name", "last_name"))), toupper)

  id col_name last_name
1 19     7461         V
2 52     9651         H
3 56     1901         P
4 13     7866         Z
5 25     9527         U

# example with no match
b %>%
  mutate_at(vars(one_of(c("first_name", "last_name"))), toupper)

  id col_name another_col
1 34     9315        8686
2 26     5598        4124
3 17     3318        2182
4 32     1418        4369
5 49     4759        6680
Warning message:
Unknown variables: `first_name`, `last_name`

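For readers on dplyr 1.0.0 or later, where mutate_at() and one_of() are superseded, the same idea can be sketched with across() and any_of(). any_of() silently skips names that are not present, so no warning is raised. The data frames and the helper name upper_names() below are hypothetical and used only for illustration; this is a minimal sketch, not the answer's original method.

```r
library(dplyr)

# Hypothetical sample data: one frame has last_name, the other has neither column
with_name <- data.frame(id = 1:3,
                        last_name = c("smith", "jones", "lee"),
                        stringsAsFactors = FALSE)
without_name <- data.frame(id = 1:3, another_col = c(10, 20, 30))

# Hypothetical helper: upper-case whichever of `cols` exist in `df`.
# any_of() ignores names that are missing, so absent columns are skipped quietly.
upper_names <- function(df, cols = c("first_name", "last_name")) {
  df %>% mutate(across(any_of(cols), toupper))
}

upper_names(with_name)     # last_name is upper-cased
upper_names(without_name)  # returned unchanged, no warning
```

Because the candidate column names are passed as a character vector, the same helper covers additional optional columns without any extra if() branches.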

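Here are some other ?select_helpers from dplyr:

These functions allow you to select variables based on their names.

starts_with(): starts with a prefix

ends_with(): ends with a suffix

contains(): contains a literal string

matches(): matches a regular expression

num_range(): a numerical range like x01, x02, x03.

one_of(): variables in a character vector.

everything(): all variables.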
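As a rough illustration of how a few of these helpers behave, the sketch below applies them to a small, made-up data frame (the column names are assumptions chosen only for the example).

```r
library(dplyr)

# Hypothetical one-row data frame used only to demonstrate the helpers
df <- data.frame(first_name = "ann", last_name = "lee",
                 x01 = 1, x02 = 2, id_code = 7,
                 stringsAsFactors = FALSE)

prefix_cols  <- names(select(df, starts_with("x")))        # names beginning with "x"
suffix_cols  <- names(select(df, ends_with("_name")))      # names ending in "_name"
literal_cols <- names(select(df, contains("name")))        # names containing "name"
regex_cols   <- names(select(df, matches("^x[0-9]+$")))    # names matching a regex
range_cols   <- names(select(df, num_range("x", 1:2, width = 2)))  # x01, x02
all_cols     <- names(select(df, everything()))            # every column, in order
```

Any of these can be used inside vars() with mutate_at() in the same way one_of() is used in the answer above.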
