data.table: How do I pass a character vector to a function so that data.table treats its contents as column names?

con*_*der 4 r function data.table

Here is a data.table:

library(data.table)
DT <- data.table(airquality)

This example produces the output I want:

DT[, `:=`(New_Ozone= log(Ozone), New_Wind=log(Wind))]

How can I write a function log_those_columns so that the following snippet produces the same result?

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")
log_those_columns(DT, old_names, new_names)


Note that I need old_names and new_names to be flexible enough to hold any number of columns.

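(From similar StackOverflow questions on this topic, I gather that the answer may involve some combination of .SD, with=F, parse(), eval(), and/or substitute(), but I can't seem to nail down which of those to use and where.)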

Uwe*_*Uwe 5

Picking up MichaelChirico's comment, the function definition can be written as:

log_those_columns <- function(DT, cols_in, cols_new) {
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}

Calling it modifies DT by reference:

log_those_columns(DT, old_names, new_names)
DT
     Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
  1:    41     190  7.4   67     5   1  3.713572 2.001480
  2:    36     118  8.0   72     5   2  3.583519 2.079442
  3:    12     149 12.6   74     5   3  2.484907 2.533697
  4:    18     313 11.5   62     5   4  2.890372 2.442347
  5:    NA      NA 14.3   56     5   5        NA 2.660260
 ---                                                     
149:    30     193  6.9   70     9  26  3.401197 1.931521
150:    NA     145 13.2   77     9  27        NA 2.580217
151:    14     191 14.3   75     9  28  2.639057 2.660260
152:    18     131  8.0   76     9  29  2.890372 2.079442
153:    20     223 11.5   68     9  30  2.995732 2.442347

as expected.
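
Because `:=` updates a data.table by reference, the call above changes the original DT. A minimal sketch, assuming you would rather leave the input untouched: work on a `copy()` and return the result. The name `log_those_columns_copy` and the length check are illustrative additions, not part of the answer above.

```r
library(data.table)

# Hypothetical variant: operates on a deep copy so the caller's table is unchanged.
log_those_columns_copy <- function(DT, cols_in, cols_new) {
  # each input column needs exactly one output name
  stopifnot(length(cols_in) == length(cols_new))
  out <- copy(DT)
  out[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
  out[]  # return the modified copy visibly
}

DT0 <- data.table(airquality)
res <- log_those_columns_copy(DT0, c("Ozone", "Wind"), c("New_Ozone", "New_Wind"))
names(DT0)  # original columns only
names(res)  # original columns plus New_Ozone and New_Wind
```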

A more flexible approach

The function used to transform the data can be passed as a parameter as well:

fct_those_columns <- function(DT, cols_in, cols_new, fct) {
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}

The call:

fct_those_columns(DT, old_names, new_names, log)
head(DT)

works as expected:

   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  3.713572 2.001480
2:    36     118  8.0   72     5   2  3.583519 2.079442
3:    12     149 12.6   74     5   3  2.484907 2.533697
4:    18     313 11.5   62     5   4  2.890372 2.442347
5:    NA      NA 14.3   56     5   5        NA 2.660260
6:    28      NA 14.9   66     5   6  3.332205 2.701361

The function name can be passed as character:

fct_those_columns(DT, old_names, new_names, "sqrt")
head(DT)
   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  6.403124 2.720294
2:    36     118  8.0   72     5   2  6.000000 2.828427
3:    12     149 12.6   74     5   3  3.464102 3.549648
4:    18     313 11.5   62     5   4  4.242641 3.391165
5:    NA      NA 14.3   56     5   5        NA 3.781534
6:    28      NA 14.9   66     5   6  5.291503 3.860052

or as an anonymous function:

fct_those_columns(DT, old_names, new_names, function(x) x^(1/2))
head(DT)
   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  6.403124 2.720294
2:    36     118  8.0   72     5   2  6.000000 2.828427
3:    12     149 12.6   74     5   3  3.464102 3.549648
4:    18     313 11.5   62     5   4  4.242641 3.391165
5:    NA      NA 14.3   56     5   5        NA 3.781534
6:    28      NA 14.9   66     5   6  5.291503 3.860052
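
All three spellings work because `lapply()` looks up its function argument with `match.fun()`, which accepts either a function object or a character string naming a function. A small base R sketch of that behavior:

```r
# match.fun() accepts a function object or the name of a function as a string.
f1 <- match.fun("sqrt")
f2 <- match.fun(sqrt)
same_function <- identical(f1, f2)

# lapply() therefore treats both spellings the same way.
x <- list(a = c(1, 4, 9))
same_result <- identical(lapply(x, "sqrt"), lapply(x, sqrt))
```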

An even more flexible approach

The function below derives the names of the new columns automatically by prefixing the names of the input columns with the name of the function:

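fct_those_columns <- function(DT, cols_in, fct) {
  # capture the unevaluated argument, e.g. sqrt or data.table::as.IDate
  fct_name <- substitute(fct)
  # a bare name is used as is; otherwise take the third element of the call
  # (the function name after `::`, or the body of an anonymous function)
  cols_new <- paste(if (is.name(fct_name)) fct_name else fct_name[3], cols_in, sep = "_")
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}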

DT <- data.table(airquality)
fct_those_columns(DT, old_names, sqrt)
fct_those_columns(DT, old_names, data.table::as.IDate)
fct_those_columns(DT, old_names, function(x) x^(1/2))
DT
     Ozone Solar.R Wind Temp Month Day sqrt_Ozone sqrt_Wind as.IDate_Ozone as.IDate_Wind x^(1/2)_Ozone x^(1/2)_Wind
  1:    41     190  7.4   67     5   1   6.403124  2.720294     1970-02-11    1970-01-08      6.403124     2.720294
  2:    36     118  8.0   72     5   2   6.000000  2.828427     1970-02-06    1970-01-09      6.000000     2.828427
  3:    12     149 12.6   74     5   3   3.464102  3.549648     1970-01-13    1970-01-13      3.464102     3.549648
  4:    18     313 11.5   62     5   4   4.242641  3.391165     1970-01-19    1970-01-12      4.242641     3.391165
  5:    NA      NA 14.3   56     5   5         NA  3.781534           <NA>    1970-01-15            NA     3.781534
 ---                                                                                                               
149:    30     193  6.9   70     9  26   5.477226  2.626785     1970-01-31    1970-01-07      5.477226     2.626785
150:    NA     145 13.2   77     9  27         NA  3.633180           <NA>    1970-01-14            NA     3.633180
151:    14     191 14.3   75     9  28   3.741657  3.781534     1970-01-15    1970-01-15      3.741657     3.781534
152:    18     131  8.0   76     9  29   4.242641  2.828427     1970-01-19    1970-01-09      4.242641     2.828427
153:    20     223 11.5   68     9  30   4.472136  3.391165     1970-01-21    1970-01-12      4.472136     3.391165

Note that x^(1/2)_Ozone is not a syntactically valid name in R and needs to be enclosed in backticks:

DT$`x^(1/2)_Ozone`
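
If backtick-quoted names are inconvenient, one option is to let the caller supply the prefix explicitly instead of deriving it from the function. The following is only a sketch under that assumption; `prefix_those_columns` and its `prefix` parameter are hypothetical and not part of the answer above.

```r
library(data.table)

# Hypothetical variant: the caller chooses the prefix for the new column names.
prefix_those_columns <- function(DT, cols_in, fct, prefix) {
  cols_new <- paste(prefix, cols_in, sep = "_")
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}

DT2 <- data.table(airquality)
prefix_those_columns(DT2, c("Ozone", "Wind"), function(x) x^(1/2), prefix = "root")
names(DT2)     # includes root_Ozone and root_Wind
DT2$root_Wind  # accessible without backticks
```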