Paste 3 specified columns inside a loop

Moh*_*hit 5 r data.table

df <- data.frame(expand.grid(c("a","b","c"), c("p","q","r"), c("x","y","z"), c("l","m","n")))

I have this table with 4 columns, and I want to put the result in a vector, so that it is

paste0(df$Var1,df$Var2,df$Var4)

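The problem as posed here is for demonstration purposes only, so I want the solution to be very dynamic and flexible.

I am looking for any code that can concatenate 2, 3, or 4 columns, as specified.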

Suppose we say

i <- 1
j <- 2
k <- 4
paste0(df[, i], df[, j], df[, k])

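Now imagine doing the same thing when df has many columns. Both the number of columns and which columns to use should be based on input.

I would appreciate it if this could be done with the data.table package.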

r2e*_*ans 5

I think you need do.call:

df <- expand.grid(c("a","b","c"),c("p","q","r"),c("x","y","z"))
do.call(paste0, df)
#  [1] "apx" "bpx" "cpx" "aqx" "bqx" "cqx" "arx" "brx" "crx" "apy" "bpy" "cpy" "aqy" "bqy" "cqy" "ary" "bry" "cry" "apz" "bpz" "cpz" "aqz" "bqz" "cqz" "arz"
# [26] "brz" "crz"
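For readers less familiar with do.call, here is a minimal sketch of what the call above amounts to: each column of df is handed to paste0 as a separate argument. The names via_do_call and by_hand are only illustrative and are not part of the answer.

```r
df <- expand.grid(c("a","b","c"), c("p","q","r"), c("x","y","z"))

# do.call() spreads the columns of df out as separate arguments to paste0()
via_do_call <- do.call(paste0, df)

# the same call written out by hand, one argument per column
by_hand <- paste0(df$Var1, df$Var2, df$Var3)

identical(via_do_call, by_hand)
#> [1] TRUE
```

Because the argument list is built from the data itself, the same line keeps working no matter how many columns df has.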

Since you mentioned data.table, let me build the right kind of object:

library(data.table)
# option 1
df <- data.table(expand.grid(c("a","b","c"),c("p","q","r"),c("x","y","z")))
# option 2
df <- expand.grid(c("a","b","c"),c("p","q","r"),c("x","y","z"))
setDT(df)

# then
df[, do.call(paste0, .SD)]
#  [1] "apx" "bpx" "cpx" "aqx" "bqx" "cqx" "arx" "brx" "crx" "apy" "bpy" "cpy" "aqy" "bqy" "cqy" "ary" "bry" "cry" "apz" "bpz" "cpz" "aqz" "bqz" "cqz" "arz"
# [26] "brz" "crz"

With data.table's .SD you can restrict the operation to certain columns, e.g. df[, do.call(paste0, .SD), .SDcols = c(i, j, k)]


The columns you need are just a column subset, à la df[,c(1,2,4)]:

df <- expand.grid(c("a","b","c"),c("p","q","r"),c("x","y","z"),c("l","m","n"))
i <- 1
j <- 2
k <- 4
do.call(paste0, df[,c(i, j, k)])
#  [1] "apl" "bpl" "cpl" "aql" "bql" "cql" "arl" "brl" "crl" "apl" "bpl" "cpl" "aql" "bql" "cql" "arl" "brl" "crl" "apl" "bpl" "cpl" "aql" "bql" "cql" "arl"
# [26] "brl" "crl" "apm" "bpm" "cpm" "aqm" "bqm" "cqm" "arm" "brm" "crm" "apm" "bpm" "cpm" "aqm" "bqm" "cqm" "arm" "brm" "crm" "apm" "bpm" "cpm" "aqm" "bqm"
# [51] "cqm" "arm" "brm" "crm" "apn" "bpn" "cpn" "aqn" "bqn" "cqn" "arn" "brn" "crn" "apn" "bpn" "cpn" "aqn" "bqn" "cqn" "arn" "brn" "crn" "apn" "bpn" "cpn"
# [76] "aqn" "bqn" "cqn" "arn" "brn" "crn"

As GregorThomas said in the comments, the data.table form of this is:

as.data.table(df)[, do.call(paste0, .SD), .SDcols = c(i, j, k)]
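Since the question asks for the columns to be driven by input, the .SDcols form can be wrapped in a small function. This is only a sketch: paste_cols is a hypothetical helper name, not something from data.table or from the answer, and cols is assumed to be a vector of column positions or column names.

```r
library(data.table)

# Hypothetical helper: paste the chosen columns together row by row.
# `cols` may hold column positions (e.g. c(1, 2, 4)) or column names.
paste_cols <- function(dt, cols) {
  dt <- as.data.table(dt)  # accepts a data.frame or a data.table
  dt[, do.call(paste0, .SD), .SDcols = cols]
}

df <- expand.grid(c("a","b","c"), c("p","q","r"), c("x","y","z"), c("l","m","n"))

head(paste_cols(df, c(1, 2, 4)), 3)
#> [1] "apl" "bpl" "cpl"
head(paste_cols(df, c("Var1", "Var4")), 3)
#> [1] "al" "bl" "cl"
```

The number of columns is never hard-coded, so passing two, three, or four entries in cols works the same way.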


rps*_*227 5

Another option is to use Reduce():

df <-
  data.frame(expand.grid(c("a", "b", "c"), c("p", "q", "r"), c("x", "y", "z")))

df_pasted <- Reduce(paste0, df)
df_pasted
#>  [1] "apx" "bpx" "cpx" "aqx" "bqx" "cqx" "arx" "brx" "crx" "apy" "bpy" "cpy"
#> [13] "aqy" "bqy" "cqy" "ary" "bry" "cry" "apz" "bpz" "cpz" "aqz" "bqz" "cqz"
#> [25] "arz" "brz" "crz"

Created on 2023-08-28 with reprex v2.0.2

Edit: As r2evans pointed out, you can specify just the columns you want to keep. If you have a data.table, you can use:

Reduce(paste0, df[, c(i, j, k), with = FALSE])

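  • This works, and I encourage people to use `Reduce` and to learn how to use it effectively. In this case it calls `paste0` a total of `n-1` times (where `n` is the number of columns). In contrast, `do.call(paste0, ..)` calls it _once_. If the OP is working with _a lot_ of data and/or long strings, then because of R's global string pool this starts to have a slightly detrimental effect on R's performance: by building `"ap"` and then `"apx"` in this example, we never actually need the intermediate `"ap"` and never see it, but it uses memory anyway. (3 upvotes)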
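The difference described in the comment above can be made visible with a short sketch. counting_paste0 is a hypothetical wrapper introduced here only to count calls; it is not part of either answer. Reduce's accumulate = TRUE argument is used to expose the intermediate strings that are otherwise discarded.

```r
df <- expand.grid(c("a","b","c"), c("p","q","r"), c("x","y","z"))

# Hypothetical wrapper that counts how often paste0() is invoked
calls <- 0
counting_paste0 <- function(...) {
  calls <<- calls + 1
  paste0(...)
}

calls <- 0
res_reduce <- Reduce(counting_paste0, df)
calls_reduce <- calls   # n - 1 calls: 2 for 3 columns

calls <- 0
res_docall <- do.call(counting_paste0, df)
calls_docall <- calls   # a single call

identical(res_reduce, res_docall)
#> [1] TRUE

# accumulate = TRUE keeps every intermediate result that Reduce() builds
steps <- Reduce(paste0, df, accumulate = TRUE)
head(steps[[2]], 3)     # intermediate after the first two columns
#> [1] "ap" "bp" "cp"
head(steps[[3]], 3)     # final result
#> [1] "apx" "bpx" "cpx"
```

Both approaches return the same vector; they differ only in how many intermediate strings are created along the way.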