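I have a tricky problem involving a data.table nested inside another data.table. I was able to reproduce the behavior in the reproducible example below.
Sorry that it is still long and takes some time to digest, but it is the shortest example I could produce that pinpoints my problem.
Suppose I create the following data.table named data_1, which contains a single list column, foo, whose elements are themselves data.tables: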
library(data.table)
set.seed(20200602L)
data_1 <- data.table(
  foo = replicate(5L, {
    data.table(
      bar = lapply(sample(3L, 5L, replace=TRUE), rpois, 1)
    )
  }, simplify=FALSE)
)
data_1[]
## foo
## 1: <data.table>
## 2: <data.table>
## 3: <data.table>
## 4: <data.table>
## 5: <data.table>
The contents of the foo column can be explored as follows:
data_1[, foo]
## [[1]]
## bar
## 1: 4,0,1
## 2: 0,2
## 3: 1,3,2
## 4: 1,1
## 5: 0
##
## [[2]]
## bar
## 1: 2
## 2: 0,3
## 3: 0
## 4: 2,3
## 5: 0,0
##
## [[3]]
## bar
## 1: 0,1,1
## 2: 1,2,1
## 3: 2,1
## 4: 1
## 5: 1
##
## [[4]]
## bar
## 1: 1
## 2: 3,3
## 3: 0
## 4: 2,2
## 5: 0,0,0
##
## [[5]]
## bar
## 1: 0,0
## 2: 0,0
## 3: 0,1
## 4: 2,1
## 5: 0
Next I want to create a function fun() that adds a column baz to each element of the column foo. Each entry of baz is the reversed version of the corresponding entry of bar, as shown below:
fun <- function(data) {
  data[, .(lapply(foo, function(x) {
    x[, baz:=lapply(bar, function(y) {
      rev(y)
    })]
  }))]
}
Before applying the function to data_1, I copy it into data_2, because I need to keep an untouched version of the original.
data_2 <- copy(data_1)
invisible(fun(data_1))
data_1[, foo]
## [[1]]
## bar baz
## 1: 4,0,1 1,0,4
## 2: 0,2 2,0
## 3: 1,3,2 2,3,1
## 4: 1,1 1,1
## 5: 0 0
##
## [[2]]
## bar baz
## 1: 2 2
## 2: 0,3 3,0
## 3: 0 0
## 4: 2,3 3,2
## 5: 0,0 0,0
##
## [[3]]
## bar baz
## 1: 0,1,1 1,1,0
## 2: 1,2,1 1,2,1
## 3: 2,1 1,2
## 4: 1 1
## 5: 1 1
##
## [[4]]
## bar baz
## 1: 1 1
## 2: 3,3 3,3
## 3: 0 0
## 4: 2,2 2,2
## 5: 0,0,0 0,0,0
##
## [[5]]
## bar baz
## 1: 0,0 0,0
## 2: 0,0 0,0
## 3: 0,1 1,0
## 4: 2,1 1,2
## 5: 0 0
One can double-check that data_2 is still intact:
data_2[, foo]
## [[1]]
## bar
## 1: 4,0,1
## 2: 0,2
## 3: 1,3,2
## 4: 1,1
## 5: 0
##
## [[2]]
## bar
## 1: 2
## 2: 0,3
## 3: 0
## 4: 2,3
## 5: 0,0
##
## [[3]]
## bar
## 1: 0,1,1
## 2: 1,2,1
## 3: 2,1
## 4: 1
## 5: 1
##
## [[4]]
## bar
## 1: 1
## 2: 3,3
## 3: 0
## 4: 2,2
## 5: 0,0,0
##
## [[5]]
## bar
## 1: 0,0
## 2: 0,0
## 3: 0,1
## 4: 2,1
## 5: 0
So far, everything looks fine. But suppose I change my mind and want to apply fun() to data_2 as well. I expected it to work the same way as it did for data_1. Unfortunately, it does not:
invisible(fun(data_2))
## Warning messages:
## 1: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 2: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 3: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 4: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
## 5: In `[.data.table`(x, , `:=`(baz, lapply(bar, function(y) { :
## Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
data_2[, foo]
## [[1]]
## bar
## 1: 4,0,1
## 2: 0,2
## 3: 1,3,2
## 4: 1,1
## 5: 0
##
## [[2]]
## bar
## 1: 2
## 2: 0,3
## 3: 0
## 4: 2,3
## 5: 0,0
##
## [[3]]
## bar
## 1: 0,1,1
## 2: 1,2,1
## 3: 2,1
## 4: 1
## 5: 1
##
## [[4]]
## bar
## 1: 1
## 2: 3,3
## 3: 0
## 4: 2,2
## 5: 0,0,0
##
## [[5]]
## bar
## 1: 0,0
## 2: 0,0
## 3: 0,1
## 4: 2,1
## 5: 0
Can someone explain why this happens, and perhaps point me to a way to solve the problem?
For reference:
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: SUSE Linux Enterprise Server 12 SP5
##
## Matrix products: default
## BLAS: /apps/R-4.0.0/lib/libRblas.so
## LAPACK: /apps/R-4.0.0/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.12.8
##
## loaded via a namespace (and not attached):
## [1] compiler_4.0.0 tools_4.0.0
The .internal.selfref attribute is not updated by copy() for the constituent data.tables:
all.equal(
  lapply(data_1$foo, attr, '.internal.selfref'),
  lapply(data_2$foo, attr, '.internal.selfref')
)
# [1] TRUE
The .internal.selfref of each nested table needs to be updated; you can fix the problem by running alloc.col on the copied data.tables:
data_2 = copy(data_1)
# also possible to do lapply(foo, copy), but this should be slower
data_2[ , foo := lapply(foo, alloc.col)]
invisible(fun(data_1))
invisible(fun(data_2))
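The comment above mentions copying each nested table as another option. Here is a minimal sketch of that idea, assuming it is acceptable to replace the nested tables rather than modify them in place. The helper name fun_safe is hypothetical and not part of the original post. It copies each element of foo before adding baz, so := never runs on a table whose .internal.selfref may be stale.

```r
library(data.table)

# Rebuild the example data from the question.
set.seed(20200602L)
data_1 <- data.table(
  foo = replicate(5L, {
    data.table(
      bar = lapply(sample(3L, 5L, replace = TRUE), rpois, 1)
    )
  }, simplify = FALSE)
)
data_2 <- copy(data_1)

# Hypothetical variant of fun(): copy every nested table first, add baz to
# the fresh copy, then store the copies back into the foo column.
fun_safe <- function(data) {
  data[, foo := lapply(foo, function(x) {
    copy(x)[, baz := lapply(bar, rev)]
  })]
  invisible(data)
}

fun_safe(data_2)

# Inspect the column names of each nested table in both objects.
lapply(data_2$foo, names)
lapply(data_1$foo, names)
```

Because every nested table is copied before it is modified, data_1 is not affected by the call on data_2. The price is the extra copy that the comment above warns about, so alloc.col is likely the better choice when the nested tables are large.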