R data.table weird value/reference semantics

Ofe*_*lon 15 r data.table

(This is a follow-up question to this one.)

Consider this toy code:

> x <- data.frame(a = 1:2)
> foo <- function(z) { setDT(z) ; z[, b:=3:4] ; z } 
> y <- foo(x)
> 
> class(x)
[1] "data.table" "data.frame"
> x
   a
1: 1
2: 2

It looks like setDT did change the class of x, but the added column was not applied to x.
What is going on here?

GKi*_*GKi 4

Inside your function, z is a reference to x up until setDT is called.

library(data.table)
foo <- function(z) {print(address(z)); setDT(z); print(address(z))}
x <- data.frame(a = 1:2)
address(x)
#[1] "0x555ec9a471e8"
foo(x)
#[1] "0x555ec9a471e8"
#[1] "0x555ec9ede300"

Inside setDT, at the following line, z still points to the same address as x:

setattr(z, "class", data.table:::.resetclass(z, "data.frame"))

setattr does not make a copy. So x and z still point to the same address, and both now have the class "data.table" "data.frame":

x <- data.frame(a = 1:2)
z <- x
class(x)
#[1] "data.frame"
address(x)
#[1] "0x555ec95de600"
address(z)
#[1] "0x555ec95de600"

setattr(z, "class", data.table:::.resetclass(z, "data.frame"))

class(x)
#[1] "data.table" "data.frame"
address(x)
#[1] "0x555ec95de600"
address(z)
#[1] "0x555ec95de600"

Then setalloccol is called, which in this case calls:

assign("z", .Call(data.table:::Calloccolwrapper, z, 1024, FALSE))

This now makes x and z point to different addresses:

address(x)
#[1] "0x555ecaa09c00"
address(z)
#[1] "0x555ec95de600"

And both have the class "data.table" "data.frame":

class(x)
#[1] "data.table" "data.frame"
class(z)
#[1] "data.table" "data.frame"

I think that if they had used

class(z) <- data.table:::.resetclass(z, "data.frame")

instead of

setattr(z, "class", data.table:::.resetclass(z, "data.frame"))

this problem would not occur.

x <- data.frame(a = 1:2)
z <- x
address(x)
#[1] "0x555ec9cd2228"
class(z) <- data.table:::.resetclass(z, "data.frame")
class(x)
#[1] "data.frame"
class(z)
#[1] "data.table" "data.frame"
address(x)
#[1] "0x555ec9cd2228"
address(z)
#[1] "0x555ec9cd65a8"

But after class(z) <- value, z no longer points to the address it pointed to before (the column z$a keeps its address):

z <- data.frame(a = 1:2)
address(z)
#[1] "0x5653dbe72b68"
address(z$a)
#[1] "0x5653db82e140"
class(z) <- c("data.table", "data.frame")
address(z)
#[1] "0x5653dbe82d98"
address(z$a)
#[1] "0x5653db82e140"

But after setDT, z also no longer points to the address it pointed to before:

z <- data.frame(a = 1:2)
address(z)
#[1] "0x55b6f04d0db8"
setDT(z)
address(z)
#[1] "0x55b6efe1e0e0"

As @Matt-dowle pointed out, it is also possible to change the data in x via z:

x <- data.frame(a = c(1,3))
z <- x
setDT(z)
z[, b:=3:4]
z[2, a:=7]
z
#   a b
#1: 1 3
#2: 7 4
x
#   a
#1: 1
#2: 7
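As a minimal sketch of one way to avoid these side effects (this is not from the original answer, and foo_safe is a hypothetical name), the function can take its own deep copy with data.table's copy() before calling setDT, so that neither the class change nor later := updates should reach the caller's object:

```r
library(data.table)

# Hypothetical variant of foo() that works on a private copy of its argument.
foo_safe <- function(z) {
  z <- copy(z)     # deep copy: z no longer shares the object or its columns with x
  setDT(z)         # converts the local copy by reference
  z[, b := 3:4]    # adds column b to the local copy only
  z[]              # return the data.table visibly
}

x <- data.frame(a = 1:2)
y <- foo_safe(x)

class(x)   # x should still be a plain data.frame
names(x)   # x should still have only column a
names(y)   # y should have columns a and b
```

The cost is an extra copy of the data, which may matter for large inputs. as.data.table(z) is another option, since it returns a new object rather than converting in place.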
R.version.string
#[1] "R version 4.0.2 (2020-06-22)"
packageVersion("data.table")
#[1] ‘1.12.8’