Using lists inside data.table columns

Mic*_*ele 23 r data.table

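In data.table it is possible to have columns of type list, and I am trying to benefit from this feature for the first time. For each row of my table dt, I need to store several comments taken from an rApache web service. Each comment has a username, a datetime, and a body item.

Instead of using long strings with some weird, unusual character to separate each message from the others (such as |), and a ; to separate each item within a comment, I thought of using lists like this: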

library(data.table)
dt <- data.table(id=1:2,
        comment=list(list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world")),
          list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world"))))

> dt
   id comment
1:  1  <list>
2:  2  <list>

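This would store all the comments added for one particular row. (It would also be easier to convert to JSON later, when I need to send the data back to the UI.)

However, when I try to simulate how I will actually fill the table in production (adding a single comment to a particular row), R either crashes, or fails to assign what I want and then crashes: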

> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL

[[2]]
NULL

> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1

[[2]]
NULL

> set(dt, 1L, "comment", list(1, "a"))  # assign only `1` and when I try to see `dt` R crashes
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
  Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)

> dt[1L, comment := list(1, "a")]       # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # any of these two

I know I am trying to misuse data.table; for example, the way the j argument is designed allows this:

dt[1L, c("id", "comment") := list(1, "a")] # lists in RHS are seen as different columns! not parts of one

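Q: So, is there a way to do the assignment I want? Or do I just have to pull dt$comment out into a variable, modify it, and then re-assign the whole column every time I need to make an update?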

Aru*_*run 26

Using :=:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
dt[1L, comment := 1L]

# assign value of 1 and "a" to rows 1 and 2
dt[, comment := list(1, "a")]

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
dt[, comment := list(c("a", "b"), 1)]

# assign list(1, "a") to just 1 row of 'comment'
dt[1L, comment := list(list(list(1, "a")))]

For the last case, you need one more list, because data.table uses list(.) to look up the values to assign to columns by reference.
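Building on that last case, here is a minimal sketch of appending a single comment to one row, which is what the question asks for. The helper name add_comment and its arguments are hypothetical (they are not part of data.table), and the sketch assumes each cell holds an unnamed list of comments.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Hypothetical helper (not a data.table function): append one comment
# to the list stored in row `row_idx` of the 'comment' column.
add_comment <- function(dt, row_idx, username, message) {
  new_comment <- list(username = username, date = Sys.time(), message = message)
  # Existing cell content is NULL while the cell is still empty.
  updated <- c(dt$comment[[row_idx]], list(new_comment))
  # Outer list(): the wrapper that `:=` unpacks.
  # Inner list(): a one-element list column value, i.e. one cell holding `updated`.
  dt[row_idx, comment := list(list(updated))]
  invisible(dt)
}

add_comment(dt, 1L, "michele", "hello")
add_comment(dt, 1L, "michele", "world")

length(dt$comment[[1L]])   # number of comments now stored in row 1
```

Because `updated` is itself a list, the two wrapping calls plus the list being stored give the same three levels of nesting as in the last case above. The table is modified by reference, so no reassignment of dt is needed.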

Using set:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
set(dt, i=1L, j="comment", value=1L)

# assign value of 1 and "a" to rows 1 and 2
set(dt, j="comment", value=list(1, "a"))

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
set(dt, j="comment", value=list(c("a", "b"), 1))

# assign list(1, "a") to just 1 row of 'comment'
set(dt, i=1L, j="comment", value=list(list(list(1, "a"))))

HTH


I'm using the current development version, 1.9.3, but it should work fine on any other version.

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3

loaded via a namespace (and not attached):
[1] plyr_1.8.0.99  reshape2_1.2.2 stringr_0.6.2  tools_3.0.3   


Mat*_*wle 14

Just to add more information: what list columns are really designed for is the case where each cell is itself a vector:

> DT = data.table(a=1:2, b=list(1:5,1:10))
> DT
   a            b
1: 1    1,2,3,4,5
2: 2 1,2,3,4,5,6,

> sapply(DT$b, length)
[1]  5 10 

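Note the pretty printing of the vectors in the b column. Those commas are just for display; each cell is actually a vector (as shown by the sapply command above). Note also the trailing comma on the second item of b. It indicates that the vector is longer than displayed (data.table only displays the first 6 items).

Or, more like your example: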
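As a small sketch of working with such cells (the variable names cell_lengths, second_cell and totals are only illustrative), the full vectors can be pulled back out regardless of how they print:

```r
library(data.table)

DT <- data.table(a = 1:2, b = list(1:5, 1:10))

# Length of the vector stored in each cell of 'b'.
cell_lengths <- lengths(DT$b)

# The complete vector from row 2, even though printing truncates it.
second_cell <- DT$b[[2L]]

# Per-row summary: within each group of 'a', 'b' is a list of length one,
# so b[[1L]] is the vector stored in that row.
totals <- DT[, .(total = sum(b[[1L]])), by = a]
```

Here lengths() is the base R equivalent of the sapply(DT$b, length) call and needs R 3.2.0 or later.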

> DT = data.table(id=1:2, comment=list( c("michele", Sys.time(), "hello"),
                                        c("michele", Sys.time(), "world") ))
> DT
   id                       comment
1:  1 michele,1395330180.9278,hello
2:  2 michele,1395330180.9281,world 

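What you're trying to do is not only have a list column, but also put a list into each cell, which is why <list> is displayed. In addition, if you put named lists into each cell, beware that all those names will take up space. Where possible, a list of vectors may be easier.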
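One possible reading of "a list of vectors", sketched under the assumption that each comment field gets its own list column (the column names username and message are chosen here for illustration, and the date field is left out for brevity):

```r
library(data.table)

# Assumed layout: one list column per comment field,
# each cell an atomic vector with one entry per comment.
DT <- data.table(
  id       = 1:2,
  username = list(c("michele", "michele"), "michele"),
  message  = list(c("hello", "world"), "hi")
)

# Append one comment to row 2 by extending the vector in each cell.
# Outer list(): wrapper for the assignment; inner list(): the single cell.
set(DT, i = 2L, j = "username",
    value = list(list(c(DT$username[[2L]], "michele"))))
set(DT, i = 2L, j = "message",
    value = list(list(c(DT$message[[2L]], "again"))))

lengths(DT$message)   # comments per row
```

With this layout the cells print as comma-separated values rather than <list>, and no per-comment names are stored.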