Why does rbindlist not respect column names?

Jam*_*mes 7 r rbind data.table

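I just discovered this bug, only to find that some people call it a "feature". It makes rbindlist behave differently from do.call("rbind", l), because rbind respects column names. Furthermore, the documentation says nothing about this completely unexpected behavior. Is this really intentional?

Code example: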

> library(data.table)
> DT1 <- data.table(a=1, b=2)
> DT2 <- data.table(b=3, a=4)
> DT1
a b
1: 1 2
> DT2
b a
1: 3 4

I would expect rbind-ing these to produce the columns a = 1,4 and b = 2,3. That is what I get with both rbind.data.table and rbind.data.frame, although rbind.data.table produces a warning.

> rbind(DT1, DT2)
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
> rbind(as.data.frame(DT1), as.data.frame(DT2))
a b
1 1 2
2 4 3
> do.call('rbind', list(DT1, DT2))
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.

rbindlist, however, is happy to silently corrupt the data:

> rbindlist(list(DT1, DT2))
a b
1: 1 2
2: 3 4

Aru*_*run 7

This feature is now implemented in commit 1266 of v1.9.3. From NEWS:

o  'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
   entirely in C. Closes #5249    
  -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
     names by default)
  -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
     is TRUE by default, for compatibility with base (and backwards compatibility).
  -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
  -> At least one item of the input list has to have non-null column names.
  -> Duplicate columns are bound in the order of occurrence, like base.
  -> Attributes that might exist in individual items would be lost in the bound result.
  -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
  -> And incredibly fast ;).
  -> Documentation updated in much detail. Closes DR #5158.

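With this, you can set use.names=TRUE to bind by name. It is FALSE by default for backwards compatibility. Alternatively, you can use rbind(..), where use.names=TRUE by default, again for backwards compatibility.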
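As a minimal sketch of the two options, applied to the tables from the question (this assumes data.table v1.9.3 or later, where rbindlist accepts use.names; the result names by_name and by_pos are only illustrative):

```r
library(data.table)

# The two tables from the question: same columns, different order.
DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(b = 3, a = 4)

# Bind by name: DT2's "a" goes under "a", and its "b" under "b".
by_name <- rbindlist(list(DT1, DT2), use.names = TRUE)

# Bind by position: DT2's first column ("b") lands under "a".
by_pos <- rbindlist(list(DT1, DT2), use.names = FALSE)

print(by_name)
print(by_pos)
```

Passing use.names explicitly in both calls makes the intent visible to the reader and avoids depending on whatever default the installed version uses.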

For more examples, see this post; for benchmarks, see this post.

Examples:

1) Just set use.names=TRUE

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)

rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
#    x y
# 1: 1 2
# 2: 2 1

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(z=2, y=1)

# errors when fill=FALSE, because these can't be bound without fill=TRUE
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) : 
    # Answer requires 3 columns whereas one or more item(s) in the input 
    # list has only 2 columns. ...

2) Duplicate column names are also bound in order of occurrence:

DT1 <- data.table(x=1, x=2, y=10, y=20, y=30)
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30)

rbindlist(list(DT1,DT2), use.names=TRUE)

#     x  x   y   y   y
# 1:  1  2  10  20  30
# 2: -2 -1 -10 -20 -30

3) Use fill=TRUE if you want to bind by name and fill in missing columns

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=2, z=-1)

rbindlist(list(DT1, DT2), fill=TRUE)
#     x y  z
# 1:  1 2 NA
# 2: NA 2 -1
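The NEWS entry also says that columns of different types are coerced to the highest SEXPTYPE. A small sketch of what that means for an integer column bound to a double column (the table names DT_int and DT_dbl are made up for illustration):

```r
library(data.table)

# "value" is integer in the first table and double in the second.
DT_int <- data.table(id = 1L, value = 10L)
DT_dbl <- data.table(value = 2.5, id = 2L)

# Bind by name; the mismatched "value" columns are coerced to a common type.
res <- rbindlist(list(DT_int, DT_dbl), use.names = TRUE)

print(res)
print(sapply(res, typeof))  # inspect the resulting column types
```

Here id stays integer because it is integer in both inputs, while value is promoted to double so that 2.5 is kept without truncation.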

HTH