Jam*_*mes 7 r rbind data.table
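I just discovered this bug, only to find that some people call it a "feature". It makes rbindlist behave differently from do.call("rbind", l), because rbind respects column names. Moreover, the documentation makes no mention of this completely unexpected behavior. Is this really intentional?
Example code: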
> library(data.table)
> DT1 <- data.table(a=1, b=2)
> DT2 <- data.table(b=3, a=4)
> DT1
a b
1: 1 2
> DT2
b a
1: 3 4
I would expect rbinding these to produce the columns a = 1,4 and b = 2,3. That is what I get from both rbind.data.table and rbind.data.frame, although rbind.data.table emits a warning.
> rbind(DT1, DT2)
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
> rbind(as.data.frame(DT1), as.data.frame(DT2))
a b
1 1 2
2 4 3
> do.call('rbind', list(DT1, DT2))
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
rbindlist, however, is happy to silently corrupt the data:
> rbindlist(list(DT1, DT2))
a b
1: 1 2
2: 3 4
o 'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented
entirely in C. Closes #5249
-> use.names by default is FALSE for backwards compatibility (doesn't bind by
names by default)
-> rbind(...) now just calls rbindlist() internally, except that 'use.names'
is TRUE by default, for compatibility with base (and backwards compatibility).
-> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
-> At least one item of the input list has to have non-null column names.
-> Duplicate columns are bound in the order of occurrence, like base.
-> Attributes that might exist in individual items would be lost in the bound result.
-> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
-> And incredibly fast ;).
-> Documentation updated in much detail. Closes DR #5158.
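The coercion rule in the list above can be checked directly. This is a minimal sketch, assuming a data.table version that includes this change; the table names ints, dbls and chrs are made up for illustration.

```r
library(data.table)

# Hypothetical one-column tables whose 'id' column has a different type in each.
ints <- data.table(id = 1L)       # integer
dbls <- data.table(id = 2.5)      # double
chrs <- data.table(id = "three")  # character

# integer + double: the bound column is promoted to double.
num_result <- rbindlist(list(ints, dbls))
class(num_result$id)  # "numeric"

# Adding a character column promotes the whole column to character.
chr_result <- rbindlist(list(ints, dbls, chrs))
class(chr_result$id)  # "character"
```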
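With this change, you can set use.names=TRUE to bind by name. It defaults to FALSE for backwards compatibility. Alternatively, you can use rbind(...), where use.names=TRUE is the default, again for backwards compatibility.
1) Just set use.names=TRUE: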
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)
rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
# x y
# 1: 1 2
# 2: 2 1
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(z=2, y=1)
# errors with fill=FALSE: the column sets differ, so these can only be bound with fill=TRUE
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) :
# Answer requires 3 columns whereas one or more item(s) in the input
# list has only 2 columns. ...
2) Duplicate column names are also bound in their order of occurrence:
DT1 <- data.table(x=1, x=2, y=10, y=20, y=30)
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30)
rbindlist(list(DT1,DT2), use.names=TRUE)
# x x y y y
# 1: 1 2 10 20 30
# 2: -2 -1 -10 -20 -30
3) Use fill=TRUE if you want to bind by name and fill in missing columns:
DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=2, z=-1)
rbindlist(list(DT1, DT2), use.names=TRUE, fill=TRUE)
# x y z
# 1: 1 2 NA
# 2: NA 2 -1
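Applied to the tables from the question, the same argument gives the result the asker expected. This is a small sketch, assuming a data.table version that has the use.names argument; the result names by_name and via_rbind are only illustrative.

```r
library(data.table)

# Tables from the question: same columns, different order.
DT1 <- data.table(a = 1, b = 2)
DT2 <- data.table(b = 3, a = 4)

# Match columns by name rather than by position.
by_name <- rbindlist(list(DT1, DT2), use.names = TRUE)
by_name$a  # 1 4
by_name$b  # 2 3

# Stating use.names = TRUE explicitly in rbind() gives the same rows
# and, per the warning text itself, removes the warning.
via_rbind <- rbind(DT1, DT2, use.names = TRUE)
via_rbind$a  # 1 4
via_rbind$b  # 2 3
```

Passing use.names explicitly in both calls keeps the behavior independent of whichever default the installed version uses.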
HTH