I have this dummy dataset:
abc <- data.table(a = c("NA", "bc", "x"), b = c(1, 2, 3), c = c("n", "NA", "NA"))
I am trying to replace the string "NA" with a proper NA, in place, using data.table. I tried:
for(i in names(abc)) (abc[which(abc[[i]] == "NA"), i := NA])
for(i in names(abc)) (abc[which(abc[[i]] == "NA"), i := NA_character_])
for(i in names(abc)) (set(abc, which(abc[[i]] == "NA"), i, NA))
but I still get:
abc$a
"NA" "bc" "x"
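What am I missing?
EDIT: I tried @Frank's answer to this question, which uses type.convert(). (Thanks, Frank; I did not know about this obscure but useful function.) The documentation for type.convert() mentions: "This is principally a helper function for read.table." So I wanted to test it thoroughly. When a whole column is filled with "NA" (the NA string), the function has a minor side effect: type.convert() converts that column to logical. For this case abc would be: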
abc <- data.table(a = c("NA", "bc", "x"), b = c(1, 2, 3), c = c("n", "NA", "NA"), d = c("NA", "NA", "NA"))
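The side effect can be seen on plain character vectors, without data.table. This is a minimal sketch; the names all_na and mixed are illustrative and not part of the original code:

```r
# A vector made up entirely of "NA" strings, like column d above
all_na <- c("NA", "NA", "NA")
# A vector that mixes "NA" strings with real values, like column a
mixed <- c("NA", "bc", "x")

# type.convert() treats "NA" as missing; an all-missing vector becomes logical
class(type.convert(all_na, as.is = TRUE))  # "logical"

# With other strings present the result stays character
class(type.convert(mixed, as.is = TRUE))   # "character"
```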
EDIT2: To summarize, the code from the original question:
for(i in names(abc)) (set(abc, which(abc[[i]] == "NA"), i, NA))
works fine, but only with the current latest version of data.table (> 1.11.4). So if you face this problem, it is better to update data.table and use this code than to use type.convert().
I would do...
chcols = names(abc)[sapply(abc, is.character)]
abc[, (chcols) := lapply(.SD, type.convert, as.is=TRUE), .SDcols=chcols]
which yields
> str(abc)
Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
$ a: chr NA "bc" "x"
$ b: num 1 2 3
$ c: chr "n" NA NA
- attr(*, ".internal.selfref")=<externalptr>
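Your DT[, i :=] code does not work because it creates a column named "i"; as @AdamSampson pointed out, your set code already works. (Note: the OP upgraded from data.table 1.10.4-3 to 1.11.4 before it worked on their machine.)
"So I wanted to test it thoroughly. When a whole column is filled with "NA" (the NA string), the function has a minor side effect: type.convert() converts that column to logical."
Ah, right. Your original approach handles this case more safely: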
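To illustrate the point about the left-hand side of `:=`, here is a minimal sketch. The names wrong, right and col are illustrative; wrapping the variable in parentheses, `(col) := ...`, makes data.table use its value as the column name instead of the literal symbol:

```r
library(data.table)

abc <- data.table(a = c("NA", "bc", "x"), b = c(1, 2, 3), c = c("n", "NA", "NA"))

# Bare symbol on the LHS: a new column literally named "col" is created
wrong <- copy(abc)
col <- "a"
wrong[which(wrong[[col]] == "NA"), col := NA]
names(wrong)  # "a" "b" "c" "col"
wrong$a       # still "NA" "bc" "x"

# Parenthesised LHS: the column named by the variable is updated in place
right <- copy(abc)
for (col in c("a", "c")) {
  right[which(right[[col]] == "NA"), (col) := NA_character_]
}
right$a       # NA "bc" "x"
```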
# op's new example
abc <- data.table(a = c("NA", "bc", "x"), b = c(1, 2, 3), c = c("n", "NA", "NA"), d = c("NA", "NA", "NA"))
# op's original code
for(i in names(abc))
set(abc, which(abc[[i]] == "NA"), i, NA)
Side note: NA has logical type, and data.table usually warns when a value of a mismatched type is assigned to a column, but I guess they wrote in an exception for NAs:
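A slightly more defensive variant of the same loop, as a sketch rather than the OP's exact code: it only visits character columns (the helper name chcols is borrowed from the type.convert example) and assigns NA_character_ explicitly, so the all-"NA" column d stays character:

```r
library(data.table)

abc <- data.table(a = c("NA", "bc", "x"), b = c(1, 2, 3),
                  c = c("n", "NA", "NA"), d = c("NA", "NA", "NA"))

# Restrict the replacement to character columns
chcols <- names(abc)[sapply(abc, is.character)]

for (col in chcols) {
  # set() modifies abc by reference; NA_character_ keeps the column type explicit
  set(abc, i = which(abc[[col]] == "NA"), j = col, value = NA_character_)
}

sapply(abc, class)  # a, c and d stay "character"; b stays "numeric"
```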
DT = data.table(x = 1:2)
DT[1, x := NA]
# no problem, even though x is int and NA is logi
DT = data.table(x = 1:2)
DT[1, x := TRUE]
# Warning message:
# In `[.data.table`(DT, 1, `:=`(x, TRUE)) :
# Coerced 'logical' RHS to 'integer' to match the column's type. Either change the target column ['x'] to 'logical' first (by creating a new 'logical' vector length 2 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'integer' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.
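If you would rather not rely on that special treatment of NA, base R's typed NA constants make the intent explicit. A minimal sketch using the same toy table:

```r
library(data.table)

DT <- data.table(x = 1:2)

# NA_integer_ already matches the column type, so no coercion is involved
DT[1, x := NA_integer_]

class(DT$x)  # "integer"
```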