Chr*_*h_J 113 r data.table
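I'm having a problem working with data.table: how do I convert column classes? Here is a simple example. With a data.frame I have no problem converting them, but with a data.table I just don't know how: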
df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
#One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
#Another way
df[, "value"] <- as.numeric(df[, "value"])
library(data.table)
dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE)
#Error in rep("", ncol(xi)) : invalid 'times' argument
#Produces error, does data.table not have the option stringsAsFactors?
dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE])
#Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)") :
#unused argument(s) (with = FALSE)
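Am I missing something obvious?
Update due to Matthew's post: I was using an older version before, but even after updating to 1.6.6 (the version I use now) I still get an error.
Update 2: Say I want to convert every column of class "factor" to a "character" column, but I don't know in advance which column is of which class. With a data.frame, I can do the following: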
classes <- as.character(sapply(df, class))
colClasses <- which(classes=="factor")
df[, colClasses] <- sapply(df[, colClasses], as.character)
Can I do something similar with data.table?
Update 3:
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.6
loaded via a namespace (and not attached):
[1] tools_2.13.1
And*_*rie 93
For a single column:
dtnew <- dt[, Quarter:=as.character(Quarter)]
str(dtnew)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : num -0.838 0.146 -1.059 -1.197 0.282 ...
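Note that := assigns by reference, so the table is updated in place and no reassignment is strictly needed. A minimal sketch, assuming a current data.table release; the table below is made up for illustration and is not the asker's exact data:

```r
library(data.table)

# Illustrative table; Quarter starts out as integer
dt <- data.table(ID = rep(c("A", "B"), each = 5),
                 Quarter = rep(1:5, 2),
                 value = rnorm(10))

# := updates the column in place; dt itself is modified
dt[, Quarter := as.character(Quarter)]

class(dt$Quarter)
#> [1] "character"
```

Because the update happens in place, assigning the result to a new name (as with dtnew above) gives a second name for the same table rather than a separate copy; use copy() if an independent table is wanted.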
Using lapply and as.character:
dtnew <- dt[, lapply(.SD, as.character), by=ID]
str(dtnew)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : chr "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...
Ner*_*era 44
Try this:
DT <- data.table(X1 = c("a", "b"), X2 = c(1,2), X3 = c("hello", "you"))
changeCols <- colnames(DT)[which(as.vector(DT[,lapply(.SD, class)]) == "character")]
DT[,(changeCols):= lapply(.SD, as.factor), .SDcols = changeCols]
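The same pattern can be turned around for the case in Update 2 of the question (factor to character, without knowing the classes in advance). This is a minimal sketch rather than a definitive answer: the table and the name factorCols are made up for illustration, and it assumes a data.table version that supports the (cols) := syntax with .SDcols.

```r
library(data.table)

# Illustrative table with two factor columns and one numeric column
DT <- data.table(ID = factor(rep(c("A", "B"), each = 2)),
                 grp = factor(c("x", "y", "x", "y")),
                 value = c(1.5, 2.5, 3.5, 4.5))

# Names of the columns that are factors
factorCols <- names(DT)[sapply(DT, is.factor)]

# Convert only those columns, by reference; skip if there are none
if (length(factorCols) > 0) {
  DT[, (factorCols) := lapply(.SD, as.character), .SDcols = factorCols]
}

sapply(DT, class)
#>          ID         grp       value
#> "character" "character"   "numeric"
```

Restricting .SD with .SDcols means the other columns are never touched, so value stays numeric.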
Raising Matt Dowle's comment on Geneorama's answer ( /sf/answers/1456626181/ ) to make it more obvious (as encouraged): you can use for(...) set(...).
library(data.table)
DT = data.table(a = LETTERS[c(3L,1:3)], b = 4:7, c = letters[1:4])
DT1 <- copy(DT)
names_factors <- c("a", "c")
for(col in names_factors)
set(DT, j = col, value = as.factor(DT[[col]]))
sapply(DT, class)
#> a b c
#> "factor" "integer" "factor"
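Created on 2020-02-12 by the reprex package (v0.3.0)
For more information, see another of Matt's comments at /sf/answers/2310054491/.
Edit.
As Espen points out, and as noted in help(set), j may be "column name(s) (character) or number(s) (integer) to be assigned value when column(s) already exist". So names_factors <- c(1L, 3L) will work too.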
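For illustration, here is the same loop driven by column positions instead of names. It is a sketch under the assumption that both columns already exist in the table; idx_factors is a name chosen here, not one from the original answer.

```r
library(data.table)

DT <- data.table(a = LETTERS[c(3L, 1:3)], b = 4:7, c = letters[1:4])

# Column positions instead of names; both columns already exist
idx_factors <- c(1L, 3L)
for (col in idx_factors)
  set(DT, j = col, value = as.factor(DT[[col]]))

sapply(DT, class)
#>         a         b         c
#>  "factor" "integer"  "factor"
```

Positions are more fragile than names if the column order can change, so names are usually the safer choice in longer scripts.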