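I'm having trouble with data.table: how do I convert column classes? Here is a simple example. With a data.frame I have no problem doing the conversion, but with a data.table I just don't know how: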
df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
#One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
#Another way
df[, "value"] <- as.numeric(df[, "value"])
library(data.table)
dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE)
#Error in rep("", ncol(xi)) : invalid 'times' argument
#Produces error, does data.table not have the option stringsAsFactors?
dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE])
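#Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, …

I'm trying to find an elegant way to use := assignment to replace several columns of a data.table at once by applying a shared function. A typical use would be applying a string function (e.g. gsub) to all character columns of a table. Extending the data.frame way of doing this to a data.table is not difficult, but I'm looking for an approach that is consistent with the data.table way of doing things.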
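A minimal sketch of one idiom often used for this, assuming the target columns can be listed in a character vector (`cols` is a name introduced here for illustration): wrap the vector in parentheses on the left of `:=` and pass the same columns to `lapply` through `.SDcols`.

```r
library(data.table)

set.seed(1)
m  <- matrix(runif(10000), nrow = 100)
dt <- as.data.table(m)          # columns are named V1 ... V100

# Hypothetical helper vector: the columns to overwrite
cols <- paste0("V", 20:100)

# (cols) tells := to use the contents of cols as column names;
# .SDcols restricts .SD to those same columns, so lapply() returns
# one replacement vector per target column. The update is by reference.
dt[, (cols) := lapply(.SD, sqrt), .SDcols = cols]
```

For the gsub case, `cols` could instead be built with something like `names(dt)[vapply(dt, is.character, logical(1))]`, with `sqrt` swapped for the string function.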
For example:
library(data.table)
m <- matrix(runif(10000), nrow = 100)
df <- df1 <- df2 <- df3 <- as.data.frame(m)
dt <- as.data.table(df)
head(names(df))
head(names(dt))
## replace V20-V100 with sqrt
# data.frame approach
# by column numbers
df1[20:100] <- lapply(df1[20:100], sqrt)
# by reference to column numbers
v <- 20:100
df2[v] <- lapply(df2[v], sqrt)
# by reference to column names
n <- paste0("V", 20:100)
df3[n] <- lapply(df3[n], sqrt)
# …

From my simple data.table, for example like this:
dt1 <- fread("
col1 col2 col3
AAA ab cd
BBB ef gh
BBB ij kl
CCC mn nm")
I am building a new table, for example like this:
dt1[,
.(col3, new=.N),
by=col1]
> col1 col3 new
>1: AAA cd 1
>2: BBB gh 2
>3: BBB kl 2
>4: CCC nm 1
This works fine when I give the column names explicitly. But when I hold them in variables and try to use with=F, I get an error:
colBy <- 'col1'
colShow <- 'col3'
dt1[,
.(colShow, 'new'=.N),
by=colBy,
with=F]
# Error in `[.data.table`(dt1, , .(colShow, new = .N), by = colBy, with = F) : object 'ansvals' not found
So far I have not been able to find any information about this error.
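As far as I understand, with=FALSE makes j a plain selection of columns by name or position, so it does not combine with a computed j such as .N. A hedged sketch of one alternative, assuming the goal is the same three-column result: pass the grouping variable to by= as a character vector and select the displayed column through .SDcols. The table is rebuilt with data.table() here only so that the sketch runs on its own.

```r
library(data.table)

dt1 <- data.table(col1 = c("AAA", "BBB", "BBB", "CCC"),
                  col2 = c("ab", "ef", "ij", "mn"),
                  col3 = c("cd", "gh", "kl", "nm"))

colBy   <- "col1"
colShow <- "col3"

# by= accepts a character vector of column names; .SDcols picks the
# columns to show, and c() appends the group size as an extra column.
res <- dt1[, c(.SD, list(new = .N)), by = colBy, .SDcols = colShow]
print(res)
```

The new column name is still written literally here; only the grouping and displayed columns come from variables.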
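This is similar to the much faster (20x) ungroup function in kdb.
I'm looking for a similar (but faster) function that, given a data.table containing several list columns, each with the same number of elements in any given row, would expand the data.table.
This is a follow-up to this post.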
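One possible way to do the expansion without tidyr, sketched under the assumption that every list column has the same number of elements in each row: repeat the atomic columns by the per-row element count and flatten each list column directly. The names `t0`, `n` and `flat` are introduced here for illustration, and no speed claim is made for this approach.

```r
library(data.table)

t0 <- as.POSIXct("2016-01-09 09:55:14", tz = "UTC")
DT <- data.table(a = c(1, 2, 3),
                 b = c("q", "w", "e"),
                 c = list(rep(t0, 2), rep(t0 + 1, 3), rep(t0, 0)),
                 d = list(rep(1, 2), rep(20, 3), rep(1, 0)))

# Number of elements in each row's list cell (assumed equal across c and d)
n <- lengths(DT$c)

flat <- data.table(
  a = rep(DT$a, n),          # repeat each atomic value once per element
  b = rep(DT$b, n),
  c = do.call(c, DT$c),      # c() keeps the POSIXct class; unlist() would drop it
  d = unlist(DT$d)
)
print(flat)
```

Rows whose list cells are empty (row 3 here) disappear from the result, since they are repeated zero times.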
library(data.table)
library(tidyr)
t = Sys.time()
DT = data.table(a=c(1,2,3),
b=c('q','w','e'),
c=list(rep(t,2),rep(t+1,3),rep(t,0)),
d=list(rep(1,2),rep(20,3),rep(1,0)))
print(DT)
a b c d
1: 1 q 2016-01-09 09:55:14,2016-01-09 09:55:14 1,1
2: 2 w 2016-01-09 09:55:15,2016-01-09 09:55:15,2016-01-09 09:55:15 20,20,20
3: 3 e
print(unnest(DT))
Source: local data frame [5 x 4]
a b c d
(dbl) (chr) (time) (dbl)
1 1 q 2016-01-09 09:55:14 1
2 1 q 2016-01-09 09:55:14 1
3 2 w 2016-01-09 09:55:15 20
4 …

I want to convert columns of a data.table to another class, but I cannot work out how to refer to the columns using strings.
set.seed(10238)
idt <- data.table(A = rep(1:3, each = 5), B = rep(1:5, 3),
C = sample(15), D = sample(15))
> idt
A B C D
1: 1 1 10 14
2: 1 2 2 2
3: 1 3 13 3
4: 1 4 7 1
5: 1 5 1 8
6: 2 1 11 15
7: 2 2 4 10
8: 2 3 15 7
9: 2 4 14 12
10: 2 5 5 9
11: 3 1 8 …
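A minimal sketch of one way to convert columns whose names are held as strings, assuming the columns to convert are A and B and the target class is character (both choices are illustrative, as is the name `cols`): data.table's set() takes the column name as a string in its j argument and updates the table by reference.

```r
library(data.table)

set.seed(10238)
idt <- data.table(A = rep(1:3, each = 5), B = rep(1:5, 3),
                  C = sample(15), D = sample(15))

# Hypothetical choice of columns to convert, held as strings
cols <- c("A", "B")

# set() takes the column name as a string in j and updates by reference
for (col in cols) {
  set(idt, j = col, value = as.character(idt[[col]]))
}

sapply(idt, class)
```

Columns C and D are left untouched, so they stay integer.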