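Here is something I don't understand about data.table.
If I select a row and try to set all of its values to NA to make a new row, the data.table gets converted to logical.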
#Here is a sample table
DT <- data.table(a=rep(1L,3),b=rep(1.1,3),d=rep('aa',3))
DT
a b d
1: 1 1.1 aa
2: 1 1.1 aa
3: 1 1.1 aa
#Here I extract a line, all the column types are kept... good
str(DT[1])
Classes ‘data.table’ and 'data.frame': 1 obs. of 3 variables:
$ a: int 1
$ b: num 1.1
$ d: chr "aa"
- attr(*, ".internal.selfref")=<externalptr>
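#Now here I want to set them all to NA...they all become …
Is it possible to use a wildcard in the keep= option of a data step? I would like to do the following (a left join on A, keeping from B the variables x and y and all variables whose names start with var):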
data C;
merge A(in=a)
B(keep= x y var* in=b);
by x y;
if a;
run;
Following this post, I have another question about list columns in data.table.
DT = data.table(x=list(c(1,2),c(1,2),c(3,4,5)))
It seems you cannot key (or group by) a list column:
DT[,y:=.I,by=x]
Error in `[.data.table`(DT, , `:=`(y, .I), by = x) :
The items in the 'by' or 'keyby' list are length (2,2,3). Each must be same length as rows in x or number of rows returned by i (3).
I thought I could use lists of the same length, but:
DT = data.table(x=list(c(1,2),c(1,2),c(3,5)))
DT[,y:=.I,by=x]
Error in `[.data.table`(DT, , `:=`(y, .I), by = x) :
The items in the 'by' or 'keyby' list are length (2,2,2). Each must be same length as …
I can't figure out how to do the following: create a dynamic number of columns from a list column in a data.table.
set.seed(123); N=1e5
DT = data.table(x=rnorm(N), y=sample(c('a','b','c'),N,T))
probs = seq(.1,1,.1); newCols <- paste("q",100*probs,sep="");
DT2 <- DT[ ,list(Q=list(quantile(x,probs=probs))),by=y]
DT2
# y Q
#1: b -1.2817037351734,-0.840293441466144,-0.525195748246148,-0.259574774974136,
#2: c -1.26975023312311,-0.832359658553173,-0.513320691339448,-0.247863323660894,
#3: a -1.28189935066568,-0.838918942382995,-0.522409189372727,-0.257356179072232,
#Here I want to create 10 columns from Q called q10, q20...
DT2[ , newCols:=Q] #can't make this work because it is evaluated in the wrong environment I guess
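A minimal sketch, assuming a data.table version that provides `transpose()` (1.9.6 or later). Wrapping `newCols` in parentheses makes `:=` treat it as a character vector of column names rather than a single column literally called `newCols`, and `transpose()` turns the list column `Q` (one quantile vector per row) into one vector per quantile:

```r
library(data.table)

set.seed(123); N <- 1e5
DT <- data.table(x = rnorm(N), y = sample(c("a", "b", "c"), N, TRUE))
probs <- seq(.1, 1, .1)
newCols <- paste0("q", 100 * probs)   # "q10", "q20", ..., "q100"

# One row per group, with all ten quantiles stored in the list column Q
DT2 <- DT[, list(Q = list(quantile(x, probs = probs))), by = y]

# (newCols) is evaluated to the ten names; transpose(Q) returns a list of
# ten vectors (one per quantile), each with one value per row of DT2
DT2[, (newCols) := transpose(Q)]
DT2
```

The `Q` column is kept; drop it afterwards with `DT2[, Q := NULL]` if it is no longer needed.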
I was surprised by this, using the data.table package:
a = as.ITime('12:01:00')
str(a)
Class 'ITime' int 43260
a = as.ITime(c('12:01:00','12:00:02'))
Warning message:
In if (!is.na(y)) return(as.ITime(y)) :
the condition has length > 1 and only the first element will be used
str(a)
Class 'ITime' int [1:2] 43260 43202
Why does this line raise a warning?
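The warning text already names the cause: `if (!is.na(y))` applies a scalar `if` to the whole vector `y`, so with two time strings the condition has length 2 and only its first element is used. A base-R sketch of that pattern and of a vector-safe check follows. The helper names are hypothetical and this is not the data.table source; note also that since R 4.2.0 the same pattern is an error rather than a warning.

```r
# Hypothetical helper copying the `if (!is.na(y))` pattern from the warning
scalar_check <- function(y) {
  if (!is.na(y)) "no NA" else "has NA"
}

# Vector-safe variant: anyNA() collapses the test to a single TRUE/FALSE
vector_check <- function(y) {
  if (!anyNA(y)) "no NA" else "has NA"
}

y <- c("12:01:00", "12:00:02")

# R < 4.2.0 signals a warning here; R >= 4.2.0 signals an error
outcome <- tryCatch(scalar_check(y),
                    warning = function(w) "warning",
                    error = function(e) "error")
outcome

vector_check(y)          # "no NA"
vector_check(c(y, NA))   # "has NA"
```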
I have a comment/question about rolling joins.
Let X and Y be:
set.seed(123);
X <- data.table(x=c(1,1,1,2,2),y=c(T,T,F,F,F),t=as.POSIXct("08:00:00.000",format="%H:%M:%OS")+sample(0:999,5,TRUE)/1e3)
Y <- copy(X)
set.seed(123)
Y[,`:=`(IDX=.I,t=t+sample(c(-5:5)/1e3,5,T))]
Y <- rbindlist(list(Y, X[5,][,IDX:=6][,t:=t+0.001], X[5,][,IDX:=7][,t:=t+0.002]))
setkey(X,x,y,t)
setkey(Y,x,y,t)
Here X and Y are both sorted by x, y, t:
R) X
x y t
1: 1 FALSE 2013-06-20 08:00:00.407
2: 1 TRUE 2013-06-20 08:00:00.286
3: 1 TRUE 2013-06-20 08:00:00.788
4: 2 FALSE 2013-06-20 08:00:00.882
5: 2 FALSE 2013-06-20 08:00:00.940
R) Y
x y t IDX
1: 1 FALSE 2013-06-20 08:00:00.407 3
2: 1 TRUE 2013-06-20 08:00:00.284 1
3: 1 TRUE 2013-06-20 08:00:00.791 2 …
I would like to add a line (not just a vertical or horizontal one) to a plotly chart.
library(plotly)
d = data.frame(x=1:10, y=1:10)
plot_ly(d, x = ~x, y = ~y, type='scatter')
Say I want the line (D) y = 2x: is there a way to plot it without generating the data myself in another column of d?
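One option, sketched here assuming a layout shape is acceptable instead of a data trace, is `layout(shapes = ...)` with a shape of `type = "line"`. The segment is defined only by its end points `(x0, y0)` and `(x1, y1)`, so `d` needs no extra column. The end points below are computed from y = 2x at the ends of the range of `d$x`; `slope` is an illustrative variable, not part of the plotly API.

```r
library(plotly)

d <- data.frame(x = 1:10, y = 1:10)
slope <- 2   # the line y = 2x from the question

p <- plot_ly(d, x = ~x, y = ~y, type = "scatter", mode = "markers") %>%
  layout(shapes = list(list(
    type = "line",
    x0 = min(d$x), x1 = max(d$x),                 # span the data's x range
    y0 = slope * min(d$x), y1 = slope * max(d$x)  # y = 2x at both ends
  )))
p
```

Setting `mode = "markers"` explicitly also avoids plotly's message about the scatter mode not being specified.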
Yet another reshaping question with data.table:
set.seed(1234)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
# x y v
# 1: 1 A 12
# 2: 1 B 62
...
#11: 3 A 63
#12: 3 B 49
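I want to compute the cumulative sums of x and v by y, but laid out so that the number of rows stays the same: SUM.*.A increments when y==A, and SUM.*.B increments when y==B. (As usual, y can have many levels; there are 2 in this example.)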
# SUM.x.A SUM.x.B SUM.v.A SUM.v.B
# 1: 1 NA 12 NA
# 2: 1 1 12 62
...
#11: 12 9 318 289
#12: 12 12 318 338
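A loop-based sketch (not the poster's solution, and not the only data.table idiom). For each of `x`, `v` and each level of `y`, it takes a running sum that adds the value only on rows of that level, carries the total forward on other rows, and stays `NA` until the level first appears. The helper `carry_cumsum` is a hypothetical name. Because `sample()` changed its default algorithm in R 3.6.0, `v` may not match the numbers printed above.

```r
library(data.table)

set.seed(1234)
DT <- data.table(x = rep(c(1, 2, 3), each = 4), y = c("A", "B"),
                 v = sample(1:100, 12))

# Running sum of `col` over rows where is_lev is TRUE, carried forward on
# the other rows, and NA before the first TRUE
carry_cumsum <- function(col, is_lev) {
  out <- cumsum(ifelse(is_lev, col, 0))
  out[cumsum(is_lev) == 0] <- NA
  out
}

levs <- sort(unique(DT$y))
for (col in c("x", "v")) {
  for (lev in levs) {
    # (name) := ... evaluates the name; get(col) fetches column x or v
    DT[, (paste("SUM", col, lev, sep = ".")) := carry_cumsum(get(col), y == lev)]
  }
}

sum_cols <- grep("^SUM", names(DT), value = TRUE)
res <- DT[, sum_cols, with = FALSE]
res
```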
Edit: here is my poor solution, which is clearly overcomplicated:
#first step is to create …
I am trying to load an integer64 column as character with fread. ?fread says the integer64 argument is not implemented, but options(datatable.integer64) is. Still, fread keeps loading it as int64.
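How do I tell fread to load it as character? Edit: [If colClasses is the answer, I thought it did not allow specifying a single column name or index, and the table I am loading has dozens of columns, so that is not feasible... => this is wrong]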
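As the edit concludes, `colClasses` can target a single column by name. The following is a minimal sketch, assuming a recent data.table where `fread()` accepts a named `colClasses` vector. The temporary file and its 19-digit ids are made up for illustration, so `bit64` is not needed:

```r
library(data.table)

# Hypothetical sample file: one large-integer id column and one small column
tmp <- tempfile(fileext = ".csv")
writeLines(c("IDFD,n",
             "1040000000000000001,1",
             "1040000000000000002,2"), tmp)

# Only IDFD is overridden; every other column keeps fread's type detection
dt <- fread(tmp, colClasses = c(IDFD = "character"))
str(dt)
```

In later data.table releases, the `integer64` argument of `fread()` also accepts `"character"`, which applies to every column detected as integer64.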
Here is an example:
#for int 64
library(bit64)
#for fast everything
library(data.table)
#here is a sample
df <- structure(list(IDFD = structure(c(5.13878419797985e-299, 5.13878419797985e-299,
5.13878419797985e-299, 5.13878419797987e-299, 5.13878419797987e-299,
5.13878419797987e-299, 5.13878419797987e-299, 5.13878419797987e-299,
5.13878419797988e-299, 5.13878419797988e-299), class = "integer64")), .Names = "IDFD", row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
#write the sample to file
write.csv(df,"test.csv",quote=F,row.names=F)
#I can't load it as …
Hi, I am generating reports with rmarkdown.
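I decided to use ggplot2 charts because knitr, rmarkdown and ggplot2 seem to work better together.
I want to globally increase the size of the axis labels, tick labels and titles of my ggplot2 plots in an html_notebook document that is rmarkdown::render'ed.
Can I do this in the YAML header, or by setting something in the global chunk options?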
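One alternative to YAML or chunk options is to change ggplot2's default theme once in the setup chunk. `theme_set()` affects every plot drawn later in the same R session, and knitr renders all chunks of the document in one session. This is a minimal sketch; the sizes are arbitrary example values.

```r
# Intended for the setup chunk at the top of the .Rmd, so every later
# ggplot2 chunk in the notebook inherits the larger defaults
library(ggplot2)

# base_size scales all text elements of the default grey theme together
theme_set(theme_grey(base_size = 16))

# Individual elements can still be overridden on top of the new default
theme_update(plot.title = element_text(size = 20),
             axis.text = element_text(size = 14))

# Any plot built afterwards uses the updated theme without extra code
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle("Fuel economy")
p
```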