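I have a data.frame df with more than 600 variables. I am writing a function that creates columns automatically, and I need to inspect them visually once.
The str function gives a nice summary: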
str(df)
'data.frame': 29 obs. of 602 variables:
$ uniqueSessionsIni: POSIXct, format: "2015-01-05 15:00:00" "2015-01-05 16:00:00" "2015-01-05 17:00:00" ...
$ uniqueSessionsEnd: POSIXct, format: "2015-01-05 15:59:00" "2015-01-05 16:59:00" "2015-01-05 17:59:00" ...
$ m0p0 : POSIXct, format: "2015-01-05 15:00:00" "2015-01-05 15:00:00" "2015-01-05 15:00:00" ...
$ m1p0 : POSIXct, format: "2015-01-05 15:01:00" "2015-01-05 15:01:00" "2015-01-05 15:01:00" ...
$ m2p0 : POSIXct, format: "2015-01-05 15:02:00" "2015-01-05 15:02:00" "2015-01-05 15:02:00" ...
It continues...
But the output is truncated, as shown below:
$ m33p1 : POSIXct, format: "2015-01-05 …
Imagine a data frame such as df1 below:
df1 <- data.frame(v1 = as.factor(c("m0p1", "m5p30", "m11p20", "m59p60", "m59p60")))
How can I create a list of all the levels of the variable? Thanks.
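A minimal sketch of one possible approach, assuming "list of all levels" means the distinct levels of the factor column v1. It uses base R's levels(); the names lv and lv_list are only illustrative.

```r
# Same example data as in the question.
df1 <- data.frame(v1 = as.factor(c("m0p1", "m5p30", "m11p20", "m59p60", "m59p60")))

# levels() returns the distinct levels of a factor as a character vector.
lv <- levels(df1$v1)
print(lv)

# If an actual R list is wanted rather than a character vector
# (an assumption about the intent), wrap it with as.list().
lv_list <- as.list(lv)
print(length(lv_list))
```

The duplicated value "m59p60" appears only once in the result, because levels are unique by definition.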
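I can successfully use foverlaps on a small sample of my dataset, but with the full data (data.tables of more than 30k rows) it crashes and throws the following error:
Error message: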
Error in if (any(x[[xintervals[2L]]] - x[[xintervals[1L]]] < 0L)) stop("All entries in column ", :
missing value where TRUE/FALSE needed
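My reading of the error message is that there is no overlap between the two data.tables.
Q1 - Am I interpreting this message correctly?
Q2 - Any idea why this might happen with the larger dataset? Could it be caused by the size of the dataset?
I do have many unique values, which according to the foverlaps help file can be expected to slow things down proportionally, but not before it reaches millions of rows, which is far from the case here. Thanks.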
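A hedged sketch of one way to investigate: the condition printed in the error, `any(end - start < 0L)`, evaluates to NA when an interval column contains NA, and `if (NA)` raises exactly "missing value where TRUE/FALSE needed". The vectors start and end below are hypothetical stand-ins for the two interval columns. Whether the real data contains NA values is an assumption to verify, not something the question confirms.

```r
# Hypothetical interval columns; the NA stands in for a missing timestamp.
start <- c(1, 5, NA)
end   <- c(2, 6, 7)

# Same shape of check as in the error message. No difference is negative,
# so any() returns NA rather than FALSE, and if (NA) signals an error.
msg <- tryCatch(
  {
    if (any(end - start < 0L)) stop("end before start")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Locate rows with a missing interval bound before calling foverlaps().
bad_rows <- which(is.na(start) | is.na(end))
print(bad_rows)
```

If a check like this finds NA rows in the full data, that would explain why a small sample works while the full data fails, independently of dataset size.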
I am trying to plot a line layer on Google Maps.
Data
> dput(map)
new("SpatialLinesDataFrame"
, data = structure(list(att = c(463643, 2291491, 315237340, 10348934,
309845150, 674351, 58057, 55962, 302861, 1405635)), .Names = "att", row.names = c(NA,
10L), class = "data.frame")
, lines = list(<S4 object of class structure("Lines", package = "sp")>,
<S4 object of class structure("Lines", package = "sp")>,
<S4 object of class structure("Lines", package = "sp")>,
<S4 object of class structure("Lines", package = "sp")>,
<S4 object of class structure("Lines", package = "sp")>,
<S4 object of class structure("Lines", package = …
Data
v1 <- c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 15:00:30")
Operation
v2 <- strptime(v1, '%Y-%m-%d %H:%M:%S')
str(v2)
POSIXlt[1:3], format: "2015-01-05 15:00:00" "2015-01-05 15:45:00" "2015-01-05 15:00:30"
v3 <- v2[!v2$min] # create v3 from v2 eliminating min != 00
Result (successful subset)
str(v3)
POSIXlt[1:2], format: "2015-01-05 15:00:00" "2015-01-05 15:00:30"
Now create v4 by coercing v2 to POSIXct (success)
v4 <- as.POSIXct(v2, format = "%y/%m/%d %H:%M")
str(v4)
POSIXct[1:3], format: "2015-01-05 15:00:00" "2015-01-05 15:45:00" "2015-01-05 15:00:30"
The operation in question: applying the same subsetting operation to the POSIXct vector as to the POSIXlt vector produces the error below
v5 <- v4[!v4$min] # try to create v5 from v4 eliminating min != 00
Result (error)
Error in v4$min : $ …
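A minimal sketch of why this fails and two possible workarounds. A POSIXct object is a numeric vector (seconds since the epoch) with no named components, so `$min` is not available. A POSIXlt object is a list that does have a `min` component. The `tz = "UTC"` argument is an assumption added here to make the example reproducible.

```r
v1 <- c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 15:00:30")
v4 <- as.POSIXct(v1, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Option 1: convert to POSIXlt only to read the minute field,
# then subset the original POSIXct vector with the logical mask.
keep <- as.POSIXlt(v4)$min == 0
v5 <- v4[keep]
print(v5)

# Option 2: extract the minutes as text with format() and compare.
v5b <- v4[format(v4, "%M") == "00"]
print(v5b)
```

Both options keep the result as POSIXct, which is usually the more convenient class for storing timestamps in a data.frame column.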