有没有办法在read.table/read.csv中使用colClasses参数时指定Date格式?
(我意识到我可以在导入后进行转换,但是有很多像这样的日期列,在导入步骤中更容易实现)
我有一个.csv格式的日期列%d/%m/%Y.
dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
Run Code Online (Sandbox Code Playgroud)
这会导致转换错误.例如,15/07/2008成为0015-07-20.
data <-
structure(list(func_loc = structure(c(1L, 2L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 5L), .Label = c("3076WAG0003", "3076WAG0004", "3076WAG0007",
"3076WAG0009", "3076WAG0010"), class = "factor"), order_type = structure(c(3L,
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L), .Label = c("PM01", "PM02",
"PM03"), class = "factor"), actual_finish = structure(c(4L, 6L,
1L, 2L, 3L, 7L, 1L, 8L, 1L, 5L), .Label = c("", "11/03/2008",
"14/08/2008", …Run Code Online (Sandbox Code Playgroud) .libPaths()
[1] "\\\\per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
[2] "C:/Program Files/R/R-2.15.2/library"
[3] "C:/Program Files/RStudio/R/library"
Run Code Online (Sandbox Code Playgroud)
第一个项目是我的公司"我的文档"文件夹,从我的姓氏路径中的撇号在R CMD INSTALL --build我正在制作的软件包上使用时会引起各种各样的悲痛,更不用说当我离线时使用安装的软件包的问题来自网络.
我想C:/Program Files/R/R-2.15.2/library用作默认值,但我不想依赖于Rprofile.site.
> .libPaths(.libPaths()[2:3])
> .libPaths()
[1] "C:/Program Files/R/R-2.15.2/library" "C:/Program Files/RStudio/R/library"
Run Code Online (Sandbox Code Playgroud)
这似乎有效,但只有在我重新开始我的R会话之后,我才回到原来的.libPaths()输出......
Restarting R session...
> .libPaths()
[1] "\\\\per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
[2] "C:/Program Files/R/R-2.15.2/library"
[3] "C:/Program Files/RStudio/R/library"
Run Code Online (Sandbox Code Playgroud)
我想也许.libPaths()正在使用R_LIBS_USER
> Sys.getenv("R_LIBS_USER")
[1] "//per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
Run Code Online (Sandbox Code Playgroud)
所以我试图使用Sys.unsetenv("R_LIBS_USER")它来解除它,但它不会在会话之间持续存在.
如果重要,这里有一些可能相关的环境变量......
> Sys.getenv("R_HOME")
[1] "C:/PROGRA~1/R/R-215~1.2"
> Sys.getenv("R_HOME")
[1] "C:/PROGRA~1/R/R-215~1.2"
> Sys.getenv("R_USER")
[1] "//per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell"
> Sys.getenv("R_LIBS_USER")
[1] "//per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
> Sys.getenv("R_LIBS_SITE")
[1] ""
Run Code Online (Sandbox Code Playgroud)
我已经尝试过, …
我正在使用RODBC连接到我使用商业数据库的ODBC驱动程序创建的DSN(OSI Soft的PI Historian Time Series DB,如果你很好奇的话).
> library(RODBC)
> piconn <- odbcConnect("PIRV", uid = "pidemo")
> sqlStr <- "SELECT tag, time, status, value FROM piinterp WHERE tag = 'PW1.PLANT1.PRODUCTION_RATE' and time > DATE('-4h') and timestep = '+2m'"
Run Code Online (Sandbox Code Playgroud)
现在,如果我查询,我得到零行.
> sqlQuery(piconn, sqlStr)
[1] TAG TIME STATUS VALUE
<0 rows> (or 0-length row.names)
Run Code Online (Sandbox Code Playgroud)
有了BelieveNRows = FALSE这些,所有仍然显示零结果,即使它应该返回120行.
> sqlQuery(piconn, sqlStr, believeNRows = FALSE)
> sqlQuery(piconn, sqlStr, believeNRows = FALSE, max = 0)
> sqlQuery(piconn, sqlStr, believeNRows = FALSE, max …Run Code Online (Sandbox Code Playgroud) 我正试图devtools::install_github()在Windows 7上的公司代理后面工作.
到目前为止,我必须做以下事情:
> library(httr)
> library(devtools)
> set_config(use_proxy("123.123.123.123",8080))
> devtools::install_github("rstudio/ggvis")
Installing github repo ggvis/master from rstudio
Downloading master.zip from https://github.com/rstudio/ggvis/archive/master.zip
Error in function (type, msg, asError = TRUE) :
SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Run Code Online (Sandbox Code Playgroud)
显然,我们有一些证书服务器用我们自己的公司SSL证书替换SSL证书(通过访问https://github.com并检查证书确认).
无论如何,只是想知道是否有办法忽略该证书错误并继续安装?
我写了一个查询返回关联Customers和Salespeoeple的行.
请注意,查询会连接多个数据库表.请注意,并非所有客户都有销售人员.
c_id c_name s_id s_name
24 microsoft 1 mike
27 sun 1 mike
42 apple 2 bill
44 oracle 1 mike
47 sgi 1 mike
58 ebay 2 bill
61 paypal 3 joe
65 redhat 1 mike
Run Code Online (Sandbox Code Playgroud)
我的数据库中也有一个表(称为发票),如下所示.
i_id c_id c_name s_id s_name
7208 22 toyota NULL NULL
7209 23 ford NULL NULL
7210 27 sun NULL NULL
7211 42 apple NULL NULL
7212 12 nissan NULL NULL
7213 15 gm NULL NULL
7214 61 …Run Code Online (Sandbox Code Playgroud) 在R中,如何将多行文本文件(包含SQL)的内容导入单个字符串?
sql.txt文件如下所示:
SELECT TOP 100
setpoint,
tph
FROM rates
Run Code Online (Sandbox Code Playgroud)
我需要将该文本文件导入R字符串,使其看起来像这样:
> sqlString
[1] "SELECT TOP 100 setpoint, tph FROM rates"
Run Code Online (Sandbox Code Playgroud)
那就是我可以像这样把它送到RODBC
> library(RODBC)
> myconn<-odbcConnect("RPM")
> results<-sqlQuery(myconn,sqlString)
Run Code Online (Sandbox Code Playgroud)
我已经尝试了如下的readLines命令,但它没有给出RODBC需要的字符串格式.
> filecon<-file("sql.txt","r")
> sqlString<-readLines(filecon, warn=FALSE)
> sqlString
[1] "SELECT TOP 100 " "\t[Reclaim Setpoint Mean (tph)] as setpoint, "
[3] "\t[Reclaim Rate Mean (tph)] as tphmean " "FROM [Dampier_RC1P].[dbo].[Rates]"
>
Run Code Online (Sandbox Code Playgroud) 我一直在努力解决SQL中一些棘手的问题,我需要从事件间隔推断出资产利用率,并且刚刚了解了Allen的Interval Algebra,这似乎是解决这些问题的关键.
代数描述了区间之间的13种关系,下面的图像显示了前七个,其余的是逆的(即x之前的y,y遇到x等)

但我很难找到如何实施相关的操作.
根据我的示例数据,如何从SQL或PLSQL中的以下三种类型的操作中获取结果?
请参阅我的SQLFiddle链接:http://sqlfiddle.com/#!4/cf0cc
start end width
[1] 1 12 12
[2] 8 13 6
[3] 14 19 6
[4] 15 29 15
[5] 19 24 6
[6] 34 35 2
[7] 40 46 7
Run Code Online (Sandbox Code Playgroud)

我想要一个查询来disjoint set从上面的数据中返回,其中所有重叠的间隔都被分成行,这样就不存在重叠.
我该如何处理这个SQL?
start end width
[1] 1 7 7
[2] 8 12 5
[3] 13 13 1
[4] 14 14 1
[5] 15 18 4
[6] 19 19 1 …Run Code Online (Sandbox Code Playgroud) 我想在我的数据集中删除多个列.这是我的数据的样子.
CREATE TABLE T5 (idnum NUMBER,f1 NUMBER(10,5),f2 NUMBER(10,5),f3 NUMBER(10,5)
,e1 NUMBER(10,5),e2 NUMBER(10,5)
,h1 NUMBER(10,5),h2 NUMBER(10,5));
INSERT INTO T5 (IDNUM,F1,F2,F3,E1,E2,H1,H2)
VALUES (1,'10.2004','5.009','7.330','9.008','8.003','.99383','1.43243');
INSERT INTO T5 (IDNUM,F1,F2,F3,E1,E2,H1,H2
VALUES (2,'4.2004','6.009','9.330','4.7008','4.60333','1.993','3.3243');
INSERT INTO T5 (IDNUM,F1,F2,F3,E1,E2,H1,H2)
VALUES (3,'10.2040','52.6009','67.330','9.5008','8.003','.99383','1.43243');
INSERT INTO T5 (IDNUM,F1,F2,F3,E1,E2,H1,H2)
VALUES (4,'9.20704','45.009','17.330','29.008','5.003','3.9583','1.243');
COMMIT;
select * from t5;
IDNUM F1 F2 F3 E1 E2 H1 H2
1 10.2004 5.009 7.33 9.008 8.003 0.99383 1.43243
2 4.2004 6.009 9.33 4.7008 4.60333 1.993 3.3243
3 10.204 52.6009 67.33 9.5008 8.003 0.99383 1.43243
4 9.20704 45.009 17.33 …Run Code Online (Sandbox Code Playgroud) 我只想弄清楚如何在两个data.tables上进行条件连接.


我已经编写了一个sqldf条件连接来给我开始或结束时间在另一个开始/结束时间内的电路.
sqldf("select dt2.start, dt2.finish, dt2.counts, dt1.id, dt1.circuit
from dt2
left join dt1 on (
(dt2.start >= dt1.start and dt2.start < dt1.finish) or
(dt2.finish >= dt1.start and dt2.finish < dt1.finish)
)")
Run Code Online (Sandbox Code Playgroud)

这给了我正确的结果,但对于我的大数据集来说它太慢了.
data.table没有矢量扫描,这样做的方法是什么?
这是我的数据:
dt1 <- data.table(structure(list(circuit = structure(c(2L, 1L, 2L, 1L, 2L, 3L,
1L, 1L, 2L), .Label = c("a", "b", "c"), class = "factor"), start = structure(c(1393621200,
1393627920, 1393628400, 1393631520, 1393650300, 1393646400, 1393656000,
1393668000, 1393666200), class = c("POSIXct", "POSIXt"), tzone = ""),
end = structure(c(1393626600, 1393631519, …Run Code Online (Sandbox Code Playgroud)