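I want to change the x-axis when plotting some forecasts. The model is updated daily via crontab. The x scale must include the daily increase in dates:
Sample data: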
# where dates changes according to Sys.Date-1
dates <- seq(as.Date("2015-01-01"), Sys.Date()-1, by = "days")
# where x is updated daily
x <- diffinv(rnorm(length(dates)-1))
df<-data.frame(dates,x)
# split data and train model
df$x<-as.ts(df$x)
# required libraries
library(caret)
library(forecast)
library(plyr)
# the time series is updated on daily basis
date1 <- strptime("2016-02-04", format="%Y-%m-%d")
date2 <- strptime(Sys.Date()-1, format="%Y-%m-%d")
date3<-difftime(date2,date1,units="days")
# here I split data into train and test data according to initialWindow "2016-02-04"
timeSlices <- createTimeSlices(1:nrow(df),
initialWindow = 400, horizon = date3, …
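
I am a student of clustering and R. To get a better grip on both, I would like to compute the distance between the centroids and my xy matrix at each iteration until it "converges". How can I solve steps 2 and 3 using R?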
library(fields)
x <- c(3,6,8,1,2,2,6,6,7,7,8,8)
y <- c(5,2,3,5,4,6,1,8,3,6,1,7)
df <- data.frame(x,y) # initial matrix
a <- c(3,6,8)
b <- c(5,2,3)
df1 <- data.frame(a,b) # initial centroids
This is what I want to do:
I0 <- t(rdist(df, df1))
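After iteration zero I tried the kmeans function. But for some reason it produces the centroids that should only appear at the very end. This is the start I defined: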
start <- matrix(c(3,5,6,2,8,3), 3, byrow = TRUE)
cluster <- kmeans(df,centers = start, iter.max = 1) # one iteration
kmeans does not let me track the movement of the centroids. Therefore I would like to do this "manually" by applying steps 2 and 3 in R.
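The excerpt does not spell out steps 2 and 3, so the sketch below assumes they are the usual k-means assignment step (each point goes to its nearest centroid) and update step (each centroid moves to the mean of its points). It uses base R only instead of fields::rdist, and the names pts, history and new_centroids are illustrative, not from the question.

```r
# Data and starting centroids taken from the question
x <- c(3,6,8,1,2,2,6,6,7,7,8,8)
y <- c(5,2,3,5,4,6,1,8,3,6,1,7)
pts <- cbind(x, y)
centroids <- matrix(c(3,5,6,2,8,3), 3, byrow = TRUE)

history <- list(centroids)  # centroid positions after every iteration
for (iter in 1:100) {
  # Assumed step 2: Euclidean distance from every point to every centroid
  # (one row per point, one column per centroid), then nearest centroid
  d <- sapply(seq_len(nrow(centroids)), function(k) {
    ck <- matrix(centroids[k, ], nrow(pts), 2, byrow = TRUE)
    sqrt(rowSums((pts - ck)^2))
  })
  cluster <- max.col(-d, ties.method = "first")

  # Assumed step 3: move each centroid to the mean of its assigned points
  new_centroids <- centroids
  for (k in seq_len(nrow(centroids))) {
    if (any(cluster == k)) {
      new_centroids[k, ] <- colMeans(pts[cluster == k, , drop = FALSE])
    }
  }
  history[[length(history) + 1]] <- new_centroids

  # Stop once the centroids no longer move
  if (all(abs(new_centroids - centroids) < 1e-12)) break
  centroids <- new_centroids
}
```

Printing history shows the centroid positions after each pass, which is the movement that kmeans does not expose.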

I recently downloaded googlesheets via
devtools::install_github("jennybc/googlesheets")
and ran into some difficulties. When running the script mentioned at https://github.com/jennybc/googlesheets, I always get:
Error: could not find function "%>%"
How can I fix this?
Reproducible example:
Download:
devtools::install_github("jennybc/googlesheets")
require(googlesheets)
Data:
gap_key <- "1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA"
copy_ss(key = gap_key, to = "Gapminder")
gap <- register_ss("Gapminder")
The error occurs at:
oceania_csv <- gap %>% get_via_csv(ws = "Oceania")
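A likely cause is that `%>%` is defined in the magrittr package (and re-exported by dplyr), and require(googlesheets) alone did not put it on the search path. The sketch below only demonstrates the operator on a toy vector. It assumes magrittr is installed, and total is an illustrative name.

```r
# Attach the package that defines the pipe operator
library(magrittr)

# x %>% f() is evaluated as f(x), so this computes sum(sqrt(c(1, 4, 9)))
total <- c(1, 4, 9) %>% sqrt() %>% sum()
print(total)
```

With magrittr or dplyr attached before the failing line, the call gap %>% get_via_csv(ws = "Oceania") should at least get past the missing-operator error.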

Initially, I tried to use devtools::install_github("EdwinTh/padr"), which failed with the following error:
Error in curl::new_handle() : An unknown option was passed in to libcurl
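After a while I figured out that devtools uses the curl/RCurl package, which wraps the current version of curl on the host machine.
The curl version on the host machine (Ubuntu 14.04.5 LTS) is:
$ curl -V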
curl 7.61.0 (x86_64-pc-linux-gnu) libcurl/7.61.0 OpenSSL/1.0.1f zlib/1.2.8 libssh2/1.8.0 librtmp/2.3
where curl is located at:
$ which curl
/usr/local/bin/curl
and it works fine when used directly from the terminal, whereas curl::new_handle() fails.
In contrast, the curl/RCurl version is:
> RCurl::curlVersion()
$age
[1] 3
$version
[1] "7.35.0"
I assume this may be the underlying problem. My question now is how to get R's curl/RCurl to point to the corresponding version.
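To confirm the mismatch from inside R, one option is to print the libcurl version that R itself and the curl package report, and compare both with the output of curl -V. This is only a diagnostic sketch. It does not change which library is linked, and r_libcurl is an illustrative name.

```r
# libcurl version that this R build itself is linked against (base R)
r_libcurl <- libcurlVersion()
print(r_libcurl)

# libcurl version seen by the curl package, if that package is installed
if (requireNamespace("curl", quietly = TRUE)) {
  print(curl::curl_version()$version)
}
```

If these report 7.35.0 while the terminal reports 7.61.0, the R packages were presumably built against a different libcurl than the one in /usr/local.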
I have already tried everything I have found on this topic so far, for example in R:
install.packages("RCurl", type="source")
and
install.packages("curl", type="source")
as well as adjusting curl on the host:
wget https://libssh2.org/download/libssh2-1.8.0.tar.gz
tar zxvf libssh2-1.8.0.tar.gz
cd libssh2-1.8.0 …

I have dates in year-month-day format that I would like to convert to year-month-week format, as follows:
date dateweek
2015-02-18 -> 2015-02-8
2015-02-19 -> 2015-02-8
2015-02-20 -> ....
2015-02-21
2015-02-22
2015-02-23
2015-02-24 ...
2015-02-25 -> 2015-02-9
2015-02-26 -> 2015-02-9
2015-02-27 -> 2015-02-9
I tried
data$dateweek <- week(as.POSIXlt(data$date))
but this only returns the week numbers, without the corresponding year and month.
I also tried:
data$dateweek <- as.POSIXct('2015-02-18')
data$dateweek <- format(data$dateweek, '%Y-%m-%U')
# data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%U')
But the resulting column looks odd:
date datetime
2015-01-01 2015-01-00
2015-01-02 2015-01-00
2015-01-03 2015-01-00
2015-01-04 2015-01-01
2015-01-05 2015-01-01
2015-01-06 2015-01-01
2015-01-07 2015-01-01
2015-01-08 2015-01-01
2015-01-09 2015-01-01
2015-01-10 2015-01-01
2015-01-11 2015-01-02
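The odd values come from %U, which counts weeks starting on Sunday and labels the days before the first Sunday of the year as week 00. The desired output above (2015-02-18 in week 8, 2015-02-25 in week 9) appears to match ISO 8601 week numbers, which format() provides as %V. The sketch below assumes that reading. The names dates, iso_week and dateweek are illustrative.

```r
# Three dates from the example: a Wednesday, the following Sunday, and the next Wednesday
dates <- as.Date(c("2015-02-18", "2015-02-22", "2015-02-25"))

# %V is the ISO 8601 week number (weeks start on Monday); as.integer drops the leading zero
iso_week <- as.integer(format(dates, "%V"))

# Combine year-month with the unpadded week number
dateweek <- paste(format(dates, "%Y-%m"), iso_week, sep = "-")
print(dateweek)
# "2015-02-8" "2015-02-8" "2015-02-9"
```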
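
A colleague of mine has a lot of sql statements in a folder/directory, and he updates this folder daily. I would like to document these sql statements for future colleagues. However, I am looking for a way to "automate" this process. My idea is to use crontab once a week to run an R-Markdown file that automatically updates the existing R-Markdown file.
My approach is as follows: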
path = "c:/SQL_files/"
out.file<-""
file.names <- dir(path, pattern =".sql") # here I changed `.txt` to `.sql`
for(i in 1:length(file.names)){
file <- read.csv2.sql(file.names[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
out.file <- rbind(out.file, file)
}
# That second approach comes very close, but just generates a `.txt` for the first
#`.sql` file in the directory with the error:
Error in match.names(clabs, names(xi)) :
names do not match previous names
where the files are:
[1] "c:/SQL_files/first.sql"
[2] "c:/SQL_files/second.sql" …
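Since .sql files are plain text rather than delimited tables, one option is to read them with readLines and write each one into its own section of an R Markdown file. The sketch below creates two hypothetical .sql files in a temporary folder so that it runs anywhere. The file contents, the output name sql_documentation.Rmd and all variable names are assumptions, not part of the question.

```r
# Hypothetical folder with two .sql files, created here so the sketch is self-contained
sql_dir <- file.path(tempdir(), "SQL_files")
dir.create(sql_dir, showWarnings = FALSE)
writeLines("SELECT * FROM first_table;", file.path(sql_dir, "first.sql"))
writeLines(c("SELECT id, name", "FROM second_table;"), file.path(sql_dir, "second.sql"))

# List the .sql files with full paths; "\\.sql$" matches the extension literally
sql_files <- list.files(sql_dir, pattern = "\\.sql$", full.names = TRUE)

# Read each file as plain text and wrap it in a heading plus a fenced sql chunk
fence <- strrep("`", 3)
sections <- lapply(sql_files, function(f) {
  c(paste("##", basename(f)), "", paste0(fence, "sql"), readLines(f), fence, "")
})

# Write the combined document, overwriting the previous version
rmd_lines <- c("---", "title: \"SQL statements\"", "---", "", unlist(sections))
rmd_file <- file.path(sql_dir, "sql_documentation.Rmd")
writeLines(rmd_lines, rmd_file)
```

A weekly cron job could run this script with Rscript and then call rmarkdown::render(rmd_file), assuming the rmarkdown package is installed on that machine.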

I have a character vector made up of entries in the following style:
mylist <- c('John Myer Stewert','Steve',' Michael Boris',' Daniel and Frieds','Michael-Myer')
I am trying to create a character vector like this:
mylist <- c('John+Myer+Stewert','Steve',' Michael+Boris',' Daniel+and+Frieds','Michael+Myer')
I tried:
test <- cat(paste(shQuote(mylist , type="cmd"), collapse="+"))
This seems to be wrong. How can I change the word separators in mylist above?
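The attempt above joins the elements with "+" (collapse works between elements, not within them), and cat() only prints and returns NULL. A minimal sketch with gsub, assuming that leading spaces should stay as in the target vector and that both spaces and hyphens between words count as separators (result is an illustrative name):

```r
mylist <- c('John Myer Stewert','Steve',' Michael Boris',' Daniel and Frieds','Michael-Myer')

# Replace runs of spaces or hyphens with "+", but only when they sit between
# two non-space characters; the lookarounds leave leading spaces untouched
result <- gsub("(?<=\\S)[ -]+(?=\\S)", "+", mylist, perl = TRUE)
print(result)
```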
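
Sorry to bother you with an encoding problem. After spending a few hours without finding a solution, I decided to post it here. I am trying to write a simple table on Ubuntu 14.04 using write.table, write.csv, or write.csv2, but without success. My data is a bit messy because of a cronjob: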
ID <- c("",30,26,20,30,40,5,10,4)
b <- c("",2233,12,2,22,13,23,23,100)
c <- c("","","","","","","","","")
d <- c("","","","","","","","","")
e <- c("","","","","","800","","","")
f <- c("","","","","","","","","")
g <- c("","","","","","","","EA","")
h <- c("","","","","","","","","")
df <- data.frame(ID,b,c,d,e,f,g,h)
# change columns to chr
for(i in c(1,2:ncol(df))) {
df[,i] <- as.character(df[,i])
}
str(df)
# 'data.frame': 9 obs. of 8 variables:
# $ ID: chr "" "30" "26" "20" ...
# $ b : chr "" "2233" "12" "2" ...
# $ c : …

I want to left join panel data because some observations are missing. However, I cannot manage to do this while keeping the panel structure:
Data:
# package I'm using
library(dplyr)
dates <- as.Date(as.character(c("2015-02-13",
"2015-02-14",
"2015-02-16",
"2015-02-17",
"2015-02-14",
"2015-02-16",
"2015-02-13",
"2015-02-14",
"2015-02-17")))
b <-c("John","John","John","John","Michael","Michael","Thomas","Thomas","Thomas")
c <- c(20,30,26,20,30,40,5,10,4)
d <- c(11,2233,12,2,22,13,23,23,100)
# put together
df <- data.frame(b, dates,c,d)
df
# b dates c d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-16 26 12
#4 John 2015-02-17 20 2
#5 Michael 2015-02-14 30 22
#6 Michael 2015-02-16 40 13
#7 Thomas 2015-02-13 5 23
#8 Thomas 2015-02-14 10 23
#9 Thomas …
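One way to keep the panel structure is to build the full person-by-date grid first and left join the observed rows onto it, so missing observations show up as NA. The sketch below uses base R merge and assumes the grid should contain only the dates that occur somewhere in the data. The names grid and panel are illustrative.

```r
dates <- as.Date(c("2015-02-13", "2015-02-14", "2015-02-16", "2015-02-17",
                   "2015-02-14", "2015-02-16",
                   "2015-02-13", "2015-02-14", "2015-02-17"))
df <- data.frame(
  b = c("John","John","John","John","Michael","Michael","Thomas","Thomas","Thomas"),
  dates = dates,
  c = c(20,30,26,20,30,40,5,10,4),
  d = c(11,2233,12,2,22,13,23,23,100),
  stringsAsFactors = FALSE
)

# Every combination of person and observed date (3 persons x 4 dates)
grid <- expand.grid(b = unique(df$b), dates = sort(unique(df$dates)),
                    stringsAsFactors = FALSE)

# Left join: keep all grid rows, fill unmatched combinations with NA
panel <- merge(grid, df, by = c("b", "dates"), all.x = TRUE)
panel <- panel[order(panel$b, panel$dates), ]
print(panel)
```

With dplyr loaded as in the question, left_join(grid, df, by = c("b", "dates")) should give the same rows.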

I want to change blanks (no value) to missing (NA). I assumed this would happen automatically when R reads the data (csv in my case), but I only get blanks, so I tried:
is.na(data) <- data==""
I also tried:
data <- read.table("data.csv", header=TRUE, sep=";", na.strings="")
data[data==""] <- NA
But the blanks remain. How can I solve this?
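If the "blank" cells actually contain whitespace such as " ", a comparison with "" will not match them. The sketch below assumes that is the cause and trims every character column before replacing empty strings with NA. The toy data frame stands in for the csv, which is not shown in the question.

```r
# Toy data with an empty string and a whitespace-only cell
data <- data.frame(id = 1:3,
                   name = c("", " ", "abc"),
                   stringsAsFactors = FALSE)

# Trim character columns, then turn empty strings into NA
data[] <- lapply(data, function(col) {
  if (is.character(col)) {
    col <- trimws(col)
    col[col == ""] <- NA
  }
  col
})
print(data)
```

At read time, passing strip.white = TRUE together with na.strings = c("", "NA") to read.table is another option to try.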

Given:
set.seed(1001)
outcome<-rnorm(1000,sd = 1)
covariate<-rnorm(1000,sd = 1)
Log-likelihood of the normal pdf:
loglike <- function(par, outcome, covariate){
cov <- as.matrix(cbind(1, covariate))
xb <- cov * par
(- 1/2* sum((outcome - xb)^2))
}
Optimization:
opt.normal <- optim(par = 0.1,fn = loglike,outcome=outcome,cov=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
However, when running a simple OLS, I get different results. Yet maximizing the log-likelihood and minimizing the OLS criterion should give me similar estimates. I think something is wrong with my optimization.
summary(lm(outcome~covariate))
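Two things in the code above look like the source of the discrepancy: cov * par multiplies element-wise instead of forming the linear predictor with %*%, and par = 0.1 supplies one starting value although the model has an intercept and a slope. The sketch below is one possible correction under those assumptions. The names X, opt and ols are illustrative.

```r
set.seed(1001)
outcome <- rnorm(1000, sd = 1)
covariate <- rnorm(1000, sd = 1)

# Log-likelihood kernel of a normal linear model with unit variance:
# -1/2 * sum of squared residuals, with par = c(intercept, slope)
loglike <- function(par, outcome, covariate) {
  X <- cbind(1, covariate)
  xb <- drop(X %*% par)  # matrix product gives the linear predictor
  -0.5 * sum((outcome - xb)^2)
}

# fnscale = -1 makes optim maximize; two starting values, one per coefficient
opt <- optim(par = c(0, 0), fn = loglike,
             outcome = outcome, covariate = covariate,
             method = "BFGS", control = list(fnscale = -1), hessian = TRUE)

ols <- coef(lm(outcome ~ covariate))
print(rbind(optim = opt$par, ols = ols))
```

Because maximizing this kernel is the same as minimizing the residual sum of squares, the two rows should agree up to the optimizer's tolerance.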