I am running the stats::uniroot function on a million rows of a data.table. Here is a toy example:
library(data.table)
cumhaz <- function(t, a, b) b * (t/b)^a
froot <- function(x, u, a, b) cumhaz(x, a, b) - u
n <- 50000
u <- -log(runif(n))
a <- 1/2
b <- 1
dt = data.table(u = u, a = a, b = b)
print(system.time(
dt[, c := uniroot(froot, u=u, a=a, b=b, interval= c(0.01, 10), extendInt="yes")$root, by = u]
))
In the code above, 50,000 rows take close to 8 seconds.
Is there a faster alternative to uniroot that would cut this time substantially?
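For this particular toy example, the root can be written in closed form, so uniroot may not be needed at all: solving b * (t/b)^a = u for t gives t = b * (u/b)^(1/a). The sketch below assumes the cumulative hazard really has this invertible form; inv_cumhaz is a hypothetical helper name that does not appear in the question, and the approach does not carry over to functions without an analytic inverse.

```r
library(data.table)

# Cumulative hazard from the question
cumhaz <- function(t, a, b) b * (t/b)^a

# Hypothetical helper (not in the original question): analytic inverse of
# cumhaz in t, obtained by solving b * (t/b)^a = u for t
inv_cumhaz <- function(u, a, b) b * (u/b)^(1/a)

set.seed(1)
n <- 50000
dt <- data.table(u = -log(runif(n)), a = 1/2, b = 1)

# One vectorised call over all rows instead of one uniroot call per row
dt[, c := inv_cumhaz(u, a, b)]

# Sanity check: plugging the root back into cumhaz should recover u
dt[, max(abs(cumhaz(c, a, b) - u))]
```

The original code also groups with by = u, which makes uniroot run once per distinct value of u, so most of the 8 seconds is likely per-call overhead rather than the root finding itself.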
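For the past 5 days I have been trying to get the Keras/TensorFlow packages working in R. I installed them from RStudio and tried conda, miniconda, and virtualenv, but every attempt ends in a crash. Installing a library should not be a nightmare, especially when we are talking about R (one of the best statistical languages) and TensorFlow (one of the best deep learning libraries). Can someone share a reliable way to install Keras/TensorFlow on CentOS 7?
Here are the steps I followed to install tensorflow in RStudio.
Because RStudio crashes every time I run tensorflow::tf_config(), I cannot check what went wrong.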
devtools::install_github("rstudio/reticulate")
devtools::install_github("rstudio/keras") # This package also installs tensorflow
library(reticulate)
reticulate::install_miniconda()
reticulate::use_miniconda("r-reticulate")
library(tensorflow)
tensorflow::tf_config()  # Crashes at this point
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 …

I am struggling to carry the value of the previous group over to the next group. I tried to solve it with rleid but could not get the desired result.
df <- data.frame(signal = c(1,1,5,5,5,2,3,3,3,4,4,5,5,5,5,6,7,7,8,9,9,9,10),
desired_outcome = c(NA, NA, 1, 1, 1, 5, 2, 2, 2, 3, 3, 4, 4,4,4,5,6,6,7,8,8,8,9))
# outcome column has the expected result -
signal desired_outcome
1 1 NA
2 1 NA
3 5 1
4 5 1
5 5 1
6 2 5
7 3 2
8 3 2
9 3 2
10 4 3
11 4 3
12 5 4
13 5 4
14 5 4
15 5 4
16 6 …

I have a simple data.table as follows:
ID = c(rep("A", 1000), rep("B", 1000), rep("C", 1000), rep("D", 1000))
val = c("a", "a", "a", "b", "b", "c", "c","d","d","d","d","e","e","f","f","g","g","g","g","g")
dt = data.table(ID, val)
I want to add a new column to this data.table that holds the lag of val within each ID group.
Here is the expected output:
> head(dt, 20)
ID val val_lag
1: A a <NA>
2: A a <NA>
3: A a <NA>
4: A b a
5: A b a
6: A c b
7: A c b
8: A d c
9: A d c
10: A d …

I have a data.table to which I want to add a countdown until the value 1 appears in the flag column.
# The session-specific .internal.selfref pointer from dput() is dropped because it does not parse
dt = structure(list(date = structure(19309:19318, class = c("IDate",
"Date")), flag = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1)), class = c("data.table",
"data.frame"), row.names = c(NA, -10L))
setDT(dt)  # re-establish data.table's internal self-reference
> dt
date flag
1: 2022-11-13 0
2: 2022-11-14 0
3: 2022-11-15 0
4: 2022-11-16 0
5: 2022-11-17 0
6: 2022-11-18 1
7: 2022-11-19 0
8: 2022-11-20 0
9: 2022-11-21 0
10: 2022-11-22 1
Here is the expected output:
date flag countdown
1: 2022-11-13 0 5 …

In the data.table below, I want to flag the first row of each group.
temp_dt <- data.table(date = as.Date(c("2000-01-01","2000-03-31","2000-07-01","2000-09-30",
"2001-01-01","2001-03-31","2001-07-01","2001-09-30",
"2000-01-01","2000-03-31","2000-07-01","2000-09-30",
"2001-01-01","2001-03-31","2001-07-01","2001-09-30",
"2000-01-01","2000-03-31","2000-07-01","2000-09-30",
"2001-01-01","2001-03-31","2001-07-01","2001-09-30")),
group = c(1,1,1,1,1,1,1,1,
2,2,6,6,6,8,8,8,
3,3,3,3,4,4,4,4))
Here is the expected result after adding the flag:
> temp_dt
date group flag
1: 2000-01-01 1 1
2: 2000-03-31 1 0
3: 2000-07-01 1 0
4: 2000-09-30 1 0
5: 2001-01-01 1 0
6: 2001-03-31 1 0
7: 2001-07-01 1 0
8: 2001-09-30 1 0
9: 2000-01-01 2 1
10: 2000-03-31 2 0
11: 2000-07-01 6 1
12: 2000-09-30 6 0
13: 2001-01-01 6 0
14: 2001-03-31 8 1 …
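
One possible way to produce such a flag is sketched below. It rebuilds the same temp_dt (the eight dates repeat three times, so rep() is used for brevity) and assumes each group value occupies a single contiguous block of rows, as it does in this data. If a group value could reappear in a later, separate run, grouping by rleid(group) instead of group would be one way to flag the start of every run.

```r
library(data.table)

# Same data as in the question; the 8 dates repeat 3 times
dates <- c("2000-01-01", "2000-03-31", "2000-07-01", "2000-09-30",
           "2001-01-01", "2001-03-31", "2001-07-01", "2001-09-30")
temp_dt <- data.table(
  date  = as.Date(rep(dates, 3)),
  group = c(1, 1, 1, 1, 1, 1, 1, 1,
            2, 2, 6, 6, 6, 8, 8, 8,
            3, 3, 3, 3, 4, 4, 4, 4)
)

# Within each group, the first row gets 1 and every other row gets 0
temp_dt[, flag := as.integer(seq_len(.N) == 1L), by = group]

print(temp_dt)
```

Grouping with by keeps the original row order, so the flag lines up with the rows as they were entered.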