Suppose I have two data frames, students and teachers:
students <- data.frame(name = c("John", "Mary", "Sue", "Mark", "Gordy", "Joey", "Marge", "Sheev", "Lisa"),
height = c(111, 93, 99, 107, 100, 123, 104, 80, 95),
smart = c("no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"))
teachers <- data.frame(name = c("Ben", "Craig", "Mindy"),
height = c(130, 101, 105),
smart = c("yes", "yes", "yes"))
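I want to generate every possible student-teacher combination while keeping the accompanying information, i.e. build all row combinations of the data frames students and teachers. This is easy to do with a loop and cbind, but for large data frames it takes forever. Help an R newbie out: what is the fastest way to do this?
Edit: in case this is unclear, I would like the output to have the following format: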
rbind(
cbind(students[1, ], teachers[1, ]),
cbind(students[1, ], teachers[2, ]),
# ...
# n = nrow(students), m = nrow(teachers)
cbind(students[n, ], teachers[m, ]))
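One possible way to get this result without an explicit loop is a cross join. The sketch below relies on base R's merge(), which returns the Cartesian product of its inputs when by has length zero. The suffixes and the name pairs are illustrative choices, not part of the question, and the row order differs from the rbind/cbind layout above (students vary fastest here).

```r
students <- data.frame(
  name   = c("John", "Mary", "Sue", "Mark", "Gordy", "Joey", "Marge", "Sheev", "Lisa"),
  height = c(111, 93, 99, 107, 100, 123, 104, 80, 95),
  smart  = c("no", "no", "yes", "no", "yes", "yes", "no", "yes", "no")
)
teachers <- data.frame(
  name   = c("Ben", "Craig", "Mindy"),
  height = c(130, 101, 105),
  smart  = c("yes", "yes", "yes")
)

# by = NULL asks merge() for every row of students paired with every row of
# teachers; the suffixes (an assumption of this sketch) disambiguate the
# column names that both data frames share.
pairs <- merge(students, teachers, by = NULL,
               suffixes = c(".student", ".teacher"))

nrow(pairs)   # 9 students x 3 teachers = 27 rows
names(pairs)
```

Whether this is the fastest option for very large inputs would need to be benchmarked on the actual data.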
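I am relatively new to building models with sklearn. I know that cross-validation can be parallelized via the n_jobs parameter, but if I am not using CV, how can I use my available cores to speed up model fitting?
According to its documentation, xgboost has an n_jobs parameter. However, when I try to set n_jobs, I get this error: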
TypeError: __init__() got an unexpected keyword argument 'n_jobs'
The same problem occurs with some other parameters (e.g. random_state). I thought this might be a version issue, but it appears I have the latest version (0.6a2, installed with pip).
It does not take much to reproduce the error:
from xgboost import XGBClassifier
estimator_xGBM = XGBClassifier(max_depth = 5, learning_rate = 0.05, n_estimators = 400, n_jobs = -1).fit(x_train)
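Any ideas?
I have done some time series forecasting in R with ARIMA, which predicts values at future time points given a series of continuous values, but I am not sure how to do time series forecasting when dealing with categorical values.
Given these simple training sequences of five people's morning dressing routines, how can I generate predictions for the last two entries of person6?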
person1 <- c("underwear", "socks", "pants", "shirt", "tie", "shoes", "jacket")
person2 <- c("underwear", "pants", "socks", "shirt", "tie", "jacket", "shoes")
person3 <- c("socks", "underwear", "pants", "shirt", "tie", "shoes", "jacket")
person4 <- c("underwear", "socks", "shirt", "pants", "tie", "shoes", "jacket")
person5 <- c("underwear", "socks", "shirt", "tie", "pants", "jacket", "shoes")
person6 <- c("underwear", "socks", "pants", "shirt") # Predict next events
Thanks in advance!
If I have this data:
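One simple baseline for categorical sequences like these is a first-order Markov chain: count how often each item follows each other item in the training sequences, then repeatedly pick the most frequent successor. The sketch below is a minimal base R version under two assumptions that are not in the question: only the immediately preceding item matters, and each item is worn at most once (so items already seen are excluded). The helper predict_next is a hypothetical name introduced for illustration.

```r
# Training sequences and the partial sequence from the question.
sequences <- list(
  c("underwear", "socks", "pants", "shirt", "tie", "shoes", "jacket"),
  c("underwear", "pants", "socks", "shirt", "tie", "jacket", "shoes"),
  c("socks", "underwear", "pants", "shirt", "tie", "shoes", "jacket"),
  c("underwear", "socks", "shirt", "pants", "tie", "shoes", "jacket"),
  c("underwear", "socks", "shirt", "tie", "pants", "jacket", "shoes")
)
person6 <- c("underwear", "socks", "pants", "shirt")

# Count first-order transitions (current item -> next item) over all sequences.
from <- unlist(lapply(sequences, function(s) head(s, -1)))
to   <- unlist(lapply(sequences, function(s) tail(s, -1)))
transitions <- table(from, to)

# Hypothetical helper: greedily append the most frequent successor of the
# last item, ignoring items that already appear in the sequence (assumption).
predict_next <- function(seq_so_far, transitions, n_ahead) {
  for (i in seq_len(n_ahead)) {
    last <- tail(seq_so_far, 1)
    if (!last %in% rownames(transitions)) break
    counts <- transitions[last, ]
    counts[names(counts) %in% seq_so_far] <- 0
    if (all(counts == 0)) break
    seq_so_far <- c(seq_so_far, names(which.max(counts)))
  }
  seq_so_far
}

predicted <- predict_next(person6, transitions, n_ahead = 2)
tail(predicted, 2)  # "tie" "shoes"
```

With only five training sequences the counts are very sparse, so this is a baseline to compare against rather than a recommended model.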
df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
ID = c(1, 2, 3, 4, 5),
is_fruit = c("yes", "yes", "yes", "yes", "yes"))
I want to keep only the unique rows while ignoring the ID column, so that the output looks like this:
df2 <- data.frame(name = c("apple", "orange"),
ID = c(1, 4),
is_fruit = c("yes", "yes"))
df2
# name ID is_fruit
#1 apple 1 yes
#2 orange 4 yes
How can I do this, ideally with dplyr?
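One way this could be done is with dplyr's distinct(), naming the columns that define uniqueness and passing .keep_all = TRUE so that the remaining columns (here ID) are retained from the first matching row. The sketch below also shows a base R equivalent using duplicated(). The result names df2_dplyr and df2_base are illustrative, and the dplyr call is guarded on the assumption that the package may not be installed.

```r
df1 <- data.frame(
  name     = c("apple", "apple", "apple", "orange", "orange"),
  ID       = c(1, 2, 3, 4, 5),
  is_fruit = c("yes", "yes", "yes", "yes", "yes")
)

# dplyr: rows are unique on (name, is_fruit); .keep_all = TRUE keeps ID from
# the first row of each group.
if (requireNamespace("dplyr", quietly = TRUE)) {
  df2_dplyr <- dplyr::distinct(df1, name, is_fruit, .keep_all = TRUE)
  print(df2_dplyr)
}

# Base R equivalent: drop rows whose (name, is_fruit) pair was already seen.
df2_base <- df1[!duplicated(df1[, c("name", "is_fruit")]), ]
print(df2_base)
```

Both variants keep the first occurrence, so the retained IDs are 1 and 4, as in the desired output.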