An alternative title might be "Using lag in mutate to refer to the previous row's mutated value".

I want to use the value generated for the previous row as an input to the mutate calculation. Some data:

library(dplyr)
library(ggplot2)  # provides the diamonds dataset

mydiamonds <- diamonds %>%
  mutate(Ideal = ifelse(cut == 'Ideal', 1, 0)) %>%
  group_by(Ideal) %>%
  mutate(rn = row_number()) %>%
  arrange(Ideal, rn) %>%
  mutate(CumPrice = cumsum(price)) %>%
  mutate(InitialPrice = min(price)) %>%
  select(Ideal, rn, CumPrice, InitialPrice)

Which looks like this:

mydiamonds %>% head
# A tibble: 6 x 4
# Groups:   Ideal [1]
  Ideal    rn CumPrice InitialPrice
  <dbl> <int>    <int>        <int>
1     0     1      326          326
2     0     2      653          326
3     0     3      987          326
4     0     4     1322          326
5     0     5     1658          326
6     0     6     1994          326

A model:

mod.diamonds = glm(CumPrice ~ log(lag(CumPrice)) + log(rn) + Ideal, family = "poisson", data = mydiamonds)

Testing the model:

# new data, pretend we don't know CumPrice but want to use predictions to predict subsequent predictions
mydiamonds.testdata <- mydiamonds %>% select(-CumPrice)

# manual prediction based on lag(prediction), for the first row in each group use InitialPrice
## add coefficients as fields
coeffs <- mod.diamonds$coefficients
mydiamonds.testdata <- mydiamonds.testdata %>%
  mutate(CoefIntercept = coeffs['(Intercept)'],
         CoefLogLagCumPrice = coeffs['log(lag(CumPrice))'],
         CoefLogRn = coeffs['log(rn)'],
         CoefIdeal = coeffs['Ideal']
  )

This is what my test data looks like:

mydiamonds.testdata %>% head
# A tibble: 6 x 7
# Groups:   Ideal [1]
  Ideal    rn InitialPrice CoefIntercept CoefLogLagCumPrice CoefLogRn CoefIdeal
  <dbl> <int>        <int>         <dbl>              <dbl>     <dbl>     <dbl>
1     0     1          326        0.0931              0.987    0.0154 -0.000715
2     0     2          326        0.0931              0.987    0.0154 -0.000715
3     0     3          326        0.0931              0.987    0.0154 -0.000715
4     0     4          326        0.0931              0.987    0.0154 -0.000715
5     0     5          326        0.0931              0.987    0.0154 -0.000715
6     0     6          326        0.0931              0.987    0.0154 -0.000715

I can't use predict() here, because I need to predict recursively: the prediction for the previous day/row feeds into the prediction for the current one. Instead, I tried predicting manually from the coefficients:

# prediction
mydiamonds.testdata <- mydiamonds.testdata %>%
  mutate(
    Prediction = CoefIntercept +

      # here's the hard bit. If it's the first row in the group, use InitialPrice, else use the value of the previous prediction
      (CoefLogLagCumPrice * ifelse(rn == 1, InitialPrice, lag(Prediction))) +

      (CoefLogRn * log(rn)) +
      (CoefIdeal * Ideal)
  )

Error:

Error: Problem with `mutate()` input `Prediction`.
x object 'Prediction' not found
ℹ Input `Prediction` is `+...`.
ℹ The error occurred in group 1: Ideal = 0.

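How can I mutate in this way, where each row refers to the value just computed for the previous row? (Unless it is the first row, in which case InitialPrice should be used.)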

[Edit] Following a commenter's suggestion, I tried accumulate, a function I'm not very familiar with:

mydiamonds.testdata <- mydiamonds.testdata %>%
  mutate(
    Prediction = accumulate(.f = function(.) {

      .$CoefIntercept +

        # here's the hard bit. If it's the first row in the group, use InitialPrice, else use the value of the previous prediction
        (.$CoefLogLagCumPrice * ifelse(.$rn == 1, .$InitialPrice, lag(.$Prediction))) +

        (.$CoefLogRn * log(.$rn)) +
        (.$CoefIdeal * .$Ideal)

    }))

Error: Problem with `mutate()` input `Prediction`.
x argument ".x" is missing, with no default
ℹ Input `Prediction` is `accumulate(...)`.
ℹ The error occurred in group 1: Ideal = 0.

Since you said you're not used to this rather complex function, here is some explanation.
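purrr::accumulate() computes recursive operations element by element (here, row by row). Its first argument, .x, is the variable you want to accumulate over. Its second argument, .f, is a function that should take 2 arguments: the current result cur and the next value to process val. The first time .f is called, cur equals .x[1] (by default); on every later call it equals the result previously returned by .f.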
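To make the cur/val mechanics concrete, here is a minimal sketch on a toy vector (not the data from the question). The variable names running and seeded are only for illustration.

```r
library(purrr)

# Running sum: cur is the result so far, val is the next element of .x.
# The first output is .x[1] itself; .f is first called with cur = 1, val = 2.
running <- accumulate(1:5, function(cur, val) cur + val)
running
# 1 3 6 10 15

# With .init, the first call receives .init as cur instead of .x[1],
# and the output has one more element than .x.
seeded <- accumulate(1:5, function(cur, val) cur + val, .init = 100)
seeded
# 100 101 103 106 110 115
```

The second form is what makes it possible to seed a recursion with a starting value that is not part of the vector being iterated over.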
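purrr::accumulate2() lets us iterate over a second variable, .y, at the same time. A first value of .y would always be ignored, because no call to .f is needed to produce the first result. Therefore .y should be one element shorter than .x.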
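A small sketch of the length rule, again on made-up vectors: x has three elements, y has two, and .f receives the accumulated value plus one element from each.

```r
library(purrr)

x <- c(10, 20, 30)
y <- c(2, 3)  # one element shorter than x

# The first output is x[1]. After that, .f is called as
# .f(acc, x[2], y[1]) and then .f(acc, x[3], y[2]).
res <- accumulate2(x, y, function(acc, xi, yi) acc + xi * yi)

# unlist() is used because, depending on the purrr version,
# accumulate2() may return a list rather than an atomic vector.
unlist(res)
# 10 50 140
```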
Unfortunately, only accumulate() and accumulate2() exist, whereas you would need something like an accumulate3() or a paccumulate() to accumulate over rn, Ideal and the price.
However, by using row_number() and cur_data_all(), you can trick accumulate2() into doing what you want:
# Coefficients from the fitted model, kept as plain variables
CoefIntercept = coeffs['(Intercept)']
CoefLogLagCumPrice = coeffs['log(lag(CumPrice))']
CoefLogRn = coeffs['log(rn)']
CoefIdeal = coeffs['Ideal']

mydiamonds.testdata <- mydiamonds %>%
  ungroup() %>%
  select(-CumPrice) %>%
  mutate(
    Prediction = accumulate2(.x = InitialPrice, .y = row_number()[-1],
      .f = function(acc, nxt, row) {
        # acc: previous prediction; nxt: next InitialPrice (unused); row: row index
        # cur_data_all() is deprecated since dplyr 1.1.0; pick(everything()) is the newer equivalent
        db = cur_data_all()
        rn = db$rn[row]
        Ideal = db$Ideal[row]
        CoefIntercept +
          (CoefLogLagCumPrice * acc) +
          (CoefLogRn * log(rn)) +
          (CoefIdeal * Ideal)
      }) %>% unlist()
  )

mydiamonds.testdata
# A tibble: 53,940 x 4
# Ideal rn InitialPrice Prediction
# <dbl> <int> <int> <dbl>
# 1 0 1 326 326
# 2 0 2 326 322.
# 3 0 3 326 318.
# 4 0 4 326 313.
# 5 0 5 326 309.
# 6 0 6 326 305.
# 7 0 7 326 301.
# 8 0 8 326 297.
# 9 0 9 326 294.
# 10 0 10 326 290.
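One thing to keep in mind: the formula above reproduces the question's manual calculation as written. The model itself was fit with family = "poisson" (log link by default) and the predictor log(lag(CumPrice)), so a prediction on the price scale would normally take the log of the previous value and exponentiate the whole linear predictor. Below is a minimal sketch of that variant. The coefficients b0, b_lag, b_rn and b_ideal are made-up placeholders, not the fitted values, and the starting value is passed through .init.

```r
library(purrr)

# Hypothetical coefficients, for illustration only (not the fitted model)
b0 <- 0.1
b_lag <- 0.98
b_rn <- 0.02
b_ideal <- 0

rn <- 1:5
ideal <- rep(0, 5)
initial_price <- 326

pred <- unlist(accumulate2(
  ideal[-1], rn[-1],          # covariates for rows 2..n
  .init = initial_price,      # row 1 is seeded with the initial price
  .f = function(prev, ideal_i, rn_i) {
    # linear predictor on the log scale, then back to the response scale
    exp(b0 + b_lag * log(prev) + b_rn * log(rn_i) + b_ideal * ideal_i)
  }
))

pred
```

Whether this is what you want depends on how the predictions are meant to be interpreted, so treat it as an assumption to check against predict(mod.diamonds, type = "response") on a row where the lagged value is known.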
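EDIT: there is another, cleaner way that uses the .init argument, since the InitialPrice column is never really used apart from its first value. This lets you pass the covariates in directly as arguments, but it will not work for more complex models with more covariates.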
mydiamonds.testdata <- mydiamonds %>%
  ungroup() %>%
  select(-CumPrice) %>%
  mutate(
    Prediction = accumulate2(.x = Ideal[-1], .y = rn[-1],
      .init = InitialPrice[1],
      .f = function(rslt, Ideal, rn) {
        CoefIntercept +
          (CoefLogLagCumPrice * rslt) +
          (CoefLogRn * log(rn)) +
          (CoefIdeal * Ideal)
      }) %>% unlist()
  )
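If the model has more covariates than accumulate2() can carry, one possible workaround is to accumulate over row positions and index the covariate vectors inside .f. The sketch below also restarts the recursion in each Ideal group, as the question asked. The helper predict_recursive(), the coefficient names in coefs, and the toy data are all hypothetical, chosen so the arithmetic is easy to check by hand; the formula is the question's linear one.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: walks over row positions so that any number of
# covariate vectors can be indexed inside .f.
predict_recursive <- function(init, rn, ideal, coefs) {
  steps <- seq_along(rn)[-1]  # rows 2..n; row 1 is the seed
  out <- accumulate(
    steps,
    function(prev, i) {
      coefs[["b0"]] + coefs[["lag"]] * prev +
        coefs[["rn"]] * log(rn[i]) + coefs[["ideal"]] * ideal[i]
    },
    .init = init
  )
  unlist(out)
}

# Made-up coefficients and data (not the fitted model or the diamonds data)
coefs <- c(b0 = 1, lag = 1, rn = 0, ideal = 2)
toy <- tibble(
  Ideal = c(0, 0, 0, 1, 1, 1),
  rn = c(1L, 2L, 3L, 1L, 2L, 3L),
  InitialPrice = c(326, 326, 326, 340, 340, 340)
)

toy_pred <- toy %>%
  group_by(Ideal) %>%
  mutate(Prediction = predict_recursive(InitialPrice[1], rn, Ideal, coefs)) %>%
  ungroup()

toy_pred$Prediction
# 326 327 328 340 343 346
```

Because the call sits inside a grouped mutate(), each group gets its own seed from InitialPrice[1], so the first row of every group starts from its initial price rather than from the last prediction of the previous group.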