Lor*_*lli 5 r cumsum dplyr group
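I have these columns in a larger dataset (here I only show asset "x", but there are other assets, so the idea is to repeat the procedure for each asset):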
df <- structure(list(
  asset = c("x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x"),
  col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3),
  `cumsum(col1)` = c(10, 20, -2, 9, -4, 11, 4, -6, 4, -1, 2)
), class = "data.frame", row.names = c(NA, -11L))
I want to correct the negative numbers in col1 so that cumsum(col1) equals
cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)
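To get this result, I need to correct a col1 value if and only if it is negative and larger in magnitude than the running sum of the values before it. For example, the -22 in the third position should become -20 to match the preceding cumsum 10+10, then -13 should become -11 and -10 should become -8, while the last three numbers should not change because they do not push the running sum below zero.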
So at the end of the correction process I should get
col1 = c(10, 10, -20, 11, -11, 15, -7, -8, 10, -5, 3)
cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)
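For the correction itself, I think the mechanism should be (I don't know how to do it in R, but I have worked it out in theory):
group_by = each group in col1 should be delimited by a col1(row) whose magnitude is greater than the cumsum of the rows before it, and the group restarts whenever that happens
iff col1(row) is greater in magnitude than the preceding cumsum, replace col1(row) with the preceding group's cumsum with a negative sign
cumsum col1 again and check that the result matches the desired output, so there should be no negative cumsum values. The minimum should be 0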
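The mechanism above can be sketched in base R as a single pass over the column. This is only a minimal sketch, assuming the rule amounts to "a negative value may never remove more than the running total accumulated so far"; `correct_negatives` is a hypothetical helper name, not something from dplyr or base R.

```r
# Hypothetical helper: cap each negative value at the current running total,
# so that the cumulative sum never drops below zero.
correct_negatives <- function(x) {
  running <- 0
  corrected <- numeric(length(x))
  for (i in seq_along(x)) {
    # a negative entry is limited to -running; positive entries are unchanged
    corrected[i] <- max(x[i], -running)
    running <- running + corrected[i]
  }
  corrected
}

col1 <- c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)
fixed <- correct_negatives(col1)
fixed          # 10 10 -20 11 -11 15 -7 -8 10 -5 3
cumsum(fixed)  # 10 20 0 11 0 15 8 0 10 5 8
```

The loop makes the "restart" explicit: once the running total hits 0, the next rows accumulate from scratch, which is what the grouping idea in the steps above is trying to express.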
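In the original dataset I have several asset types, so not only "x" but also "y", "z", and so on. I also need to group_by investor, because the same situation applies to 4k investors. So the real dataset looks like this: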
df <- structure(list(
  investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
  asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
  col1 = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3),
  `cumsum(col1)` = c(10, 20, -2, 11, -2, 13, 9, -1, 10, 5, 3)
), class = "data.frame", row.names = c(NA, -11L))
And this is what I need it to become (the code should only need to work with group_by(investor, asset)):
df <- structure(list(
  investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
  asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
  col1 = c(10, 10, -20, 11, -11, 15, 9, -9, 10, -5, 3),
  `cumsum(col1)` = c(10, 20, 0, 11, 0, 15, 9, 0, 10, 5, 3)
), class = "data.frame", row.names = c(NA, -11L))
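I framed my thinking about the solution in terms of dplyr because I am more comfortable with it, but I don't know whether this can be done in dplyr.
Thanks for your help!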
We can do this with accumulate:
library(dplyr)
library(purrr)
df %>%
  group_by(asset) %>%
  # running total per asset, reset to 0 whenever it would go negative
  mutate(col2csum = accumulate(col1, ~ max(.x + .y, 0))) %>%
  ungroup()
-output
# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -22        0
 4 x        11       11
 5 x       -13        0
 6 x        15       15
 7 x        -7        8
 8 x       -10        0
 9 x        10       10
10 x        -5        5
11 x         3        8
If we also want to change 'col1':
df %>%
  group_by(asset) %>%
  mutate(
    col2csum = accumulate(col1, ~ max(.x + .y, 0)),
    # the corrected col1 is the row-to-row change of the clamped running total
    col1 = c(first(col2csum), diff(col2csum))
  ) %>%
  ungroup()
-output
# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -20        0
 4 x        11       11
 5 x       -11        0
 6 x        15       15
 7 x        -7        8
 8 x        -8        0
 9 x        10       10
10 x        -5        5
11 x         3        8
# input data used in the examples above
df <- structure(list(asset = c("x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x"), col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10,
-5, 3)), class = "data.frame", row.names = c(NA, -11L))
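For the multi-investor dataset from the question, the same idea should carry over by grouping on both keys. The following is a sketch rather than a tested production solution: `floor_cumsum` is a hypothetical helper, and `df2` is just the question's second dataset retyped without the precomputed cumsum column. Instead of accumulate, it uses the identity that a running total floored at zero equals the plain cumsum minus the most negative value the cumsum has reached so far, which is vectorized and needs no purrr.

```r
library(dplyr)

# Hypothetical helper: cumulative sum that is never allowed below zero.
floor_cumsum <- function(x) {
  cs <- cumsum(x)
  # subtract the deepest negative excursion seen so far (0 if none yet)
  cs - pmin(0, cummin(cs))
}

# Second dataset from the question, without the precomputed cumsum column
df2 <- data.frame(
  investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
  asset    = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
  col1     = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3)
)

res <- df2 %>%
  group_by(investor, asset) %>%
  mutate(
    col2csum = floor_cumsum(col1),
    # corrected col1 = increments of the floored running total
    col1     = c(first(col2csum), diff(col2csum))
  ) %>%
  ungroup()

res$col1      # 10 10 -20 11 -11 15 9 -9 10 -5 3
res$col2csum  # 10 20 0 11 0 15 9 0 10 5 3
```

One design note: unlike accumulate without an initial value, this version also floors the first row of a group at 0 if that row happens to be negative, which seems to match the stated requirement that the minimum cumsum is 0.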