Adjust the elements of a column so that its cumsum floors at 0 instead of going negative

Lor*_*lli 5 r cumsum dplyr group

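I have these columns in a larger dataset (here I only show asset "x", but there are several assets, so the idea is to repeat the process for each asset):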

df <- structure(list(
        asset = c("x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x"),
        col1  = c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, -2, 9, -4, 11, 4, -6, 4, -1, 2)
      ), class = "data.frame", row.names = c(NA, -11L))

I want to correct the negative numbers in col1 so that cumsum(col1) becomes

cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)

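To get this result, I need to correct a col1 value if and only if it is negative and larger (in absolute value) than the sum of the preceding values. For example, the -22 in the third position should become -20 to match the cumsum of the previous values, 10+10; then -13 should become -11 (the corrected running sum before it is 11) and -10 should become -8 (the running sum before it is 8), while the last three numbers should not change because they do not accumulate to a negative result.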

So at the end of the correction process I should get

col1 = c(10, 10, -20, 11, -11, 15, -7, -8, 10, -5, 3)
cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)

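For the correction process I think the mechanism should be the following (I don't know how to do it in R, but I have worked out something in theory):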

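  • group_by: each group in col1 should be delimited by rows where col1(row) is a negative value larger (in absolute value) than the cumsum of the preceding rows, and a new group should start every time that happens

  • iff col1(row) is larger than the preceding cumsum, replace col1(row) with that group's cumsum preceded by a minus sign

  • take cumsum(col1) again and check that the result matches the desired output, i.e. there are no negative cumsum values; the minimum should be 0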
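The three steps above collapse into a single pass over col1: keep a running sum and, whenever a negative value would push it below 0, shrink that value to minus the running sum. A minimal base-R sketch, assuming col1 is numeric and each group's running sum starts at 0 (`fix_col1` is a hypothetical helper name, not from the question):

```r
# Hypothetical helper: clip negative values that would make the running sum negative
fix_col1 <- function(x) {
  running <- 0                        # running sum of the already-corrected values
  for (i in seq_along(x)) {
    if (x[i] < 0 && -x[i] > running) {
      x[i] <- -running                # take out only what the running sum can cover
    }
    running <- running + x[i]
  }
  x
}

col1 <- c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)
fixed <- fix_col1(col1)
print(fixed)          # values: 10 10 -20 11 -11 15 -7 -8 10 -5 3
print(cumsum(fixed))  # values: 10 20 0 11 0 15 8 0 10 5 8
```

To run it per investor and asset without dplyr, it could be wrapped in base R's ave(), e.g. ave(df$col1, df$investor, df$asset, FUN = fix_col1).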

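In the original dataset I have several asset types, so not only "x" but also "y", "z", etc. In addition, I need to group_by investor, because the same situation applies to about 4k investors. So the real dataset looks like this: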

df <- structure(list(
        investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
        asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
        col1  = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, -2, 11, -2, 13, 9, -1, 10, 5, 3)
      ), class = "data.frame", row.names = c(NA, -11L))

and I need it to become this (the code should only operate within group_by(investor, asset)):

df <- structure(list(
        investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
        asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
        col1  = c(10, 10, -20, 11, -11, 15, 9, -9, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, 0, 11, 0, 15, 9, 0, 10, 5, 3)
      ), class = "data.frame", row.names = c(NA, -11L))

I framed my thinking about the solution in dplyr because I'm more comfortable with it, but I don't know whether it can be done in dplyr.

Thanks for your help!

akr*_*run 1

We can do this with accumulate (from purrr):

library(dplyr)
library(purrr)

df %>%
  group_by(asset) %>%
  # running sum that is reset to 0 whenever it would go negative
  mutate(col2csum = accumulate(col1, ~ max(.x + .y, 0))) %>%
  ungroup()

Output:

# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -22        0
 4 x        11       11
 5 x       -13        0
 6 x        15       15
 7 x        -7        8
 8 x       -10        0
 9 x        10       10
10 x        -5        5
11 x         3        8
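If purrr is not available, the same floored running sum (reset to 0 whenever it would go negative) can be sketched in base R with Reduce(accumulate = TRUE), which also returns every intermediate value. This is a swapped-in base-R technique, not part of the original answer:

```r
col1 <- c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)

# each step adds the next value and floors the running sum at 0;
# like accumulate(), the first element is kept as-is
col2csum <- Reduce(function(acc, x) max(acc + x, 0), col1, accumulate = TRUE)

# corrected col1: first value followed by the successive differences
col1_fixed <- c(col2csum[1], diff(col2csum))

print(col2csum)    # values: 10 20 0 11 0 15 8 0 10 5 8
print(col1_fixed)  # values: 10 10 -20 11 -11 15 -7 -8 10 -5 3
```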

Update


If we also want to change 'col1' itself:

df %>%
  group_by(asset) %>%
  mutate(
    col2csum = accumulate(col1, ~ max(.x + .y, 0)),
    # the corrected col1 is the first value followed by the successive differences
    col1 = c(first(col2csum), diff(col2csum))
  ) %>%
  ungroup()

Output:

# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -20        0
 4 x        11       11
 5 x       -11        0
 6 x        15       15
 7 x        -7        8
 8 x        -8        0
 9 x        10       10
10 x        -5        5
11 x         3        8
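A loop-free alternative, offered as an assumption-labelled sketch rather than part of the answer: a running sum that starts at 0 and is floored at 0 equals the ordinary cumsum minus the most negative value that cumsum has reached so far (or minus 0 if it never went negative). One difference from accumulate(): this form also floors a negative first row at 0.

```r
col1 <- c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)

raw <- cumsum(col1)                      # ordinary running sum, can go negative
# lift it by the deepest dip reached so far, but only if that dip is below 0
col2csum <- raw - pmin(cummin(raw), 0)
col1_fixed <- c(col2csum[1], diff(col2csum))

print(col2csum)    # values: 10 20 0 11 0 15 8 0 10 5 8
print(col1_fixed)  # values: 10 10 -20 11 -11 15 -7 -8 10 -5 3
```

Per group, the same expressions could go inside mutate() after group_by(investor, asset).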

Data

df <- structure(list(asset = c("x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x"), col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10,
-5, 3)), class = "data.frame", row.names = c(NA, -11L))
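The answer groups by asset only, while the question asks for group_by(investor, asset). A sketch of that variant on the question's second dataset, rebuilt here as df2 (an illustrative name) without the precomputed cumsum column and using a floored accumulate() step:

```r
library(dplyr)
library(purrr)

# second dataset from the question
df2 <- data.frame(
  investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
  asset    = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
  col1     = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3)
)

res <- df2 %>%
  group_by(investor, asset) %>%   # each investor/asset pair gets its own running sum
  mutate(
    col2csum = accumulate(col1, ~ max(.x + .y, 0)),
    col1     = c(first(col2csum), diff(col2csum))
  ) %>%
  ungroup()

print(res)
```

With this data, res$col1 and res$col2csum should match the target col1 and cumsum(col1) listed in the question for the investor-level dataset.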