Adjust a column's elements so that the cumsum equals 0

Question

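I have these columns in a larger dataset (here I show only asset "x", but there are several assets, so the idea is to repeat the procedure for each asset):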

df <- structure(list(
        asset = c("x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x"),
        col1 =  c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, -2, 9, -4, 11, 4, -6, 4, -1, 2)
     ), class = "data.frame", row.names = c(NA, -11L))

I want to correct the negative numbers in col1 so that cumsum(col1) equals

cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)

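To get this result, I need to correct a number in col1 if and only if it is negative and its magnitude exceeds the running sum of the preceding numbers. For example, the -22 in the third position should become -20 to match the preceding cumsum of 10+10; then -13 should become -11 and -10 should become -8, while the last three numbers should stay unchanged because they do not drive the cumulative sum negative.

So at the end of the correction process I should get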

col1 = c(10, 10, -20, 11, -11, 15, -7, -8, 10, -5, 3)
cumsum(col1) = c(10, 20, 0, 11, 0 ,15, 8, 0, 10, 5, 8)

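For the correction, I think the mechanism should be as follows (I don't know how to do it in R, but I have the idea in theory):

group_by = each group in col1 should be delimited by a col1(row) whose magnitude exceeds the cumsum of the preceding rows, restarting as soon as col1(row) exceeds the cumsum of the preceding elements
iff col1(row) exceeds the preceding cumsum, replace col1(row) with the cumsum of the preceding group, with a negative sign
cumsum col1 again and check that the result matches the desired output, i.e. there should be no negative cumsum values; the minimum should equal 0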

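In the original dataset I have multiple asset types, so not only "x" but also "y", "z", and so on. I also need to group_by investor, because the same situation applies to 4k investors. The real dataset therefore looks like this: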

df <- structure(list(
        investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
        asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
        col1 =  c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, -2, 11, -2, 13, 9, -1, 10, 5, 3)
     ), class = "data.frame", row.names = c(NA, -11L))

This is what I need it to become (the code should only operate within group_by(investor, asset)):

df <- structure(list(
        investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
        asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
        col1 =  c(10, 10, -20, 11, -11, 15, 9, -9, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, 0, 11, 0, 15, 9, 0, 10, 5, 3)
     ), class = "data.frame", row.names = c(NA, -11L))

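I framed my thinking about a solution in terms of dplyr because I am more comfortable with it, but I don't know whether it can be done in dplyr.

Thank you for your help!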

Answer 1

akr*_*run 1

We can do this with accumulate

library(dplyr)
library(purrr)
df %>%
   group_by(asset) %>%
   mutate(col2csum = accumulate(col1, ~ if(abs(.x + .y) < abs(.y)) 0 else
       .x + .y)) %>%
   ungroup()

Output

# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -22        0
 4 x        11       11
 5 x       -13        0
 6 x        15       15
 7 x        -7        8
 8 x       -10        0
 9 x        10       10
10 x        -5        5
11 x         3        8
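
One caveat on the condition `abs(.x + .y) < abs(.y)`, offered as a sketch rather than as the answer's own method: it gives the desired result on this sample, but it also resets the running sum whenever the remaining balance is positive yet smaller than the amount just subtracted. For instance, a running sum of 10 followed by -6 gives abs(4) < abs(-6), so the result is 0 instead of 4. Flooring the running sum at 0 with max() avoids this. The helper name `floored_cumsum` is hypothetical, and `.init = 0` is an addition of mine so that a negative first element is floored as well.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  asset = rep("x", 11),
  col1  = c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)
)

# Hypothetical helper (not part of the original answer):
# running sum that is never allowed to drop below 0.
# .init = 0 seeds the accumulation; [-1] drops that seed from the result.
floored_cumsum <- function(x) {
  accumulate(x, ~ max(.x + .y, 0), .init = 0)[-1]
}

res <- df %>%
  group_by(asset) %>%
  mutate(col2csum = floored_cumsum(col1)) %>%
  ungroup()

# The case where the abs() condition and the floor disagree:
floored_cumsum(c(10, -6))   # 10 4
```

On the sample data this reproduces the col2csum column shown above.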

Update

If we also want to modify 'col1' itself

df %>%
   group_by(asset) %>%
   mutate(col2csum = accumulate(col1, ~ if(abs(.x + .y) < abs(.y)) 0 else
       .x + .y), col1 = c(first(col2csum), diff(col2csum))) %>%
   ungroup()

Output

# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -20        0
 4 x        11       11
 5 x       -11        0
 6 x        15       15
 7 x        -7        8
 8 x        -8        0
 9 x        10       10
10 x        -5        5
11 x         3        8
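
The question also asks for the correction to run per investor and asset. A minimal sketch of that, assuming the second dataset from the question (named `df2` here, a name of my choosing) and using a floor at 0 via max() in place of the abs() condition:

```r
library(dplyr)
library(purrr)

# Second dataset from the question, without the precomputed cumsum column.
df2 <- data.frame(
  investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
  asset    = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
  col1     = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3)
)

res2 <- df2 %>%
  group_by(investor, asset) %>%
  mutate(
    # running sum within each investor/asset group, floored at 0
    col2csum = accumulate(col1, ~ max(.x + .y, 0), .init = 0)[-1],
    # recover the corrected increments from the floored running sum
    col1     = c(first(col2csum), diff(col2csum))
  ) %>%
  ungroup()

res2$col1      # 10 10 -20 11 -11 15 9 -9 10 -5 3
res2$col2csum  # 10 20 0 11 0 15 9 0 10 5 3
```

The two vectors match the desired col1 and cumsum(col1) given in the question.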

Data

df <- structure(list(asset = c("x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x"), col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10,
-5, 3)), class = "data.frame", row.names = c(NA, -11L))
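
With around 4k investors, a vectorised variant may be worth trying instead of an element-by-element accumulate. The sketch below is a swapped-in technique, not the answer's approach: it relies on the identity that a running sum floored at 0 equals cumsum(x) minus the running minimum of cumsum(x), with that minimum capped at 0. The function name `floored_cumsum_vec` is hypothetical, and only base R is used.

```r
df <- data.frame(
  asset = rep("x", 11),
  col1  = c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)
)

# Hypothetical helper: floored running sum without an explicit loop.
floored_cumsum_vec <- function(x) {
  s <- cumsum(x)
  s - pmin(0, cummin(s))   # shift up by the deepest dip below 0 seen so far
}

# ave() applies the helper within each group; add further grouping
# vectors (e.g. an investor column) as extra arguments before FUN.
df$col2csum <- ave(df$col1, df$asset, FUN = floored_cumsum_vec)

# Corrected increments, recovered per group from the floored running sum.
df$col1_fixed <- ave(df$col2csum, df$asset,
                     FUN = function(s) c(s[1], diff(s)))

df$col2csum    # 10 20 0 11 0 15 8 0 10 5 8
df$col1_fixed  # 10 10 -20 11 -11 15 -7 -8 10 -5 3
```

With non-integer amounts the results can differ from the accumulate version by floating-point rounding, so compare with all.equal() rather than identical().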
