R: Aggregating data with data.table

Mik*_*ike 3 aggregate r data.table

I'm new to data.table and would like some help aggregating some data.

Login   OpenTime            CloseTime     OpenedValueUSD    ClosedValueUSD  Year    Month   TransferredValue Identifier
859    04/02/2014 07:55 05/02/2014 15:37    10000               10000       2014    2             0                1
859    07/02/2014 03:16 07/02/2014 03:51    8960.755            8960.755    2014    2             0                2
859    11/02/2014 12:41 13/02/2014 11:56    13635.178           13606.901   2014    2             0                3
859    11/02/2014 13:34 11/02/2014 15:34    13635.178           13635.178   2014    2             13635.178        4
859    12/02/2014 13:46 14/02/2014 09:59    13660.246           13649.278   2014    2             13635.178        5
859    13/02/2014 15:33 13/02/2014 15:42    13606.901           13606.901   2014    2             13660.246        6
859    25/03/2014 14:52 26/03/2014 12:58    10000               10000       2014    3             0                7

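For each row, I want to sum all trades that were opened before that trade and closed after it opened. For example, the trade in row 3 opens before trade 4 but only closes after trade 4 has opened. So I take that trade's OpenedValueUSD (plus that of any other qualifying trades; in this case there are none) and put it in the TransferredValue column of row 4.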

Here is my current code:

tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[OpenTime < 
           tradeData$OpenTime & CloseTime > tradeData$OpenTime & Login == 
           tradeData$Login]), by="Identifier"]
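A minimal row-by-row sketch of the rule described above, written close to the posted code. It assumes the question's sample data, typed in below as a data.table named `trades` (a hypothetical name), and parses the day-first timestamps with `as.POSIXct()`, because comparing raw "dd/mm/yyyy HH:MM" strings with `<` and `>` is lexicographic. Inside `j`, bare column names refer to the current group's row, so in the posted condition `OpenTime < tradeData$OpenTime` selects trades opened after the current one. The sketch instead puts the current trade on the right-hand side of each comparison.

```r
library(data.table)

# The question's sample data (TransferredValue omitted; it is what we compute)
trades <- data.table(
  Login          = 859L,
  OpenTime       = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41",
                     "11/02/2014 13:34", "12/02/2014 13:46", "13/02/2014 15:33",
                     "25/03/2014 14:52"),
  CloseTime      = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56",
                     "11/02/2014 15:34", "14/02/2014 09:59", "13/02/2014 15:42",
                     "26/03/2014 12:58"),
  OpenedValueUSD = c(10000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000),
  Identifier     = 1:7
)

# Parse day-first timestamps so that < and > compare points in time, not strings
# (the format string and tz = "UTC" are assumptions about the raw data)
fmt <- "%d/%m/%Y %H:%M"
trades[, `:=`(OpenTime  = as.POSIXct(OpenTime,  format = fmt, tz = "UTC"),
              CloseTime = as.POSIXct(CloseTime, format = fmt, tz = "UTC"))]

# One group per trade: bare OpenTime/Login are this trade's values, while
# trades$... are all trades. Sum OpenedValueUSD over trades of the same Login
# that opened strictly before this trade opened and closed strictly after it.
trades[, TransferredValue := sum(
  trades$OpenedValueUSD[trades$Login == Login &
                        trades$OpenTime  < OpenTime &
                        trades$CloseTime > OpenTime]),
  by = Identifier]
```

With the sample data this gives 0, 0, 0, 13635.178, 13635.178, 13660.246, 0, which matches the TransferredValue column in the question. Each group still scans every trade, so the work grows roughly quadratically with the number of trades.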

Aru*_*run 7

Here is another approach using foverlaps() that doesn't need row-by-row grouping. I'll call your data.table dt.

  1. Convert OpenTime and CloseTime to POSIXct format, as shown by @alex23lemm.

  2. Add a temporary column tmpTime equal to OpenTime. We'll use it in foverlaps().

    dt[, tmpTime := OpenTime]
    
  3. Call setkey() on the Login, OpenTime, CloseTime columns.

    setkey(dt, Login, OpenTime, CloseTime)
    
  4. Using foverlaps(), we now find the rows where the interval (Login, OpenTime, tmpTime) falls completely within the interval (Login, OpenTime, CloseTime).

    olaps = foverlaps(dt, dt, by.x=c("Login", "OpenTime", "tmpTime"), 
                    which=TRUE, nomatch=0L, type="within")
    

    by.y defaults to the key columns of y (here, dt).

  5. Remove self-overlaps, i.e. the rows where xid == yid.

    olaps = olaps[xid != yid]
    #    xid yid
    # 1:   4   3
    # 2:   5   3
    # 3:   6   5
    
  6. Assign the OpenedValueUSD of each row yid to TransferredValue in the corresponding row xid, then remove tmpTime.

    dt[olaps$xid, TransferredValue := 
            dt$OpenedValueUSD[olaps$yid]][, tmpTime := NULL]
    
    #    Login            OpenTime           CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
    # 1:   859 2014-02-04 07:55:00 2014-02-05 15:37:00      10000.000      10000.000 2014     2             0.00          1
    # 2:   859 2014-02-07 03:16:00 2014-02-07 03:51:00       8960.755       8960.755 2014     2             0.00          2
    # 3:   859 2014-02-11 12:41:00 2014-02-13 11:56:00      13635.178      13606.901 2014     2             0.00          3
    # 4:   859 2014-02-11 13:34:00 2014-02-11 15:34:00      13635.178      13635.178 2014     2         13635.18          4
    # 5:   859 2014-02-12 13:46:00 2014-02-14 09:59:00      13660.246      13649.278 2014     2         13635.18          5
    # 6:   859 2014-02-13 15:33:00 2014-02-13 15:42:00      13606.901      13606.901 2014     2         13660.25          6
    # 7:   859 2014-03-25 14:52:00 2014-03-26 12:58:00      10000.000      10000.000 2014     3             0.00          7
    
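Step 6 assigns a single OpenedValueUSD per xid. That is enough for this sample, because each trade here opens while at most one other trade is still open. If the same xid appeared several times in olaps, only one of the values would be kept rather than their sum. The sketch below runs steps 1 to 6 end to end on the sample data, with a summing variant of step 6 that may be closer to the stated goal. The summing step, the `sums` table name and the explicit `format`/`tz` in step 1 are assumptions added here, not part of the answer above. Note also that `type = "within"` compares endpoints inclusively. A trade that closes at exactly another trade's opening minute, or two trades with the same OpenTime, would therefore also match, unlike the strict `<` and `>` in the question.

```r
library(data.table)

# Sample data from the question, called dt as in the answer
dt <- data.table(
  Login          = 859L,
  OpenTime       = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41",
                     "11/02/2014 13:34", "12/02/2014 13:46", "13/02/2014 15:33",
                     "25/03/2014 14:52"),
  CloseTime      = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56",
                     "11/02/2014 15:34", "14/02/2014 09:59", "13/02/2014 15:42",
                     "26/03/2014 12:58"),
  OpenedValueUSD = c(10000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000),
  Identifier     = 1:7
)

# Step 1: parse the day-first timestamps (format and tz are assumptions)
fmt <- "%d/%m/%Y %H:%M"
dt[, `:=`(OpenTime  = as.POSIXct(OpenTime,  format = fmt, tz = "UTC"),
          CloseTime = as.POSIXct(CloseTime, format = fmt, tz = "UTC"))]

# Steps 2-3: zero-length interval at each opening time; key y on the intervals
dt[, tmpTime := OpenTime]
setkey(dt, Login, OpenTime, CloseTime)

# Step 4: for each trade (x), find the trades (y) of the same Login whose
# [OpenTime, CloseTime] contains its opening instant; which = TRUE returns
# row numbers (xid, yid) instead of joined columns
olaps <- foverlaps(dt, dt, by.x = c("Login", "OpenTime", "tmpTime"),
                   which = TRUE, nomatch = 0L, type = "within")

# Step 5: every trade contains its own opening time, so drop self-matches
olaps <- olaps[xid != yid]

# Step 6 (summing variant): total OpenedValueUSD of all containing trades per xid
sums <- olaps[, .(val = sum(dt$OpenedValueUSD[yid])), by = xid]
dt[, TransferredValue := 0]
dt[sums$xid, TransferredValue := sums$val]
dt[, tmpTime := NULL]
```

On the sample data this reproduces the TransferredValue column shown in the answer's output, since no xid matches more than one yid here. Row numbers in olaps refer to dt after setkey() has reordered it, which is why the assignment happens on the keyed table.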