连接2个数据表同时汇总其中一个数据的最快方法

Fin*_*ino 4 r data.table

问题

我有2个数据表

运动:

  library(data.table)

  consEx = data.table(
  begin = as.POSIXct(c("2019-04-01 00:00:10"," 2019-04-07 10:00:00","2019-04-10 23:00:00","2019-04-12 20:00:00","2019-04-15 10:00:00",
                       "2019-04-20 10:00:00","2019-04-22 13:30:00","2019-04-10 15:30:00","2019-04-12 21:30:00","2019-04-15 20:00:00")),

  end = as.POSIXct(c("2019-04-01 20:00:00","2019-04-07 15:00:00","2019-04-11 10:00:00", "2019-04-12 23:30:00","2019-04-15 15:00:00",
                     "2019-04-21 12:00:00","2019-04-22 17:30:00","2019-04-10 20:00:00","2019-04-13 05:00:00", "2019-04-15 12:30:00")),

  carId = c(1,1,1,2,2,3,3,4,4,5),
  tripId = c(1:10)
)
Run Code Online (Sandbox Code Playgroud)

和警报:

alertsEx = data.table(
  timestamp = as.POSIXct(c("2019-04-01 10:00:00","2019-04-01 10:30:00","2019-04-01 15:00:00","2019-04-15 13:00:00","2019-04-22 14:00:00",
                "2019-04-22 15:10:00","2019-04-22 15:40:00","2019-04-10 16:00:00","2019-04-10 17:00:00","2019-04-13 04:00:00")),
  type = c("T1","T2","T1",'T3',"T1","T1","T3","T2","T2","T1"),
  carId = c(1,1,1,2,3,3,3,4,4,4),
  additionalInfo1 = rnorm(10,mean=10,sd=4)
)
Run Code Online (Sandbox Code Playgroud)

运动表记录了一个周期begin- end汽车在该周期内运动。警报表显示在警报发生在汽车,包含typetimestampcarId

我需要加入这2个表,并通过汇总警报数据type。当在同一时期内有多个相同类型的警报时,我需要additionalInfo1用平均值来进行汇总。

当前方法

我目前正在通过遍历consEx并将函数应用于每一行的方式来执行此操作,该函数返回包含所需计算的列表

findAlerts = function(begin,end,carId_2){
  saida = alertsEx[timestamp >= begin & timestamp <= end & carId == carId_2,]
  totals = nrow(saida)

  saida = split(saida,by="type")

  resultsList = list(
    "totals" = 0,
    "t1Count" = 0,
    "t1Mean" = 0,
    "t2Count" = 0,
    "t2Mean" = 0,
    "t3Count" = 0,
    "t3Mean" = 0)
  resultsList[["totals"]] = totals

  types = names(saida)
  if("T1" %in% types){
    resultsList[["t1Count"]] = nrow(saida[["T1"]])
    resultsList[["t1Mean"]] = mean(saida[["T1"]]$additionalInfo1)
  }

  if("T2" %in% types){
    resultsList[["t2Count"]] = nrow(saida[["T2"]])
    resultsList[["t2Mean"]] = mean(saida[["T2"]]$additionalInfo1)
  }

  if("T3" %in% types){
    resultsList[["t3Count"]] = nrow(saida[["T3"]])
    resultsList[["t3Mean"]] = mean(saida[["T3"]]$additionalInfo1)
  }

  return(resultsList)
}

for(i in 1:nrow(consEx)){
  aux = findAlerts(consEx$begin[i],consEx$end[i],consEx$carId[i])
  consEx[i,names(aux) := aux]

}
Run Code Online (Sandbox Code Playgroud)

预期产量

给出预期的输出:

                 begin                 end carId tripId totals t1Count    t1Mean t2Count    t2Mean t3Count    t3Mean
 1: 2019-04-01 00:00:10 2019-04-01 20:00:00     1      1      3       2 10.123463       1 14.479288       0  0.000000
 2: 2019-04-07 10:00:00 2019-04-07 15:00:00     1      2      0       0  0.000000       0  0.000000       0  0.000000
 3: 2019-04-10 23:00:00 2019-04-11 10:00:00     1      3      0       0  0.000000       0  0.000000       0  0.000000
 4: 2019-04-12 20:00:00 2019-04-12 23:30:00     2      4      0       0  0.000000       0  0.000000       0  0.000000
 5: 2019-04-15 10:00:00 2019-04-15 15:00:00     2      5      1       0  0.000000       0  0.000000       1  6.598062
 6: 2019-04-20 10:00:00 2019-04-21 12:00:00     3      6      0       0  0.000000       0  0.000000       0  0.000000
 7: 2019-04-22 13:30:00 2019-04-22 17:30:00     3      7      3       2  7.610410       0  0.000000       1 10.218593
 8: 2019-04-10 15:30:00 2019-04-10 20:00:00     4      8      2       0  0.000000       2  9.703278       0  0.000000
 9: 2019-04-12 21:30:00 2019-04-13 05:00:00     4      9      1       1  7.095564       0  0.000000       0  0.000000
10: 2019-04-15 20:00:00 2019-04-15 12:30:00     5     10      0       0  0.000000       0  0.000000       0  0.000000
Run Code Online (Sandbox Code Playgroud)

但这对于我的原始数据来说太慢了,原始数据有1300万行consEx和200万警报。还有几种不同的类型,我需要计算两个指标的平均值,最小值和最大值。

有没有更快的方法来编写函数或循环?

另外,用python编写几乎相同的代码是否会带来更好的结果?我正在考虑重写它,但这需要很长时间,而且我不确定它是否可以解决问题

谢谢

PS:我尝试过的另一种方法是遍历警报并为每个警报分配一个tripIdfrom consEx,然后汇总结果表,但这似乎更慢。

Uwe*_*Uwe 7

该答案提出了两种data.table方法,它们都使用非等值联接将警报匹配到行程并dcast()进行重整,但是在聚合方法上的区别在于consEx

第一个变量创建一个新的result数据集,constEx保持不变。

第二个变体constEx在适当位置进行修改。这类似于Alexis的解决方案,但更加简洁并使用了新setnafill()功能(开发版本为1.12.3的版本提供data.table

形式1:创建新的结果数据集

agg <- consEx[alertsEx, on = .(carId, begin <= timestamp, end >= timestamp), 
              .(tripId, type, additionalInfo1)][
                , .(Count = .N, Mean = mean(additionalInfo1)), 
                by = .(tripId, type)][
                  , totals := sum(Count), by = tripId][] 
result <- dcast(agg[consEx, on = "tripId"], ... ~ type, 
                value.var = c("Count", "Mean"), fill = 0)[
                  , c("Count_NA", "Mean_NA") := NULL][
                    is.na(totals), totals := 0]
setcolorder(result, names(consEx))
result
Run Code Online (Sandbox Code Playgroud)
                  begin                 end carId tripId totals Count_T1 Count_T2 Count_T3   Mean_T1   Mean_T2   Mean_T3
 1: 2019-04-01 00:00:10 2019-04-01 20:00:00     1      1      3        2        1        0 12.654609 12.375862  0.000000
 2: 2019-04-07 10:00:00 2019-04-07 15:00:00     1      2      0        0        0        0  0.000000  0.000000  0.000000
 3: 2019-04-10 23:00:00 2019-04-11 10:00:00     1      3      0        0        0        0  0.000000  0.000000  0.000000
 4: 2019-04-12 20:00:00 2019-04-12 23:30:00     2      4      0        0        0        0  0.000000  0.000000  0.000000
 5: 2019-04-15 10:00:00 2019-04-15 15:00:00     2      5      1        0        0        1  0.000000  0.000000  9.316815
 6: 2019-04-20 10:00:00 2019-04-21 12:00:00     3      6      0        0        0        0  0.000000  0.000000  0.000000
 7: 2019-04-22 13:30:00 2019-04-22 17:30:00     3      7      3        2        0        1  8.586061  0.000000 11.498512
 8: 2019-04-10 15:30:00 2019-04-10 20:00:00     4      8      2        0        2        0  0.000000  8.696356  0.000000
 9: 2019-04-12 21:30:00 2019-04-13 05:00:00     4      9      1        1        0        0  9.343681  0.000000  0.000000
10: 2019-04-15 20:00:00 2019-04-15 12:30:00     5     10      0        0        0        0  0.000000  0.000000  0.000000
Run Code Online (Sandbox Code Playgroud)

请注意,发布的不同答案之间的方式可能会有所不同additionalInfo1rnorm(10, mean = 10, sd = 4)而无需set.seed()事先致电即可创建 。调用set.seed()会创建可复制的随机数。

变体1的说明

第一部分通过等额联接为每个警报找到匹配的行程,并通过tripId和计算总计,type以及每个行程的总数:

agg
Run Code Online (Sandbox Code Playgroud)
   tripId type Count      Mean totals
1:      1   T1     2 12.654609      3
2:      1   T2     1 12.375862      3
3:      5   T3     1  9.316815      1
4:      7   T1     2  8.586061      3
5:      7   T3     1 11.498512      3
6:      8   T2     2  8.696356      2
7:      9   T1     1  9.343681      1
Run Code Online (Sandbox Code Playgroud)

请注意,tripId假定它是中的唯一键consEx

在下一步中,将聚合右连接到 consEx

agg[consEx, on = "tripId"]
Run Code Online (Sandbox Code Playgroud)
    tripId type Count      Mean totals               begin                 end carId
 1:      1   T1     2 12.654609      3 2019-04-01 00:00:10 2019-04-01 20:00:00     1
 2:      1   T2     1 12.375862      3 2019-04-01 00:00:10 2019-04-01 20:00:00     1
 3:      2 <NA>    NA        NA     NA 2019-04-07 10:00:00 2019-04-07 15:00:00     1
 4:      3 <NA>    NA        NA     NA 2019-04-10 23:00:00 2019-04-11 10:00:00     1
 5:      4 <NA>    NA        NA     NA 2019-04-12 20:00:00 2019-04-12 23:30:00     2
 6:      5   T3     1  9.316815      1 2019-04-15 10:00:00 2019-04-15 15:00:00     2
 7:      6 <NA>    NA        NA     NA 2019-04-20 10:00:00 2019-04-21 12:00:00     3
 8:      7   T1     2  8.586061      3 2019-04-22 13:30:00 2019-04-22 17:30:00     3
 9:      7   T3     1 11.498512      3 2019-04-22 13:30:00 2019-04-22 17:30:00     3
10:      8   T2     2  8.696356      2 2019-04-10 15:30:00 2019-04-10 20:00:00     4
11:      9   T1     1  9.343681      1 2019-04-12 21:30:00 2019-04-13 05:00:00     4
12:     10 <NA>    NA        NA     NA 2019-04-15 20:00:00 2019-04-15 12:30:00     5
Run Code Online (Sandbox Code Playgroud)

使用可以立即将输出从长格式更改为宽格式dcast()

最后,通过删除多余的列,将其替换NA为0并更改列顺序来清理结果。

形式2:更新到位

agg <- consEx[alertsEx, on = .(carId, begin <= timestamp, end >= timestamp), 
              .(tripId, type, additionalInfo1)][
                , .(Count = .N, Mean = mean(additionalInfo1)), 
                by = .(tripId, type)][
                  , totals := sum(Count), by = tripId][] 
wide <- dcast(agg, ... ~ type, value.var = c("Count", "Mean"), fill = 0)
consEx[wide, on = "tripId", (names(wide)) := mget(paste0("i.", names(wide)))]
setnafill(consEx, fill = 0) # data.table version 1.12.3+ 

consEx
Run Code Online (Sandbox Code Playgroud)
                  begin                 end carId tripId totals Count_T1 Count_T2 Count_T3   Mean_T1   Mean_T2   Mean_T3
 1: 2019-04-01 00:00:10 2019-04-01 20:00:00     1      1      3        2        1        0 12.654609 12.375862  0.000000
 2: 2019-04-07 10:00:00 2019-04-07 15:00:00     1      2      0        0        0        0  0.000000  0.000000  0.000000
 3: 2019-04-10 23:00:00 2019-04-11 10:00:00     1      3      0        0        0        0  0.000000  0.000000  0.000000
 4: 2019-04-12 20:00:00 2019-04-12 23:30:00     2      4      0        0        0        0  0.000000  0.000000  0.000000
 5: 2019-04-15 10:00:00 2019-04-15 15:00:00     2      5      1        0        0        1  0.000000  0.000000  9.316815
 6: 2019-04-20 10:00:00 2019-04-21 12:00:00     3      6      0        0        0        0  0.000000  0.000000  0.000000
 7: 2019-04-22 13:30:00 2019-04-22 17:30:00     3      7      3        2        0        1  8.586061  0.000000 11.498512
 8: 2019-04-10 15:30:00 2019-04-10 20:00:00     4      8      2        0        2        0  0.000000  8.696356  0.000000
 9: 2019-04-12 21:30:00 2019-04-13 05:00:00     4      9      1        1        0        0  9.343681  0.000000  0.000000
10: 2019-04-15 20:00:00 2019-04-15 12:30:00     5     10      0        0        0        0  0.000000  0.000000  0.000000
Run Code Online (Sandbox Code Playgroud)

变型2的说明

第一部分与变体1中的相同。

在下一步中,将聚集体从长格式更改为宽格式。

然后,执行更新联接,其中通过的匹配行在适当位置进行consEx更新。表达方式wide

(names(wide)) := mget(paste0("i.", names(wide))) 
Run Code Online (Sandbox Code Playgroud)

是捷径

`:=`(totals = i.totals, Count_T1 = i.Count_T1, ..., Mean_T3 = i.Mean_T3)
Run Code Online (Sandbox Code Playgroud)

其中i.前缀是的列,wide用于避免歧义。

consEx没有匹配项的行中,附加列包含NA。因此 ,最后setnafill()用于将所有NA值替换为0。

基准测试

在撰写本文时,由以下人员发布了6种不同的解决方案

5种R溶液使用比较press()mark()bench包。

样本数据集被参数化以改变行数。正如consEx某些解决方案所修改的那样,每次基准测试运行都从新副本开始。

要使代码与一起使用,需要进行一些小的更改bench。最显着的是,... ~ typein中的快捷方式公式在dcast()内部调用时会导致错误bench::mark(),必须替换为明确命名LHS所有变量的版本。

正如OP指出的那样,他的版本相当慢,第一次基准测试仅使用1000行consEx和50、200和1000行alertsEx

bm <- bench::press(
  n_trips = 1E3,
  n_alerts = 1E3/c(20, 5, 1),
  {
    n_cars <- 5
    n_periods <- round(n_trips / n_cars)
    n_types = 3
    types = sprintf("T%02i", 1:n_types)
    consEx0 <- data.table(
      tripId = 1:n_trips, 
      carId = rep(1:n_cars, each = n_periods),
      begin = seq.POSIXt(as.POSIXct("2000-01-01"), length.out = n_periods, by = "14 days") %>% 
        rep(n_cars) %>% 
        `mode<-`("double")
    )[, end := begin + 7*24*60*60]
    set.seed(1)
    idx <- sample(n_trips, n_alerts, TRUE)
    alertsEx <- consEx0[
      idx, 
      .(timestamp = begin + runif(n_alerts, 1, 7*24*60*60 - 1),
        type = sample(types, n_alerts, TRUE), 
        carId,
        additionalInfo1 = rnorm(n_alerts, mean = 10, sd = 4))]

    bench::mark(
      fino = {
        consEx <- copy(consEx0)
        findAlerts = function(begin,end,carId_2){
          saida = alertsEx[timestamp >= begin & timestamp <= end & carId == carId_2,]
          totals = nrow(saida)
          saida = split(saida,by="type")
          resultsList = list(
            "totals" = 0,
            "t1Count" = 0,
            "t1Mean" = 0,
            "t2Count" = 0,
            "t2Mean" = 0,
            "t3Count" = 0,
            "t3Mean" = 0)
          resultsList[["totals"]] = totals
          types = names(saida)
          if("T1" %in% types){
            resultsList[["t1Count"]] = nrow(saida[["T1"]])
            resultsList[["t1Mean"]] = mean(saida[["T1"]]$additionalInfo1)
          }
          if("T2" %in% types){
            resultsList[["t2Count"]] = nrow(saida[["T2"]])
            resultsList[["t2Mean"]] = mean(saida[["T2"]]$additionalInfo1)
          }
          if("T3" %in% types){
            resultsList[["t3Count"]] = nrow(saida[["T3"]])
            resultsList[["t3Mean"]] = mean(saida[["T3"]]$additionalInfo1)
          }
          return(resultsList)
        }
        for(i in 1:nrow(consEx)){
          aux = findAlerts(consEx$begin[i],consEx$end[i],consEx$carId[i])
          consEx[i,names(aux) := aux]
        }
        consEx[]
      },
      dave = {
        # library(dplyr)
        # library(tidyr)

        consEx <- copy(consEx0)
        #split the consEx down to separate carId
        splitcons <- split(consEx, consEx$carId)

        alertsEx$tripId <- NA
        #loop and only search the cons that match the alerts
        alertsEx$tripId <- apply(alertsEx, 1, function(a) {
          #retrive the list assicoated to the correct car
          cons <- splitcons[[a[["carId"]]]]
          alerttime <- as.POSIXct(a[["timestamp"]])
          #find the trip which contains the alerttime
          tripId <-
            which((alerttime >= cons$begin) & (alerttime <= cons$end))
          #identify and return the tripId
          cons$tripId[tripId]
        })

        #Referenced:
        #/sf/ask/2141446611/
        #/sf/ask/2472504821/
        alertsEx2 <-
          alertsEx %>% 
          dplyr::group_by(carId, tripId, type) %>% 
          dplyr::summarize(mean = mean(additionalInfo1), count = dplyr::n())
        alerttable <-
          tidyr::gather(alertsEx2, variable, val, -(carId:type), na.rm = TRUE) %>%
          tidyr::unite(temp, type, variable) %>%
          tidyr::spread(temp, val, fill = 0)
        consEx %>% 
          dplyr::left_join(alerttable, by = c("tripId", "carId")) 

      },
      alexis = {
        consEx <- copy(consEx0)
        joined <- consEx[alertsEx,
                         .(carId, tripId, type, additionalInfo1),
                         on = .(carId, begin <= timestamp, end >= timestamp)]
        aggregated <- joined[, .(typeCount = .N, typeMean = mean(additionalInfo1)), by = .(carId, tripId, type)]
        totals <- aggregated[, .(totals = sum(typeCount)), by = .(carId, tripId)]
        long <- dcast(aggregated, carId + tripId ~ type, value.var = c("typeCount", "typeMean"), sep = "", fill = 0)
        replaceNA <- function(x) { replace(x, is.na(x), 0) }
        consEx[, `:=`(as.character(outer(types, c("Count", "Mean"), paste0)),
                      lapply(long[consEx,
                                  as.character(outer(types, c("typeCount", "typeMean"),
                                                     function(a, b) { paste0(b, a) })),
                                  with = FALSE,
                                  on = .(carId, tripId)],
                             replaceNA))]
        consEx[, totals := lapply(totals[consEx, x.totals, on = .(carId, tripId)], replaceNA)]
        setcolorder(consEx, c("carId", "tripId", "begin", "end"))
        consEx
      },
      uwe1 = {
        consEx <- copy(consEx0)
        agg <- consEx[alertsEx, on = .(carId, begin <= timestamp, end >= timestamp), 
                      .(tripId, type, additionalInfo1)][
                        , .(Count = .N, Mean = mean(additionalInfo1)), 
                        by = .(tripId, type)][
                          , totals := sum(Count), by = tripId][] 
        result <- dcast(agg[consEx, on = "tripId"], 
                        tripId + carId + begin+ end + totals ~ type, 
                        value.var = c("Count", "Mean"), fill = 0)[
                          , c("Count_NA", "Mean_NA") := NULL][
                            is.na(totals), totals := 0][]
        # setcolorder(result, names(consEx))
        result
      },
      uwe2 = {
        consEx <- copy(consEx0)
        agg <- consEx[alertsEx, on = .(carId, begin <= timestamp, end >= timestamp), 
                      .(tripId, type, additionalInfo1)][
                        , .(Count = .N, Mean = mean(additionalInfo1)), 
                        by = .(tripId, type)][
                          , totals := sum(Count), by = tripId][] 
        wide <- dcast(agg, tripId + totals ~ type, value.var = c("Count", "Mean"), fill = 0)
        consEx[wide, on = "tripId", (names(wide)) := mget(paste0("i.", names(wide)))]
        setnafill(consEx, fill = 0) # data.table version 1.12.3+ 
        consEx
      },
      check = FALSE
    )
  }
)

library(ggplot2)
autoplot(bm)
Run Code Online (Sandbox Code Playgroud)

在此处输入图片说明

请注意对数时标。

该图表证实了OP的假设,即他的方法比其他任何解决方案都慢得多。1000行花了3秒,而OP的生产数据集有1200万行。Dave2e方法的运行时间随着警报数量的增加而增加。

另外,OP的方法对分配的内存要求最高。它最多可分配190 MB,而data.table版本只需要1 MByt。

bm
Run Code Online (Sandbox Code Playgroud)
# A tibble: 15 x 15
   expression n_trips n_alerts      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time 
   <bch:expr>   <dbl>    <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list> <lis>
 1 fino          1000       50    2.95s    2.95s     0.339  149.25MB     3.39     1    10      2.95s <data~ <Rpro~ <bch~
 2 dave          1000       50  21.52ms  22.61ms    41.6    779.46KB     5.94    21     3   505.01ms <data~ <Rpro~ <bch~
 3 alexis        1000       50  21.98ms  23.09ms    41.2   1007.84KB     3.93    21     2    509.5ms <data~ <Rpro~ <bch~
 4 uwe1          1000       50  14.47ms  15.45ms    62.3    919.12KB     1.95    32     1   513.94ms <data~ <Rpro~ <bch~
 5 uwe2          1000       50  14.14ms  14.73ms    64.2     633.2KB     1.95    33     1   513.91ms <data~ <Rpro~ <bch~
 6 fino          1000      200    3.09s    3.09s     0.323  155.12MB     4.53     1    14      3.09s <data~ <Rpro~ <bch~
 7 dave          1000      200   53.4ms  62.11ms    11.0       1.7MB     5.49     8     4   728.02ms <data~ <Rpro~ <bch~
 8 alexis        1000      200  22.42ms  23.34ms    41.5      1.03MB     3.77    22     2   530.25ms <data~ <Rpro~ <bch~
 9 uwe1          1000      200  14.72ms  15.76ms    62.2    953.07KB     0       32     0   514.46ms <data~ <Rpro~ <bch~
10 uwe2          1000      200  14.49ms  15.17ms    63.6    695.63KB     1.99    32     1    503.4ms <data~ <Rpro~ <bch~
11 fino          1000     1000     3.6s     3.6s     0.278  187.32MB     3.61     1    13       3.6s <data~ <Rpro~ <bch~
12 dave          1000     1000 242.46ms  243.7ms     4.07      6.5MB     6.78     3     5   737.06ms <data~ <Rpro~ <bch~
13 alexis        1000     1000  24.32ms  25.78ms    37.6      1.21MB     3.95    19     2   505.84ms <data~ <Rpro~ <bch~
14 uwe1          1000     1000  16.04ms   16.8ms    56.8      1.05MB     1.96    29     1   510.23ms <data~ <Rpro~ <bch~
15 uwe2          1000     1000  15.69ms  16.41ms    54.8    938.74KB     3.92    28     2   510.63ms <data~ <Rpro~ <bch~
Run Code Online (Sandbox Code Playgroud)

因此,在第二个基准测试中省略了Fino的方法,因为问题大小较大,分别为10k和100k行,consExalertsExresp分别为2k和20k行。

在此处输入图片说明

由于Dave2e的方法在20k行中的速度比最快的方法慢100倍,因此在第三次运行中被忽略了。这将模拟OP的12M行consEx和2M行的生产数据集,其中alertsEx

在此处输入图片说明

print(bm, n = Inf)
Run Code Online (Sandbox Code Playgroud)
# A tibble: 27 x 15
   expression n_trips n_alerts      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory time 
   <bch:expr>   <dbl>    <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list> <lis>
 1 alexis      1.00e5    20000 747.87ms 747.87ms  1.34       38.18MB  6.69        1     5   747.87ms <data~ <Rpro~ <bch~
 2 uwe1        1.00e5    20000 124.52ms 138.89ms  7.32       37.88MB  3.66        4     2   546.17ms <data~ <Rpro~ <bch~
 3 uwe2        1.00e5    20000  72.29ms  73.76ms 11.3        16.63MB  1.88        6     1   533.26ms <data~ <Rpro~ <bch~
 4 alexis      1.00e6    20000    7.51s    7.51s  0.133     335.44MB  3.33        1    25      7.51s <data~ <Rpro~ <bch~
 5 uwe1        1.00e6    20000 995.31ms 995.31ms  1.00      346.94MB  2.01        1     2   995.31ms <data~ <Rpro~ <bch~
 6 uwe2        1.00e6    20000 714.19ms 714.19ms  1.40       89.27MB  1.40        1     1   714.19ms <data~ <Rpro~ <bch~
 7 alexis      1.00e7    20000    1.67m    1.67m  0.00999     3.21GB  0.340       1    34      1.67m <data~ <Rpro~ <bch~
 8 uwe1        1.00e7    20000    3.44m    3.44m  0.00485     3.41GB  0.00969     1     2      3.44m <data~ <Rpro~ <bch~
 9 uwe2        1.00e7    20000   42.51s   42.51s  0.0235     810.3MB  0           1     0     42.51s <data~ <Rpro~ <bch~
10 alexis      1.00e5   200000   15.38s   15.38s  0.0650     73.22MB  0.0650      1     1     15.38s <data~ <Rpro~ <bch~
11 uwe1        1.00e5   200000    1.34s    1.34s  0.747      63.81MB  0           1     0      1.34s <data~ <Rpro~ <bch~
12 uwe2        1.00e5   200000    1.47s    1.47s  0.681      58.98MB  0           1     0      1.47s <data~ <Rpro~ <bch~
13 alexis      1.00e6   200000    2.91m    2.91m  0.00573   375.93MB  0.0115      1     2      2.91m <data~ <Rpro~ <bch~
14 uwe1        1.00e6   200000    9.72s    9.72s  0.103     371.69MB  0           1     0      9.72s <data~ <Rpro~ <bch~
15 uwe2        1.00e6   200000 888.67ms 888.67ms  1.13      161.82MB  0           1     0   888.67ms <data~ <Rpro~ <bch~
16 alexis      1.00e7   200000    6.29m    6.29m  0.00265     3.15GB  0.0928      1    35      6.29m <data~ <Rpro~ <bch~
17 uw