Car*_*tal 2 datatable r data-manipulation dplyr
I have this dataset:
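
    df <- tibble(id, event, duration)

For each "dive" row, I need to calculate the proportion of time spent at the surface, using the "surface" row that follows it, and put the result in a new column. All of this has to be done separately for each "id".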

proportion = surface / (dive + surface)
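As a quick check of the formula, here is a minimal sketch using the first dive/surface pair of id A from the example data (dive 55, surface 56). The variable names are only for illustration.

```r
# First pair of id "A" in the example data
dive <- 55
surface <- 56

# Share of the dive + surface cycle that was spent at the surface
proportion <- surface / (dive + surface)
round(proportion, 3)
# 0.505
```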

    #Output dataframe

    # A tibble: 8 x 4
      id    event   duration proportion
    1 A     surface       56 x
    2 A     surface       96 x
    3 A     surface       14 x
    4 A     surface       77 x
    5 B     surface       28 x
    6 B     surface       63 x
    7 B     surface       47 x
    8 B     surface       90 x

Edit:

In my original data, some "dive" events have no "surface" after them, and the code written for the example above fails with this error:

    Error in `dplyr::mutate()`:
    ! Problem while computing `proportion = DurationMin[What ==
      "Surface"]/sum(DurationMin)`.
    ✖ `proportion` must be size 2 or 1, not 0.
    ℹ The error occurred in group 2803: ptt = "2017111870", grp = 1015.

Some "id"s will have an odd number of rows, where a "dive" event is not followed by a "surface" in the sequence. So each time an unpaired event is encountered, I need to either ignore it or insert NA. Is this possible?

Here is an example data frame:

    library(dplyr)

    id <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B")

    event <- c("dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "surface", "dive", "dive", "surface")

    duration <- c(55, 56, 40, 96, 58, 14, 43, 77, 19, 28, 34, 63, 29, 47, 61, 45, 30)

    df <- tibble(id, event, duration)

    > df
       id    event   duration
    1  A     dive          55
    2  A     surface       56
    3  A     dive          40
    4  A     surface       96
    5  A     dive          58
    6  A     surface       14
    7  A     dive          43
    8  A     surface       77
    9  B     dive          19
    10 B     surface       28
    11 B     dive          34
    12 B     surface       63
    13 B     dive          29
    14 B     surface       47
    15 B     dive          61
    16 B     dive          45
    17 B     surface       30

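We can use `gl` to create a grouping index for every 2 rows within each 'id', and then create the 'proportion' column by dividing the 'duration' of the row where the event is 'surface' (`event == 'surface'`) by the `sum` of 'duration' in that group.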
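To see what that grouping index looks like on its own, here is a minimal sketch in base R. The 8-row group size is a hypothetical stand-in for `n()` inside one 'id'.

```r
# Hypothetical group of 8 rows, standing in for n() within one id
n <- 8

# gl(n, 2, n) repeats each level twice and truncates to length n,
# so consecutive rows are labelled in pairs
grp <- as.integer(gl(n, 2, n))
grp
# 1 1 2 2 3 3 4 4
```

Because the index only counts rows, it pairs them correctly only when every "dive" is immediately followed by its "surface".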

    library(dplyr)
    df %>%
      group_by(id) %>%
      # pair up consecutive rows within each id: 1, 1, 2, 2, ...
      group_by(grp = as.integer(gl(n(), 2, n())), .add = TRUE) %>%
      # surface duration divided by the total duration of the pair
      mutate(proportion = duration[event == 'surface'][1]/sum(duration)) %>%
      ungroup %>%
      select(-grp)

Output:

    # A tibble: 16 × 4
       id    event   duration proportion
       <chr> <chr>      <dbl>      <dbl>
     1 A     dive          55      0.505
     2 A     surface       56      0.505
     3 A     dive          40      0.706
     4 A     surface       96      0.706
     5 A     dive          58      0.194
     6 A     surface       14      0.194
     7 A     dive          43      0.642
     8 A     surface       77      0.642
     9 B     dive          19      0.596
    10 B     surface       28      0.596
    11 B     dive          34      0.649
    12 B     surface       63      0.649
    13 B     dive          29      0.618
    14 B     surface       47      0.618
    15 B     dive          61      0.596
    16 B     surface       90      0.596

For the new dataset, which contains unpaired "dive" rows, we can use

    df %>%
      group_by(id) %>%
      # start a new group at every 'dive' row within each id
      group_by(grp = cumsum(event == 'dive'), .add = TRUE) %>%
      # [1] returns NA when the group has no 'surface' row
      mutate(proportion = duration[event == 'surface'][1]/sum(duration)) %>%
      ungroup %>%
      select(-grp)

Output:

    # A tibble: 17 × 4
       id    event   duration proportion
       <chr> <chr>      <int>      <dbl>
     1 A     dive          55      0.505
     2 A     surface       56      0.505
     3 A     dive          40      0.706
     4 A     surface       96      0.706
     5 A     dive          58      0.194
     6 A     surface       14      0.194
     7 A     dive          43      0.642
     8 A     surface       77      0.642
     9 B     dive          19      0.596
    10 B     surface       28      0.596
    11 B     dive          34      0.649
    12 B     surface       63      0.649
    13 B     dive          29      0.618
    14 B     surface       47      0.618
    15 B     dive          61     NA
    16 B     dive          45      0.4
    17 B     surface       30      0.4

data

    df <- structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A",
    "B", "B", "B", "B", "B", "B", "B", "B", "B"), event = c("dive",
    "surface", "dive", "surface", "dive", "surface", "dive", "surface",
    "dive", "surface", "dive", "surface", "dive", "surface", "dive",
    "dive", "surface"), duration = c(55L, 56L, 40L, 96L, 58L, 14L,
    43L, 77L, 19L, 28L, 34L, 63L, 29L, 47L, 61L, 45L, 30L)),
    class = "data.frame", row.names = c("1",
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
    "14", "15", "16", "17"))

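As a rough illustration of why the `cumsum` version copes with unpaired dives, the sketch below works through the last five rows of id B in base R. The standalone vectors and the name `unpaired` are only for illustration and are not part of the answer's code.

```r
# Last five rows of id "B" from the data above
event <- c("dive", "surface", "dive", "dive", "surface")
duration <- c(29, 47, 61, 45, 30)

# Every "dive" starts a new group, so the unpaired dive sits alone in group 2
grp <- cumsum(event == "dive")
grp
# 1 1 2 3 3

# In group 2 there is no "surface" row, so the subset has length 0,
# which is the size that mutate() refused in the error message
unpaired <- duration[grp == 2 & event == "surface"]
length(unpaired)
# 0

# Indexing with [1] turns the empty vector into a single NA,
# which mutate() can recycle across the group
unpaired[1]
# NA
```

This assumes each 'id' starts with a "dive" row. A leading "surface" would land in a group of its own (index 0) and get a proportion of 1.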