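I want to group by month and summarize all variables, both numeric and categorical, for each month. In particular, for the categorical variables the resulting tibble should show the most common level in each month together with its frequency (as a percentage). Suppose we have the following example dataset: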
date <- as.Date(c("2021-03-13", "2021-03-12", "2021-04-14", "2021-04-17", "2021-04-17",
                  "2021-05-17", "2021-05-17", "2021-06-17", "2021-07-17", "2021-07-17"))
Partograph_use <- as.factor(c("Partograph", "Partograph", "Partograph",
                              "Partograph", "Partograph", "Partograph", "labor care guide",
                              "labor care guide", "labor care guide", "labor care guide"))
duration_labor <- as.numeric(c(12, 5, 6, 5, 5, 6, 7, 10, 10, 5))
augument_ox <- as.factor(c("Yes", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No"))
urgent_csection <- as.factor(c("Yes", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No"))
Then the example data frame:
test.df <- cbind.data.frame(date, Partograph_use, duration_labor, augument_ox, urgent_csection)
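I tried using lubridate and dplyr to group the dates by month, but I only got it working for the numeric variable "duration_of_labour_hours", with the following code (run on my real data, lcg.df):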
library(dplyr)  # provides %>%, group_by() and summarize()

lcg.df %>%
  group_by(month = lubridate::floor_date(date_of_admission, "month")) %>%
  summarize(median_dur_labor = median(duration_of_labour_hours, na.rm = TRUE),
            quat_5th = quantile(duration_of_labour_hours, probs = 0.05, na.rm = TRUE),
            quat_95th = quantile(duration_of_labour_hours, probs = 0.95, na.rm = TRUE))
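Unfortunately, I can't figure out how to summarize the categorical variables so that each month's row in the tibble shows the most frequent level and its frequency (as a percentage).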
Try this.
library(dplyr)

quux %>%
  mutate(ym = lubridate::floor_date(date_of_admission, unit = "months")) %>%
  group_by(ym) %>%
  reframe(
    across(where(is.numeric), list(mean = ~ mean(.), median = ~ median(.))),
    # sort in decreasing order so that [1] is the most common level
    across(where(is.character), ~ { tb <- table(.); 100 * sort(tb, decreasing = TRUE)[1] / sum(tb) }, .names = "{.col}_pct"),
    across(where(is.character), ~ names(sort(table(.), decreasing = TRUE))[1], .names = "{.col}_mode")
  )
# # A tibble: 5 × 9
#   ym         duration_of_labour_hours_mean duration_of_labour_hours_median labour_monitoring_pct artificial_rupture_of_memb_pct augmentation_with_oxytocin_pct labour_monitoring_mode artificial_rupture_of_memb_mode augmentation_with_ox…¹
#   <date>                             <dbl>                           <dbl>                 <dbl>                          <dbl>                          <dbl> <chr>                  <chr>                           <chr>
# 1 2021-07-01                          7.24                             7                     100                           79.6                           67.3 Partograph             Yes                             No
# 2 2021-08-01                          7.19                             7                     100                           81.0                           57.1 Partograph             Yes                             Yes
# 3 2021-09-01                          8.19                             8                     100                           90.5                           61.9 Partograph             Yes                             Yes
# 4 2021-10-01                          6.12                             5.5                   100                           75                             62.5 Partograph             Yes                             No
# 5 2023-08-01                         18                               18                     100                          100                            100   Partograph             Yes                             Yes
# # ℹ abbreviated name: ¹​augmentation_with_oxytocin_mode

It's a little unfortunate that I compute table(.) twice for each character column. If your data is large, it would not be too difficult to write a function that does it once and returns (for example) a list-column or similar, and then use another across(.) "accessor" to extract the two components. That is only necessary if your data is fairly large and the cost of the doubled table call is too high.
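One way to tabulate only once, sketched under some assumptions: the helper name mode_summary and the toy data below are made up for illustration, and instead of a list-column it returns a one-row data frame that across() spreads into separate columns via .unpack = TRUE (available in dplyr 1.1.0 and later, the same release that introduced reframe()).

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): tabulate once, then return
# the most common level and its share of the group as a one-row data frame.
mode_summary <- function(x) {
  tb <- table(x)
  top <- which.max(tb)  # index of the first level with the highest count
  data.frame(mode = names(tb)[top], pct = 100 * tb[[top]] / sum(tb))
}

# Toy data, invented for this sketch
toy <- data.frame(
  ym  = as.Date(c("2021-07-01", "2021-07-01", "2021-07-01", "2021-08-01")),
  arm = c("Yes", "Yes", "No", "No"),
  oxy = c("No", "No", "No", "Yes")
)

res <- toy %>%
  group_by(ym) %>%
  # .unpack = TRUE expands each returned data frame into <col>_mode and <col>_pct
  summarize(across(where(is.character), mode_summary, .unpack = TRUE))

res
```

On ties, which.max() keeps the first of the tied levels in table order; whether that is acceptable depends on the data.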
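The computations on the character columns need to come last; otherwise their "pct" columns would be picked up by the numeric summary and summarized a second time (which is simply unnecessary).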
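Using named functions in across really helps here: list(mu = ~ mean(.)) is a simple example. You can use any name you like (the LHS of =), and the RHS can be a ~ function as I've shown here, a more formal function(z) {...}, or a named function (e.g., list(mu = mean) also works, though it is sometimes discouraged for reasons I'm not aware of). This lets you add other summaries as needed (e.g., list(..., q90 = ~ quantile(., 0.9))).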
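As a small illustration of how the names on the left-hand side end up in the result, here is a sketch on made-up data (the columns g and hours are invented for this example). It mixes the three forms of RHS mentioned above.

```r
library(dplyr)

# Toy data, invented for this sketch
toy <- data.frame(g = c("a", "a", "b", "b"), hours = c(4, 6, 10, 20))

res <- toy %>%
  group_by(g) %>%
  summarize(across(
    where(is.numeric),
    list(
      mu  = ~ mean(.),             # formula-style lambda
      q90 = ~ quantile(., 0.9),    # lambda with an extra argument
      top = function(z) max(z),    # regular anonymous function
      med = median                 # bare named function
    )
  ))

names(res)
# "g" "hours_mu" "hours_q90" "hours_top" "hours_med"
```

With the default .names, each output column is named "{.col}_{.fn}", so the list names decide the suffixes.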
I used where(is.character), but if you also have factors, you probably want to include them as well. For that, you can use where(~ is.character(.) | is.factor(.)).
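A minimal sketch of that selector on made-up data with one factor column and one character column (the column names g, f and s are invented for this example):

```r
library(dplyr)

# Toy data, invented for this sketch: f is a factor, s is character
toy <- data.frame(
  g = c("m1", "m1", "m1", "m2"),
  f = factor(c("Yes", "No", "Yes", "No")),
  s = c("x", "x", "y", "y"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(g) %>%
  summarize(across(
    where(~ is.character(.) | is.factor(.)),
    ~ names(which.max(table(.))),   # most common level within the group
    .names = "{.col}_mode"
  ))

res
```

The grouping column g is itself character, but across() never selects grouping columns, so only f and s are summarized. Note that table() on a factor also counts unused levels (as zero), which does not affect the most common level.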
Data
quux <- structure(list(
  date_of_admission = structure(c(18821, 18820, 18822, 18824, 18825, 18824, 18825, 18824, 18826, 18825,
    18824, 18826, 18825, 18820, 18821, 18821, 18825, 18821, 18821, 18812,
    18820, 18821, 18811, 18809, 18809, 18817, 18818, 18824, 18823, 18813,
    18813, 18812, 18812, 18815, 18815, 18814, 18810, 18810, 18811, 18832,
    18829, 18823, 19591, 18902, 18923, 18830, 18869, 18811, 18810, 18861,
    18922, 18867, 18884, 18885, 18852, 18852, 18879, 18816, 18848, 18850,
    18879, 18857, 18877, 18878, 18816, 18822, 18829, 18864, 18854, 18864,
    18854, 18879, 18879, 18863, 18871, 18855, 18878, 18878, 18874, 18850,
    18855, 18871, 18878, 18881, 18874, 18874, 18874, 18882, 18873, 18930,
    18930, 18930, 18842, 18862, 18929, 18873, 18929, 18842, 18842, 18842), class = "Date"),
  labour_monitoring = rep("Partograph", 100),
  duration_of_labour_hours = c(12L, 5L, 5L, 8L, 6L, 7L, 10L, 10L, 5L, 7L,
    7L, 8L, 8L, 6L, 6L, 7L, 8L, 9L, 8L, 11L,
    12L, 7L, 10L, 6L, 5L, 12L, 9L, 10L, 6L, 5L,
    6L, 6L, 5L, 5L, 3L, 3L, 7L, 8L, 3L, 5L,
    11L, 6L, 18L, 6L, 4L, 4L, 8L, 11L, 5L, 10L,
    8L, 7L, 8L, 10L, 7L, 8L, 8L, 10L, 7L, 9L,
    4L, 5L, 10L, 5L, 8L, 8L, 6L, 8L, 8L, 13L,
    6L, 4L, 14L, 5L, 4L, 6L, 12L, 7L, 15L, 4L,
    7L, 9L, 6L, 6L, 7L, 12L, 5L, 10L, 11L, 4L,
    3L, 5L, 9L, 6L, 7L, 5L, 12L, 5L, 6L, 7L),
  artificial_rupture_of_memb = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No",
    "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes",
    "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes",
    "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes",
    "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
    "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes",
    "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
    "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
    "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "No",
    "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes"),
  augmentation_with_oxytocin = c("No", "No", "No", "No", "No", "Yes", "No", "No", "Yes", "No",
    "Yes", "Yes", "No", "No", "No", "No", "No", "No", "No", "No",
    "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "Yes",
    "No", "No", "Yes", "No", "No", "No", "No", "Yes", "Yes", "Yes",
    "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes",
    "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No", "Yes",
    "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No",
    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No",
    "Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "Yes", "No",
    "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes")
), row.names = c(NA, -100L), class = "data.frame")