How to collapse rows of a frequency table and add their counts in a new column?

Asked by gae*_*cia (score 7) · Tags: r, frequency, dataframe, dplyr, janitor

I have a data frame containing sample classifications:

 Seq_ID   Family Father   Mother   Sex    Role    Type  
   <chr>     <dbl> <chr>    <chr>    <chr>  <chr>   <chr> 
 1 SSC02219 11000. 0        0        Male   Father  Parent
 2 SSC02217 11000. 0        0        Female Mother  Parent
 3 SSC02254 11000. SSC02219 SSC02217 Male   Proband Child 
 4 SSC02220 11000. SSC02219 SSC02217 Female Sibling Child 
 5 SSC02184 11001. 0        0        Male   Father  Parent
 6 SSC02181 11001. 0        0        Female Mother  Parent
 7 SSC02178 11001. SSC02184 SSC02181 Male   Proband Child 
 8 SSC03092 11002. 0        0        Male   Father  Parent
 9 SSC03078 11002. 0        0        Female Mother  Parent
10 SSC03070 11002. SSC03092 SSC03078 Female Proband Child 

Currently, to get from table a to table b, I have to do this:

library(tidyverse)
library(janitor)

sample.df %>% tabyl(Role, Sex) %>% 
  adorn_totals(where = c("row", "col")) %>% 
  as_tibble() %>% select(1, 4, 3, 2) %>%    # Role, Total, Male, Female
  # Part 2
  mutate(type = c("parent", "parent", "child", "child", " ")) %>% 
  inner_join(., group_by(., type) %>% 
                summarise(total = sum(Total)),
             by = "type") %>% 
  select(5, 6, 1, 2, 3, 4)                   # type, total, Role, Total, Male, Female

This feels like a workaround for something that should be simple. Is there a more direct way to do Part 2 in dplyr?

Table a: [image not preserved]

Table b: [image not preserved]

Answer by www*_*www (score 2)

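Here is one option. The as_tibble() (formerly as.tibble()) conversion is not necessary. Using mutate with case_when is easier to manage when you have many classes to assign to "parent" or "child". The inner_join is not necessary either, because group_by followed by mutate can calculate total directly on every row. Finally, I like to write out the column names in select, because the code is easier to read later; you can of course use column indices, as long as you are confident they will not change no matter what new analyses you add to the pipeline.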
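To illustrate the case_when point in isolation: unlike a hard-coded vector such as c("parent", "parent", "child", "child", " "), the mapping below does not depend on the row order of the table, and adding a new class is one more condition line. In this minimal sketch the roles vector is a hypothetical stand-in for the Role column of the tabyl output, including the "Total" row that adorn_totals() appends.

```r
library(dplyr)

# Hypothetical stand-in for the Role column of the tabyl output,
# including the "Total" row appended by adorn_totals()
roles <- c("Father", "Mother", "Proband", "Sibling", "Total")

# case_when() tests conditions top to bottom; the final TRUE branch is the
# fallback for anything not matched above (here, the "Total" row)
type <- case_when(
  roles %in% c("Mother", "Father")   ~ "parent",
  roles %in% c("Proband", "Sibling") ~ "child",
  TRUE                               ~ " "
)

print(type)
```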

library(tidyverse)
library(janitor)

sample.df %>% 
  tabyl(Role, Sex) %>% 
  adorn_totals(where = c("row", "col")) %>% 
  select(Role, Total, Male, Female) %>%
  # Part 2
  # Map each Role to a type; the TRUE branch catches the "Total" row
  mutate(type = case_when(
    Role %in% c("Mother", "Father")      ~"parent",
    Role %in% c("Proband", "Sibling")    ~"child",
    TRUE                                 ~" "
  )) %>% 
  # Sum Total within each type while keeping every row (no join needed)
  group_by(type) %>% 
  mutate(total = sum(Total)) %>%
  ungroup() %>%
  select(type, total, Role, Total, Male, Female)
# # A tibble: 5 x 6
#   type   total Role    Total  Male Female
#   <chr>  <dbl> <chr>   <dbl> <dbl>  <dbl>
# 1 parent    6. Father     3.    3.     0.
# 2 parent    6. Mother     3.    0.     3.
# 3 child     4. Proband    3.    2.     1.
# 4 child     4. Sibling    1.    0.     1.
# 5 " "      10. Total     10.    5.     5.
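To check that the grouped mutate really replaces the summarise plus inner_join step, the sketch below runs both on a small tibble holding the Part 1 counts and types from the output above, typed in by hand (the names counts, via_join and via_mutate are hypothetical). A grouped mutate() computes the group sum and writes it back onto every row of the group, which is exactly what joining the summarised totals back on type achieves.

```r
library(dplyr)

# Part 1 counts and types as printed in the output above
counts <- tibble(
  Role   = c("Father", "Mother", "Proband", "Sibling", "Total"),
  Total  = c(3, 3, 3, 1, 10),
  Male   = c(3, 0, 2, 0, 5),
  Female = c(0, 3, 1, 1, 5),
  type   = c("parent", "parent", "child", "child", " ")
)

# Question's approach: summarise per type, then join the totals back on
via_join <- counts %>%
  inner_join(counts %>% group_by(type) %>% summarise(total = sum(Total)),
             by = "type")

# Answer's approach: a grouped mutate() keeps every row and adds the sum
via_mutate <- counts %>%
  group_by(type) %>%
  mutate(total = sum(Total)) %>%
  ungroup()

# Both give the same rows, columns and values
print(isTRUE(all.equal(as.data.frame(via_join), as.data.frame(via_mutate))))
```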

Data

library(tidyverse)
library(janitor)

sample.df <- read.table(text = "Seq_ID   Family Father   Mother   Sex    Role    Type  
 1 SSC02219 11000  0        0        Male   Father  Parent
 2 SSC02217 11000  0        0        Female Mother  Parent
 3 SSC02254 11000  SSC02219 SSC02217 Male   Proband Child 
 4 SSC02220 11000  SSC02219 SSC02217 Female Sibling Child 
 5 SSC02184 11001  0        0        Male   Father  Parent
 6 SSC02181 11001  0        0        Female Mother  Parent
 7 SSC02178 11001  SSC02184 SSC02181 Male   Proband Child 
 8 SSC03092 11002  0        0        Male   Father  Parent
 9 SSC03078 11002  0        0        Female Mother  Parent
10 SSC03070 11002  SSC03092 SSC03078 Female Proband Child ",
                        header = TRUE, stringsAsFactors = FALSE)

sample.df <- as_tibble(sample.df)
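If you would rather not depend on janitor, here is a rough dplyr/tidyr-only sketch of the same idea; it is an alternative, not part of the answer above. It swaps tabyl() for count() plus tidyr::pivot_wider(), and uses only the Sex and Role columns of the data, retyped with tribble() (the names samples and result are hypothetical). Unlike adorn_totals(), it does not append a grand-total row.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Only the Sex and Role columns of sample.df are needed here
samples <- tribble(
  ~Sex,     ~Role,
  "Male",   "Father",
  "Female", "Mother",
  "Male",   "Proband",
  "Female", "Sibling",
  "Male",   "Father",
  "Female", "Mother",
  "Male",   "Proband",
  "Male",   "Father",
  "Female", "Mother",
  "Female", "Proband"
)

result <- samples %>%
  count(Role, Sex) %>%                      # one row per Role x Sex pair
  pivot_wider(names_from = Sex, values_from = n,
              values_fill = 0L) %>%         # missing pairs become 0
  mutate(Total = Male + Female,
         type = case_when(
           Role %in% c("Mother", "Father")   ~ "parent",
           Role %in% c("Proband", "Sibling") ~ "child"
         )) %>%
  group_by(type) %>%
  mutate(total = sum(Total)) %>%            # per-type total on every row
  ungroup() %>%
  select(type, total, Role, Total, Male, Female)

print(result)
```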