Summarizing counts by several columns

Stu*_*000 3 r frequency

I have a table that looks like this:

 df <- read.table(text = 
      "  Day            location     gender    hashtags
       'Feb 19 2016'       'UK'      'M'       '#a'
       'Feb 19 2016'       'UK'      'M'       '#b'
       'Feb 19 2016'       'SP'      'F'       '#a'
       'Feb 19 2016'       'SP'      'F'       '#b'
       'Feb 19 2016'       'SP'      'M'       '#a'
       'Feb 19 2016'       'SP'      'M'       '#b'
       'Feb 20 2016'       'UK'      'F'       '#a'", 
                 header = TRUE, stringsAsFactors = FALSE) 

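I want to count frequencies per day and hashtag, broken down by gender and by location, so that the resulting table looks like this: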

           Day hashtags Daily_Freq men women Freq_UK Freq_SP
   Feb 19 2016       #a          3   2     1       1       2
   Feb 19 2016       #b          3   2     1       1       2
   Feb 20 2016       #a          1   0     1       1       0

where Daily_Freq = men + women = Freq_UK + Freq_SP. How can I do this?

Jaa*_*aap 6

Using dplyr:

library(dplyr)
df %>% 
  group_by(Day, hashtags) %>%                 # one group per Day/hashtags pair
  summarise(Daily_Freq = n(),                 # number of rows in the group
            men = sum(gender == 'M'),         # TRUE counts as 1, FALSE as 0
            women = sum(gender == 'F'),
            Freq_UK = sum(location == 'UK'),
            Freq_SP = sum(location == 'SP'))

This gives:

# A tibble: 3 x 7
# Groups:   Day [?]
  Day         hashtags Daily_Freq   men women Freq_UK Freq_SP
  <chr>       <chr>         <int> <int> <int>   <int>   <int>
1 Feb 19 2016 #a                3     2     1       1       2
2 Feb 19 2016 #b                3     2     1       1       2
3 Feb 20 2016 #a                1     0     1       1       0
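The code above names every level ('M', 'F', 'UK', 'SP') by hand. If the levels are not known in advance, base R's `table()` can produce one column per observed level instead. This is a minimal sketch, not part of the original answer: it rebuilds the question's data with `data.frame()`, uses a pasted Day/hashtag string as the grouping key (`key` is a hypothetical name), and keeps the raw level names (F, M, SP, UK) as column names rather than men/women/Freq_UK/Freq_SP. It assumes no gender level has the same name as a location level.

```r
# Sample data from the question, built directly instead of via read.table()
df <- data.frame(
  Day      = c(rep("Feb 19 2016", 6), "Feb 20 2016"),
  location = c("UK", "UK", "SP", "SP", "SP", "SP", "UK"),
  gender   = c("M", "M", "F", "F", "M", "M", "F"),
  hashtags = c("#a", "#b", "#a", "#b", "#a", "#b", "#a"),
  stringsAsFactors = FALSE
)

# Hypothetical grouping key: one string per Day/hashtags combination
key <- paste(df$Day, df$hashtags)

# Cross-tabulate the key against each variable; each observed level becomes a column
gender_counts   <- as.data.frame.matrix(table(key, df$gender))
location_counts <- as.data.frame.matrix(table(key, df$location))

# Daily total = sum over the gender columns (the location columns give the same total)
res <- cbind(Daily_Freq = rowSums(gender_counts), gender_counts, location_counts)
res
```

The groups end up as row names rather than as separate Day and hashtags columns, so this suits a quick look at the counts more than further joins.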

The same logic implemented with data.table:

library(data.table)
# setDT() converts df to a data.table in place; .N is the row count of each group
setDT(df)[, .(Daily_Freq = .N,
              men = sum(gender == 'M'),
              women = sum(gender == 'F'),
              Freq_UK = sum(location == 'UK'),
              Freq_SP = sum(location == 'SP'))
          , by = .(Day, hashtags)]
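For comparison, the same summary can be sketched in base R with `aggregate()`, without dplyr or data.table. This is not from the original answer; the helper table `ind` is a hypothetical construction for this sketch: each row gets a 1 in every column it counts toward, and summing those indicators within each Day/hashtags group yields the frequencies. The final `stopifnot()` checks the identity from the question, Daily_Freq = men + women = Freq_UK + Freq_SP.

```r
# Sample data from the question
df <- data.frame(
  Day      = c(rep("Feb 19 2016", 6), "Feb 20 2016"),
  location = c("UK", "UK", "SP", "SP", "SP", "SP", "UK"),
  gender   = c("M", "M", "F", "F", "M", "M", "F"),
  hashtags = c("#a", "#b", "#a", "#b", "#a", "#b", "#a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper table: one 0/1 indicator column per count we want
ind <- data.frame(
  Day        = df$Day,
  hashtags   = df$hashtags,
  Daily_Freq = 1L,
  men        = as.integer(df$gender == "M"),
  women      = as.integer(df$gender == "F"),
  Freq_UK    = as.integer(df$location == "UK"),
  Freq_SP    = as.integer(df$location == "SP"),
  stringsAsFactors = FALSE
)

# Sum every indicator column (the ".") within each Day/hashtags group
res <- aggregate(. ~ Day + hashtags, data = ind, FUN = sum)

# aggregate() lets Day vary fastest, so rows come out ordered by hashtags;
# reorder by Day, then hashtags
res <- res[order(res$Day, res$hashtags), ]
rownames(res) <- NULL
res

# The identity stated in the question must hold for every row
stopifnot(res$Daily_Freq == res$men + res$women,
          res$Daily_Freq == res$Freq_UK + res$Freq_SP)
```

Day is a character string here, so the ordering is alphabetical; it matches chronological order for these two dates only because they share a month and year. Converting with `as.Date()` first would be safer on real data.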