How to group by all variables except some, and add a group ID to each observation

akr*_*ica 2 r dplyr

I have a dataset like this:

data(CO2, package = 'datasets')

##    Plant        Type  Treatment conc uptake
## 1    Qn1      Quebec nonchilled   95   16.0
## 2    Qn1      Quebec nonchilled  175   30.4
## ... 
## 17   Qn3      Quebec nonchilled  250   40.3
## 18   Qn3      Quebec nonchilled  350   42.1
## ...
## 27   Qc1      Quebec    chilled  675   35.4
## 28   Qc1      Quebec    chilled 1000   38.7
## ...
## 36   Qc3      Quebec    chilled   95   15.1
## 37   Qc3      Quebec    chilled  175   21.0
## ...
## 47   Mn1 Mississippi nonchilled  500   30.9
##  ...
## 53   Mn2 Mississippi nonchilled  350   31.8
## 54   Mn2 Mississippi nonchilled  500   32.4
## ...
## 62   Mn3 Mississippi nonchilled  675   28.1
## 63   Mn3 Mississippi nonchilled 1000   27.8
## ...
## 70   Mc1 Mississippi    chilled 1000   21.9
## 71   Mc2 Mississippi    chilled   95    7.7
## 72   Mc2 Mississippi    chilled  175   11.4
## ...
## 83   Mc3 Mississippi    chilled  675   18.9
## 84   Mc3 Mississippi    chilled 1000   19.9
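  • The observations should be grouped by the combination of all variables except conc and uptake. So I want to specify the variables that should be left out of the grouping, rather than listing the ones to group by.
  • I want to add a new variable GroupID to the dataset, such that all observations belonging to the same group have the same GroupID value.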

I found a solution that works, but it is a monstrosity:

library(dplyr)
CO2 %>% 
  mutate(GroupID=
         do.call( group_indices
                , c( list(.data=.)
                   , colnames(.) %>% 
                      setdiff(c('conc','uptake')) %>% 
                      lapply(as.name)  # one symbol per column; as.name() on a vector keeps only the first name
                   )
                )
         )

##    Plant        Type  Treatment conc uptake GroupID
## 1    Qn1      Quebec nonchilled   95   16.0       1
## 2    Qn1      Quebec nonchilled  175   30.4       1
## ...
## 8    Qn2      Quebec nonchilled   95   13.6       2
## 9    Qn2      Quebec nonchilled  175   27.3       2
## ...
## 15   Qn3      Quebec nonchilled   95   16.2       3
## 16   Qn3      Quebec nonchilled  175   32.4       3
## ...
## 22   Qc1      Quebec    chilled   95   14.2       4
## 23   Qc1      Quebec    chilled  175   24.1       4
## ...
## 29   Qc2      Quebec    chilled   95    9.3       6
## 30   Qc2      Quebec    chilled  175   27.3       6
## ...
## 36   Qc3      Quebec    chilled   95   15.1       5
## 37   Qc3      Quebec    chilled  175   21.0       5
## ...
## 43   Mn1 Mississippi nonchilled   95   10.6       9
## 44   Mn1 Mississippi nonchilled  175   19.2       9
## ...

Is there a simpler solution?


Bonus: a solution that can group by all variables of the same type (e.g. all factor variables) would be fantastic.

www*_*www 5

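We can use group_by_if to group by every variable that satisfies a condition. Here, is.factor tests whether a column is a factor. After that, group_indices generates an ID for each group.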

library(dplyr)

CO2_2 <- CO2 %>%
  mutate(GroupID = CO2 %>%
           group_by_if(is.factor) %>%
           group_indices())
head(CO2_2)
#   Plant   Type  Treatment conc uptake GroupID
# 1   Qn1 Quebec nonchilled   95   16.0       1
# 2   Qn1 Quebec nonchilled  175   30.4       1
# 3   Qn1 Quebec nonchilled  250   34.8       1
# 4   Qn1 Quebec nonchilled  350   37.2       1
# 5   Qn1 Quebec nonchilled  500   35.3       1
# 6   Qn1 Quebec nonchilled  675   39.2       1
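
As a sanity check on IDs produced this way, one option is to confirm that each GroupID corresponds to exactly one combination of the factor columns. The sketch below is only an illustration: the names co2 and combos are made up here, and the as.data.frame() call is an added assumption that simply drops the extra classes carried by CO2 so that a plain data frame is used.

```r
library(dplyr)

data(CO2, package = "datasets")

# Assumption: work on a plain data frame copy of CO2.
co2 <- as.data.frame(CO2)

# Same approach as above: group by all factor columns, then take the group index.
co2$GroupID <- co2 %>%
  group_by_if(is.factor) %>%
  group_indices()

# One row per distinct (Plant, Type, Treatment, GroupID) combination.
combos <- co2 %>% distinct(Plant, Type, Treatment, GroupID)

# If these two counts agree, every ID belongs to exactly one combination.
nrow(combos)
n_distinct(combos$GroupID)
```

If the two counts differ, some ID is shared by more than one combination of the grouping columns.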

We can also use group_by_at to group the data frame by column names, here by excluding conc and uptake.

CO2_3 <- CO2 %>%
  mutate(GroupID = CO2 %>%
           group_by_at(vars(-conc, -uptake)) %>%
           group_indices())
head(CO2_3)
#   Plant   Type  Treatment conc uptake GroupID
# 1   Qn1 Quebec nonchilled   95   16.0       1
# 2   Qn1 Quebec nonchilled  175   30.4       1
# 3   Qn1 Quebec nonchilled  250   34.8       1
# 4   Qn1 Quebec nonchilled  350   37.2       1
# 5   Qn1 Quebec nonchilled  500   35.3       1
# 6   Qn1 Quebec nonchilled  675   39.2       1
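
In dplyr 1.0.0 and later, the scoped verbs group_by_if and group_by_at are superseded by across(), and cur_group_id() returns the index of the current group inside mutate(). The following is a minimal sketch of both groupings in that style, not part of the original answer; the object names CO2_by_name and CO2_by_type are hypothetical, and as.data.frame() is again an added assumption to work with a plain data frame.

```r
library(dplyr)

data(CO2, package = "datasets")

# Group by every column except conc and uptake, then label each group.
CO2_by_name <- as.data.frame(CO2) %>%
  group_by(across(-c(conc, uptake))) %>%
  mutate(GroupID = cur_group_id()) %>%
  ungroup()

# Group by every factor column instead (the "bonus" variant).
CO2_by_type <- as.data.frame(CO2) %>%
  group_by(across(where(is.factor))) %>%
  mutate(GroupID = cur_group_id()) %>%
  ungroup()

head(CO2_by_name)
```

For this dataset both variants select the same three columns (Plant, Type, Treatment), so they should assign the same IDs.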