Assign a unique ID based on two columns

Qui*_*tic 10 r multiple-columns

I have a data frame (df) that looks like this:

School Student  Year  
A         10    1999
A         10    2000
A         20    1999
A         20    2000
A         20    2001
B         10    1999
B         10    2000

I want to create a person ID column so that df looks like this:

ID School Student  Year  
1   A         10    1999
1   A         10    2000
2   A         20    1999
2   A         20    2000
2   A         20    2001
3   B         10    1999
3   B         10    2000

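In other words, the ID variable indicates which person in the dataset each row belongs to, taking into account both the student number and the school (here we have 3 students in total).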

I tried df$ID <- df$Student and then adding 1 to the value whenever c("School", "Student") is unique. It doesn't work. Help appreciated.

akr*_*run 12

We can do this in base R without any grouping operation:

df$ID <- cumsum(!duplicated(df[1:2]))
df
#   School Student Year ID
#1      A      10 1999  1
#2      A      10 2000  1
#3      A      20 1999  2
#4      A      20 2000  2
#5      A      20 2001  2
#6      B      10 1999  3
#7      B      10 2000  3

Note: this assumes the rows are ordered by 'School' and 'Student'.
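
If the rows are not sorted, the cumsum() approach gives wrong results: it increments only at the first occurrence of each pair, so a pair that reappears later picks up the ID of whichever group precedes it. Below is a minimal sketch of an order-independent base R alternative. The `shuffled` data frame and the `key` variable are made up for illustration, and the "_" separator is assumed not to occur in either column.

```r
# Hypothetical unsorted version of the question's data (illustration only)
shuffled <- data.frame(
  School  = c("A", "B", "A", "A", "B", "A", "A"),
  Student = c(10, 10, 20, 10, 10, 20, 20),
  Year    = c(1999, 1999, 1999, 2000, 2000, 2000, 2001)
)

# Build one key per row from both columns, then number the keys
# in order of first appearance
key <- paste(shuffled$School, shuffled$Student, sep = "_")
shuffled$ID <- match(key, unique(key))
shuffled
```

IDs here follow the order in which each School/Student pair first appears, not the sorted order, so sort the data first if the numbering itself matters.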


Or using the tidyverse:

library(dplyr)
df %>% 
    mutate(ID = group_indices_(df, .dots=c("School", "Student"))) 
#  School Student Year ID
#1      A      10 1999  1
#2      A      10 2000  1
#3      A      20 1999  2
#4      A      20 2000  2
#5      A      20 2001  2
#6      B      10 1999  3
#7      B      10 2000  3

  • I used the first one, but had to write it as cumsum(!duplicated(df$1,df$2)) to get it to work. Thanks! (2 upvotes)
  • `group_indices_()` is deprecated. Should it now be `mutate(ID = group_indices(df, School, Student))`? (2 upvotes)
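
On the deprecation raised in the last comment: as far as I know, dplyr 1.0.0 and later provide cur_group_id() for this purpose. A minimal sketch, assuming dplyr >= 1.0.0 is installed and rebuilding the question's data as a plain data frame:

```r
library(dplyr)

# Data from the question
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

# cur_group_id() returns the index of the current group inside mutate()
df <- df %>%
  group_by(School, Student) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()
df
```

Groups are numbered by the sorted grouping keys, so this should not depend on the row order of the input.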

Sat*_*ish 7

Group by School and Student, then assign the group ID to the ID variable.

library('data.table')
df[, ID := .GRP, by = .(School, Student)]

#    School Student Year ID
# 1:      A      10 1999  1
# 2:      A      10 2000  1
# 3:      A      20 1999  2
# 4:      A      20 2000  2
# 5:      A      20 2001  2
# 6:      B      10 1999  3
# 7:      B      10 2000  3

Data:

df <- fread('School Student Year
A 10 1999
A 10 2000
A 20 1999
A 20 2000
A 20 2001
B 10 1999
B 10 2000')
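
The `:=` call in this answer needs df to be a data.table, which is what fread() returns. Below is a sketch for the case where df starts out as a plain data.frame, assuming data.table is installed. setDT() converts the object by reference.

```r
library(data.table)

# Data from the question, as a plain data.frame
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

# Convert in place, then number the groups; .GRP counts groups
# in order of first appearance
setDT(df)[, ID := .GRP, by = .(School, Student)]
df
```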