Group items and find the top n% from each group

Question

Group items and find the top n% from each group

I have data in the below format. I've to find the sum of values of the top 25% in each group - grouped by ID.

If the number corresponding to the top 25% in a group is a fraction, give the data for the ceil(higher number) of that fraction

Run Code Online (Sandbox Code Playgroud)

While I was able to get the top n rows from using the below code, I've not figured a way to get the top n% in each group

p2<-group_by(data, ID) %>%
mutate(rank = rank(desc(Value))) %>%
arrange(rank) %>%
filter(rank <= 3)

Run Code Online (Sandbox Code Playgroud)

The expected result looks something like this. There are 4 rows in the data with ID 1 - top 25% means 1 row.

There are 6 rows with ID 2 - ceil(0.25*6) is 2 - so top 2 rows. Similarly, for ID 3, there are 11 rows - ceil(0.25*11) is top 3 rows under ID 3.

Please let me know how to get this!

Run Code Online (Sandbox Code Playgroud)

Answer 1

Ron*_*hah 6

We can arrange the dataframe in descending order of Value and then slice the number of rows based on the ratio for each group.

library(dplyr)

df %>%
  arrange(desc(Value)) %>%
  group_by(ID) %>%
  slice(1:ceiling(0.25 * n()))

#     ID Value
#  <int> <int>
#1     1    45
#2     2    97
#3     2    89
#4     3    83
#5     3    72
#6     3    63

Run Code Online (Sandbox Code Playgroud)

We can also do this in base R. First order the rows based on Value, split the rows based on ID and then select top n rows from each group.

new_df <- df[order(df$Value, decreasing = TRUE), ]
new_df[unlist(Map(head, split(1:nrow(new_df), new_df$ID), 
                  ceiling(table(new_df$ID) * 0.25))), ]

#   ID Value
#14  1    45
#17  2    97
#18  2    89
#2   3    83
#19  3    72
#6   3    63

Run Code Online (Sandbox Code Playgroud)

归档时间：	6 年，6 月前
查看次数：	52 次
最近记录：	6 年，6 月前