Mutate the top n rows without dropping the other rows

cpa*_*age 1 r dplyr tibble

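I have the following data.frame. I want to create a new column w (for weight). For each date, w should equal 1 / n for the n industries with the highest returns, and 0 for the remaining industries. I can group_by(date) and use top_n(3, wt = return) to filter the top industries, then mutate(w = 1/n), but how can I mutate without dropping the other industries, which should get w = 0?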

df1 <- structure(list(date = structure(c(16556, 16556, 16556, 16556, 
16556, 16556, 16556, 16556, 16556, 16556, 16587, 16587, 16587, 
16587, 16587, 16587, 16587, 16587, 16587, 16587, 16617, 16617, 
16617, 16617, 16617, 16617, 16617, 16617, 16617, 16617), class = "Date"), 
    industry = c("Hlth", "Txtls", "BusEq", "Fin", "ElcEq", "Food", 
    "Beer", "Books", "Cnstr", "Carry", "Clths", "Txtls", "Fin", 
    "Games", "Cnstr", "Meals", "Hlth", "Hshld", "Telcm", "Rtail", 
    "Smoke", "Games", "Clths", "Rtail", "Servs", "Meals", "Food", 
    "Hlth", "Beer", "Trans"), return = c(4.89, 4.37, 4.02, 2.99, 
    2.91, 2.03, 2, 1.95, 1.86, 1.75, 4.17, 4.09, 1.33, 1.26, 
    0.42, 0.29, 0.08, -0.11, -0.45, -0.48, 9.59, 6, 5.97, 5.78, 
    5.3, 4.15, 4.04, 3.67, 3.51, 3.27)), row.names = c(NA, -30L
), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 30 x 3
   date       industry return
   <date>     <chr>     <dbl>
 1 2015-05-01 Hlth       4.89
 2 2015-05-01 Txtls      4.37
 3 2015-05-01 BusEq      4.02
 4 2015-05-01 Fin        2.99
 5 2015-05-01 ElcEq      2.91
 6 2015-05-01 Food       2.03
 7 2015-05-01 Beer       2   
 8 2015-05-01 Books      1.95
 9 2015-05-01 Cnstr      1.86
10 2015-05-01 Carry      1.75
# ... with 20 more rows

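Edit: How would you handle ties? Suppose there is a tie for third place. The third-place weight should then be split between the third and fourth place (assuming only 2 are tied), so each gets (1/n)/2. First and second place keep a weight of 1/n.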

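Edit: Suppose n = 3. If there are no ties, the top 3 values of A2 within each value of A1 should each get a weight w of 1/3. If third place (T3) is tied, we have (1st, 2nd, T3, T3), and I want the weights to be 1/3, 1/3, 1/6, 1/6 so that the total weight stays 1. But this applies only to third place: (1st, T2, T2) should get weights 1/3, 1/3, 1/3, and (T1, T1, T2, T2) should get 1/3, 1/3, 1/6, 1/6, and so on.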
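The rule in this edit can be stated compactly. Reading the examples as giving each of the sorted positions 1, ..., n a weight "slot" of 1/n (this slot reading is an interpretation of the examples, not wording from the question), a tie group of k rows whose first position in descending order is s shares the weight of those of its positions that fall within the top n:

```latex
w = \frac{\max\bigl(0,\ \min(s + k - 1,\ n) - s + 1\bigr)}{n\,k}

% Checks with n = 3:
% (1st, 2nd, T3, T3): T3 has s = 3, k = 2  =>  w = (3 - 3 + 1) / 6 = 1/6
% (1st, T2, T2):      T2 has s = 2, k = 2  =>  w = (3 - 2 + 1) / 6 = 1/3
% (T1, T1, T2, T2):   T1 has s = 1, k = 2  =>  w = (2 - 1 + 1) / 6 = 1/3
%                     T2 has s = 3, k = 2  =>  w = (3 - 3 + 1) / 6 = 1/6
% Untied row (k = 1): w = 1/n if s <= n, otherwise 0
```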

df <- structure(list(A1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L), .Label = c("A", "B"), class = "factor"), A2 = c(1, 3, 3, 
    4, 5, 6, 7, 8, 8)), row.names = c(NA, -9L), class = "data.frame")

The output for df should be:

> df
  A1 A2  w
1  A  1  0 
2  A  3  0.1666
3  A  3  0.1666 
4  A  4  0.3333
5  A  5  0.3333
6  B  6  0
7  B  7  0.3333
8  B  8  0.3333
9  B  8  0.3333

akr*_*run 5

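We can build the condition with ifelse. After grouping by 'date', arrange the data by 'date' and by 'return' in descending order, then create 'w' conditionally: if row_number() is at most 'n', w is 1/n, otherwise it is 0.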

library(dplyr)

n <- 3
df1 %>%
   group_by(date) %>%
   arrange(date, desc(return)) %>%   # highest return first within each date
   mutate(w = ifelse(row_number() <= n, 1/n, 0))
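A variant of the same idea (an illustration, not from the thread) ranks within each date instead of sorting, so the rows keep their original order: row_number(desc(return)) gives 1 to the largest return in each group and breaks ties by row position. To be self-contained it builds a subset of the question's data (the first four industries of the first two dates); the name n_top is hypothetical and is used instead of n only to avoid any confusion with dplyr's n() function.

```r
library(dplyr)

# Subset of the question's df1: first four industries of the first two dates
df1 <- tibble(
  date = as.Date(rep(c("2015-05-01", "2015-06-01"), each = 4)),
  industry = c("Hlth", "Txtls", "BusEq", "Fin", "Clths", "Txtls", "Fin", "Games"),
  return = c(4.89, 4.37, 4.02, 2.99, 4.17, 4.09, 1.33, 1.26)
)

n_top <- 3  # hypothetical name for the question's n

res <- df1 %>%
  group_by(date) %>%
  # rank returns within each date (1 = highest); no arrange() needed,
  # so the original row order is preserved
  mutate(w = if_else(row_number(desc(return)) <= n_top, 1 / n_top, 0)) %>%
  ungroup()

res
```

Like the answer's code, this gives exactly n_top rows per date a weight of 1/n_top even when returns are tied, so it does not implement the tie rule from the question's edits.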

If we use top_n, create the column 'w' in the filtered dataset and join it back to the original data:

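library(dplyr)
library(tidyr)   # for replace_na()

# top_n() is superseded by slice_max() in dplyr >= 1.0.0, but still works
df1 %>% 
  group_by(date) %>% 
  top_n(n = 3, wt = return) %>% 
  mutate(w = 1/n()) %>% 
  right_join(df1, by = c("date", "industry", "return"))  %>% 
  mutate(w = replace_na(w, 0))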
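Neither approach above implements the tie rule from the question's edits. One way to do it is sketched below, assuming the "slot" reading of those examples: each of the top n sorted positions carries 1/n, and a tie group shares the weight of its positions that fall within the top n. min_rank(desc(A2)) within each A1 group gives a tie group's first position, and n() within each (A1, A2) group gives its size. The helper columns first_slot, k and slots and the name n_top are hypothetical, chosen for this sketch.

```r
library(dplyr)

# The small data.frame from the question's second edit
df <- data.frame(
  A1 = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B")),
  A2 = c(1, 3, 3, 4, 5, 6, 7, 8, 8)
)

n_top <- 3  # number of weight slots per group, each worth 1 / n_top

res <- df %>%
  group_by(A1) %>%
  # first sorted position (descending) of each value; tied values share
  # the lowest position, e.g. (5, 4, 3, 3) -> (1, 2, 3, 3)
  mutate(first_slot = min_rank(desc(A2))) %>%
  group_by(A1, A2) %>%
  mutate(
    k = n(),  # size of the tie group
    # Assumption ("slot" reading): the tie group occupies positions
    # first_slot .. first_slot + k - 1; count how many are within 1..n_top
    slots = pmax(0, pmin(first_slot + k - 1, n_top) - first_slot + 1),
    # split the weight of those slots equally among the tied rows
    w = slots / (n_top * k)
  ) %>%
  ungroup() %>%
  select(A1, A2, w)

res
```

This reproduces the expected output in the question (0, 1/6, 1/6, 1/3, 1/3 for A and 0, 1/3, 1/3, 1/3 for B) and keeps the original row order. For df1, group by date instead of A1 and rank on return instead of A2. If a group has fewer than n_top rows, its weights sum to less than 1 under this rule; the question does not say what should happen then.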