Mutate the top n rows without dropping the other rows

Question

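I have the following data.frame. I want to create a new column w (for weight). On each given date, w should equal 1/n for the n industries with the highest returns and 0 for the remaining industries. I can group_by(date), use top_n(3, wt = return) to filter the top industries, and then mutate(w = 1/n), but how can I mutate without dropping the other industries, which should get w = 0?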

structure(list(date = structure(c(16556, 16556, 16556, 16556, 
16556, 16556, 16556, 16556, 16556, 16556, 16587, 16587, 16587, 
16587, 16587, 16587, 16587, 16587, 16587, 16587, 16617, 16617, 
16617, 16617, 16617, 16617, 16617, 16617, 16617, 16617), class = "Date"), 
    industry = c("Hlth", "Txtls", "BusEq", "Fin", "ElcEq", "Food", 
    "Beer", "Books", "Cnstr", "Carry", "Clths", "Txtls", "Fin", 
    "Games", "Cnstr", "Meals", "Hlth", "Hshld", "Telcm", "Rtail", 
    "Smoke", "Games", "Clths", "Rtail", "Servs", "Meals", "Food", 
    "Hlth", "Beer", "Trans"), return = c(4.89, 4.37, 4.02, 2.99, 
    2.91, 2.03, 2, 1.95, 1.86, 1.75, 4.17, 4.09, 1.33, 1.26, 
    0.42, 0.29, 0.08, -0.11, -0.45, -0.48, 9.59, 6, 5.97, 5.78, 
    5.3, 4.15, 4.04, 3.67, 3.51, 3.27)), row.names = c(NA, -30L
), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 30 x 3
   date       industry return
   <date>     <chr>     <dbl>
 1 2015-05-01 Hlth       4.89
 2 2015-05-01 Txtls      4.37
 3 2015-05-01 BusEq      4.02
 4 2015-05-01 Fin        2.99
 5 2015-05-01 ElcEq      2.91
 6 2015-05-01 Food       2.03
 7 2015-05-01 Beer       2   
 8 2015-05-01 Books      1.95
 9 2015-05-01 Cnstr      1.86
10 2015-05-01 Carry      1.75
# ... with 20 more rows
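
The attempt described in the question can be sketched as follows, to make the problem concrete. This is only an illustration: it uses a ten-row subset of the data above, and the names `n_top` and `top_only` are made up for the example.

```r
library(dplyr)

# Subset of the question's data: the first five industries for two of the dates.
df1 <- data.frame(
  date = as.Date(rep(c("2015-05-01", "2015-06-01"), each = 5)),
  industry = c("Hlth", "Txtls", "BusEq", "Fin", "ElcEq",
               "Clths", "Txtls", "Fin", "Games", "Cnstr"),
  return = c(4.89, 4.37, 4.02, 2.99, 2.91, 4.17, 4.09, 1.33, 1.26, 0.42)
)

n_top <- 3  # hypothetical name for the question's n

# Filter to the top n_top returns per date, then assign the weight.
top_only <- df1 %>%
  group_by(date) %>%
  top_n(n_top, wt = return) %>%
  mutate(w = 1 / n_top) %>%
  ungroup()

nrow(df1)       # 10 rows going in
nrow(top_only)  # 6 rows coming out: the industries that should have w = 0 are gone
```

The weights themselves are right; the issue is only that filtering removes the rows that should remain with w = 0.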


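Edit: How would you handle ties? Suppose third place is tied. The weight for third place should then be split between the third and fourth rows (assuming only 2 are tied), each getting (1/n)/2. First and second place keep a weight of 1/n.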

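Edit: Suppose n = 3. For each value of A1, the top 3 values of A2 should get a w of 1/3 if there are no ties. If third place (T3) is tied, so that we have (1st, 2nd, T3, T3), I would like the weights to be 1/3, 1/3, 1/6, 1/6 so that the total weight stays at 1. But this applies only to third place: (1st, T2, T2) should have weights 1/3, 1/3, 1/3, and (T1, T1, T2, T2) should have weights 1/3, 1/3, 1/6, 1/6, and so on.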

structure(list(A1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), A2 = c(1, 3, 3, 4, 5, 6, 7, 8, 8)), row.names = c(NA, -9L), class = "data.frame")
The output for df should be:

> df
  A1 A2      w
1  A  1      0
2  A  3 0.1666
3  A  3 0.1666
4  A  4 0.3333
5  A  5 0.3333
6  B  6      0
7  B  7 0.3333
8  B  8 0.3333
9  B  8 0.3333

Answer 1

akr*_*run 5

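We can build the condition with ifelse. After grouping by 'date', arrange the dataset by 'date' and by 'return' in descending order, then create 'w' conditionally: if row_number() is less than or equal to 'n', set 'w' to 1/n (the weight the question asks for), otherwise to 0.

library(dplyr)

n <- 3
df1 %>%
   group_by(date) %>%
   arrange(date, desc(return)) %>%                 # highest return first within each date
   mutate(w = ifelse(row_number() <= n, 1/n, 0))   # top n rows get 1/n, the rest get 0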
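
A variant of the same idea ranks the returns within each date instead of sorting, so the rows keep their original order. The sketch below is an illustration on a ten-row subset of the question's data; `n_top` and `res` are names invented for the example. It assumes there are no ties at the cutoff: with `min_rank()`, tied rows at the cutoff would all receive 1/n, so the weights for a date could sum to more than 1.

```r
library(dplyr)

# Subset of the question's data: the first five industries for two of the dates.
df1 <- data.frame(
  date = as.Date(rep(c("2015-05-01", "2015-06-01"), each = 5)),
  industry = c("Hlth", "Txtls", "BusEq", "Fin", "ElcEq",
               "Clths", "Txtls", "Fin", "Games", "Cnstr"),
  return = c(4.89, 4.37, 4.02, 2.99, 2.91, 4.17, 4.09, 1.33, 1.26, 0.42)
)

n_top <- 3  # hypothetical name for the question's n

res <- df1 %>%
  group_by(date) %>%
  # min_rank(desc(return)) is 1 for the highest return within each date
  mutate(w = if_else(min_rank(desc(return)) <= n_top, 1 / n_top, 0)) %>%
  ungroup()

# Every row is kept, and the weights add up to 1 on each date.
res %>%
  group_by(date) %>%
  summarise(total_w = sum(w))
```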


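If we use top_n instead, create the 'w' column on the filtered dataset and join the result back to the original data. The rows that were filtered out come back with NA in 'w', which is then replaced by 0.

library(dplyr)
library(tidyr)   # for replace_na()

df1 %>% 
  group_by(date) %>% 
  top_n(3, wt = return) %>%   # keep the 3 highest returns per date (ties at the cutoff are all kept)
  mutate(w = 1/n()) %>%       # n() is the number of rows kept for that date
  right_join(df1, by = c("date", "industry", "return")) %>% 
  mutate(w = replace_na(w, 0))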
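
Neither snippet above applies the tie rule from the question's edit, where a tie that straddles the cutoff shares whatever top-n weight is left. One possible sketch of that rule, not part of the original answer, treats each tied group as occupying the rank positions from its minimum rank to its maximum rank, gives 1/n to every position inside the top n, and splits the total evenly within the tied group. The names `n_top`, `r_min`, `r_max` and `slots` are invented for the example; the data is the A1/A2 example from the question.

```r
library(dplyr)

# The A1/A2 example from the question.
df <- data.frame(
  A1 = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B")),
  A2 = c(1, 3, 3, 4, 5, 6, 7, 8, 8)
)

n_top <- 3  # hypothetical name for the question's n

res <- df %>%
  group_by(A1) %>%
  mutate(
    r_min = rank(-A2, ties.method = "min"),  # first position held by the tied group
    r_max = rank(-A2, ties.method = "max"),  # last position held by the tied group
    # how many of the positions r_min..r_max fall inside the top n_top
    slots = pmax(0, pmin(n_top, r_max) - r_min + 1),
    # each covered position is worth 1/n_top, shared by all rows in the tie
    w = slots / (n_top * (r_max - r_min + 1))
  ) %>%
  ungroup() %>%
  select(A1, A2, w)

res
# w: 0, 1/6, 1/6, 1/3, 1/3 for group A and 0, 1/3, 1/3, 1/3 for group B
```

This reproduces the expected output listed in the question, and the weights sum to 1 for any group with at least n_top rows.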


Archived:	7 years, 5 months ago
Views:	1,752
Last activity:	7 years, 5 months ago