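I have the following data.frame. I want to create a new column w (for weight). For each date, w should equal 1/n for the n industries with the highest returns and 0 for all other industries. I can group_by(date), use top_n(3, wt = return) to filter the top industries, and then mutate(w = 1/n), but how can I mutate without dropping the other industries, for which w = 0?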
df1 <- structure(list(date = structure(c(16556, 16556, 16556, 16556,
16556, 16556, 16556, 16556, 16556, 16556, 16587, 16587, 16587,
16587, 16587, 16587, 16587, 16587, 16587, 16587, 16617, 16617,
16617, 16617, 16617, 16617, 16617, 16617, 16617, 16617), class = "Date"),
industry = c("Hlth", "Txtls", "BusEq", "Fin", "ElcEq", "Food",
"Beer", "Books", "Cnstr", "Carry", "Clths", "Txtls", "Fin",
"Games", "Cnstr", "Meals", "Hlth", "Hshld", "Telcm", "Rtail",
"Smoke", "Games", "Clths", "Rtail", "Servs", "Meals", "Food",
"Hlth", "Beer", "Trans"), return = c(4.89, 4.37, 4.02, 2.99,
2.91, 2.03, 2, 1.95, 1.86, 1.75, 4.17, 4.09, 1.33, 1.26,
0.42, 0.29, 0.08, -0.11, -0.45, -0.48, 9.59, 6, 5.97, 5.78,
5.3, 4.15, 4.04, 3.67, 3.51, 3.27)), row.names = c(NA, -30L
), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 30 x 3
   date       industry return
   <date>     <chr>     <dbl>
 1 2015-05-01 Hlth       4.89
 2 2015-05-01 Txtls      4.37
 3 2015-05-01 BusEq      4.02
 4 2015-05-01 Fin        2.99
 5 2015-05-01 ElcEq      2.91
 6 2015-05-01 Food       2.03
 7 2015-05-01 Beer       2
 8 2015-05-01 Books      1.95
 9 2015-05-01 Cnstr      1.86
10 2015-05-01 Carry      1.75
# ... with 20 more rows
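Edit: How would you handle ties? Suppose third place is tied. The weight for third place should then be split between the third and fourth entries (assuming only 2 are tied), each getting (1/n)/2. First and second place keep a weight of 1/n.
Edit: Assume n = 3. If there are no ties, the top 3 values of A2 for each value of A1 should get a weight w of 1/3. If third place (T3) is tied, so that we have (1st, 2nd, T3, T3), I want the weights to be 1/3, 1/3, 1/6, 1/6 so that the total weight stays at 1. But this only applies to a tie for third place: (1st, T2, T2) should get weights 1/3, 1/3, 1/3, and (T1, T1, T2, T2) should get 1/3, 1/3, 1/6, 1/6, and so on.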
df <- structure(list(A1 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("A", "B"), class = "factor"), A2 = c(1, 3, 3,
4, 5, 6, 7, 8, 8)), row.names = c(NA, -9L), class = "data.frame")
The output for df should be:
> df
  A1 A2      w
1  A  1 0
2  A  3 0.1666
3  A  3 0.1666
4  A  4 0.3333
5  A  5 0.3333
6  B  6 0
7  B  7 0.3333
8  B  8 0.3333
9  B  8 0.3333
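We can build the condition with ifelse. After grouping by 'date', arrange the dataset by 'date' and by 'return' in descending order, then create 'w' with a condition: if row_number() is less than or equal to 'n', set 'w' to 1/n as the question asks; otherwise set it to 0.
library(dplyr)

n <- 3
df1 %>%
  group_by(date) %>%
  arrange(date, -return) %>%   # highest returns first within each date
  mutate(w = ifelse(row_number() <= n, 1 / n, 0))   # top n rows get 1/n, the rest 0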
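As a small illustration of the same idea, the sketch below ranks the returns inside mutate with row_number(desc(return)), so the rows do not need to be sorted first and keep their original order. The data frame toy is a hypothetical, reordered subset of the question's data, n = 2 is chosen only to keep the example short, and the per-date sum at the end is a sanity check rather than part of the method.

```r
library(dplyr)

# Hypothetical toy data: four industries on each of two dates, deliberately unsorted.
toy <- data.frame(
  date = as.Date(c(rep("2015-05-01", 4), rep("2015-06-01", 4))),
  industry = c("Fin", "Hlth", "BusEq", "Txtls", "Games", "Clths", "Fin", "Txtls"),
  return = c(2.99, 4.89, 4.02, 4.37, 1.26, 4.17, 1.33, 4.09)
)

n <- 2  # assumed number of top industries per date for this example

weighted <- toy %>%
  group_by(date) %>%
  # row_number(desc(return)) is 1 for the highest return within each date
  mutate(w = if_else(row_number(desc(return)) <= n, 1 / n, 0)) %>%
  ungroup()

# Sanity check: the weights on each date should add up to 1.
check <- weighted %>%
  group_by(date) %>%
  summarise(total = sum(w))

print(weighted)
print(check)
```

Because row_number() breaks ties by order of appearance, this variant always gives exactly n rows a non-zero weight; it does not split a weight between tied returns.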
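If we use top_n instead, create the column 'w' in the filtered dataset and then join the result back to the original data. Rows that were filtered out come back with NA in 'w', which replace_na (from tidyr) turns into 0.
library(dplyr)
library(tidyr)   # for replace_na()

df1 %>%
  group_by(date) %>%
  top_n(3, wt = return) %>%     # keep the 3 highest returns per date (ties are kept)
  mutate(w = 1 / n()) %>%       # n() is the number of rows kept for that date
  right_join(df1, by = c("date", "industry", "return")) %>%
  mutate(w = replace_na(w, 0))  # industries outside the top 3 get weight 0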
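Neither snippet above splits a weight between tied values the way the question's edits describe. One possible reading of that rule: each of the top n positions carries a weight of 1/n, and tied values share equally whatever positions they occupy. The sketch below applies that reading to the A1/A2 example from the question; the helper columns pos and slot are names invented for this illustration, and the sketch is not taken from the original answer.

```r
library(dplyr)

# The A1/A2 example from the question, rebuilt with data.frame().
df <- data.frame(
  A1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  A2 = c(1, 3, 3, 4, 5, 6, 7, 8, 8)
)

n <- 3

res <- df %>%
  group_by(A1) %>%
  mutate(
    # position 1..k when A2 is sorted in descending order; ties get distinct positions
    pos  = rank(-A2, ties.method = "first"),
    # each of the top n positions carries a weight of 1/n
    slot = if_else(pos <= n, 1 / n, 0)
  ) %>%
  # tied values (same A1 and A2) share the weight of the positions they occupy
  group_by(A1, A2) %>%
  mutate(w = mean(slot)) %>%
  ungroup() %>%
  select(A1, A2, w)

print(res)
# w: 0, 1/6, 1/6, 1/3, 1/3, 0, 1/3, 1/3, 1/3
```

This reproduces the expected output in the question, and the weights within each A1 group sum to 1. Grouping on a numeric column treats two values as tied only when they are exactly equal, so computed returns may need rounding first.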