I have the following data frame df:
  v1 v2 v3 v4
1  1  5  7  4
2  2  6 10  3
I would like to obtain the following data frame df2, which adds the column products v1*v3 and v2*v4:
  v1 v2 v3 v4 v1v3 v2v4
1  1  5  7  4    7   20
2  2  6 10  3   20   18
How can I do this with dplyr? With mutate_each?
I need a solution that generalizes to a large number of variables, not just 4 (v1 to v4). Here is the code that generates the example:
v1 <- c(1, 2)
v2 <- c(5,6)
v3 <- c(7, 10)
v4 <- c(4, 3)
df <- data.frame(v1, v2, v3, v4)
v1v3 <- c(v1 * v3)
v2v4 <- c(v2 * v4)
df2 <- cbind(df, v1v3, v2v4)
lee*_*sej 21
You were really close.
library(dplyr)

# add one product column per pair of columns
df2 <-
  df %>%
  mutate(v1v3 = v1 * v3,
         v2v4 = v2 * v4)
Such a straightforward language, isn't it?
For more great tricks, see here.
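Edit: Thanks to @Facottons for pointing to this answer: https://stackoverflow.com/a/34377242/5088194 , which is a tidy approach to solving this problem. It saves you from having to hard-code one line for every new column you want. Although it is more verbose than the Base R approach, the logic is at least more immediately transparent/readable. It is worth noting that there must be at least half as many rows as there are columns for this approach to work.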
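library(dplyr)
library(tidyr)

# prep the product column names (one per row, so they also act as row keys)
# the offset 2 is half the number of columns in this example
df <-
  df %>%
  mutate(prod_grp = paste0("v", row_number(), "v", row_number() + 2))

# convert the data to tidy (long) format and pair up the columns to be multiplied
tidy_df <-
  df %>%
  gather(column, value, -prod_grp) %>%
  mutate(column = as.numeric(sub("v", "", column)),
         pair = column - 2) %>%
  mutate(pair = if_else(pair < 1, pair + 2, pair))

# compute the product within each (prod_grp, pair) group, then spread to wide format
# note: spread() names the new columns after prod_grp, which comes from the row
# numbers; check the orientation of the result on data other than this 2 x 4 example
prod_df <-
  tidy_df %>%
  group_by(prod_grp, pair) %>%
  summarize(val = prod(value)) %>%
  ungroup() %>%
  spread(prod_grp, val) %>%
  mutate(pair = paste0("v", pair, "v", pair + 2)) %>%
  rename(prod_grp = pair)

# join the products back onto the original frame and drop the helper column
final_df <-
  df %>%
  left_join(prod_df, by = "prod_grp") %>%
  select(-prod_grp)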
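For a wider frame, the pairing can also be built from the column names instead of being typed out. What follows is only a minimal sketch, not the method from the linked answer. It assumes the columns are named v1 to vN with N even, and that column i is multiplied by column i + N/2 (as in v1*v3 and v2*v4 for N = 4). The names n_half, left, right and products are made up for illustration.

```r
library(dplyr)

df <- data.frame(v1 = c(1, 2), v2 = c(5, 6), v3 = c(7, 10), v4 = c(4, 3))

# Assumption: columns are v1..vN (N even) and v_i pairs with v_(i + N/2)
n_half <- ncol(df) / 2
left   <- paste0("v", seq_len(n_half))           # first half of the columns
right  <- paste0("v", seq_len(n_half) + n_half)  # their partners in the second half

# Multiply each left column by its partner, element by element
products <- Map(`*`, df[left], df[right])
names(products) <- paste0(left, right)           # e.g. "v1v3", "v2v4"

# Append the product columns to the original frame
df2 <- bind_cols(df, as.data.frame(products))
```

With the example data this should reproduce the v1v3 and v2v4 columns of df2 from the question. For a different pairing rule, only left and right would need to change.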