Tha*_*ess 3 r dplyr magrittr tidyr
I have a data frame like this:
  message.id sender recipient
1          1      A         B
2          1      A         C
3          2      A         B
4          3      B         C
5          3      B         D
6          3      B         Q
I want to summarize it by counting the values in the sender and recipient columns, to get:
  address messages.sent messages.received
1       A             3                 0
2       B             3                 2
3       C             0                 2
4       D             0                 1
5       Q             0                 1
I have working code, but it is messy, and I am hoping there is a way to do it all in a single magrittr chain instead of what I do below:
library(dplyr)

df <- data.frame(message.id = c(1,1,2,3,3,3),
                 sender = c("A","A","A","B","B","B"),
                 recipient = c("B","C","B","C","D","Q"))

# count messages per sender
sent <- df %>%
  group_by(sender) %>%
  summarise(messages.sent = n()) %>%
  mutate(address = sender) %>%
  select(address, messages.sent)

# count messages per recipient
received <- df %>%
  group_by(recipient) %>%
  summarise(messages.received = n()) %>%
  mutate(address = recipient) %>%
  select(address, messages.received)

# full join on address, then replace missing counts with 0
df_summary <- merge(sent, received, all = TRUE) %>%
  replace(is.na(.), 0)
We can use melt/dcast:
library(reshape2)
# df is the data frame defined in the question
dcast(melt(df, id.var = 'message.id'), value ~ variable,
      value.var = 'message.id', fun.aggregate = length)
Or use the wrapper recast:
recast(df, id.var = 'message.id', value ~ variable, length)
#  value sender recipient
#1     A      3         0
#2     B      3         2
#3     C      0         2
#4     D      0         1
#5     Q      0         1
If we need to use dplyr/tidyr:
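library(dplyr)
library(tidyr)
# stack columns 2:3 (sender, recipient) into key/value pairs,
# count each (messages, address) pair, then spread back to wide form
gather(df, messages, address, 2:3) %>%
  group_by(messages, address) %>%
  summarise(n = n()) %>%
  spread(messages, n, fill = 0)
#  address sender recipient
#    (chr)  (dbl)     (dbl)
#1       A      3         0
#2       B      3         2
#3       C      0         2
#4       D      0         1
#5       Q      0         1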
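In tidyr 1.0.0 and later, gather/spread are superseded by pivot_longer/pivot_wider, so the same idea can probably be written as the sketch below. It is not part of the original answer. It assumes tidyr >= 1.1.0 (for a scalar values_fill), the intermediate column name role is made up for illustration, and the final select only renames the columns to the names asked for in the question.

```r
library(dplyr)
library(tidyr)

# Data frame from the question
df <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                 sender     = c("A", "A", "A", "B", "B", "B"),
                 recipient  = c("B", "C", "B", "C", "D", "Q"))

df_summary <- df %>%
  # one row per (message, role, address); "role" is a hypothetical name
  pivot_longer(c(sender, recipient),
               names_to = "role", values_to = "address") %>%
  # number of rows for each address in each role
  count(address, role) %>%
  # back to one row per address, filling absent combinations with 0
  pivot_wider(names_from = role, values_from = n, values_fill = 0) %>%
  # rename to the column names requested in the question
  select(address, messages.sent = sender, messages.received = recipient)

print(df_summary)
```

This keeps everything in a single chain, which is what the question asked for, and avoids the separate sent/received tables and the merge.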