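I have been trying to learn R for a while, but my knowledge has not yet reached a decent level. Please help me with this problem.
I have a CSV data file with 5,000 rows containing the following fields: name, channel (internal or external), survey sent date, and survey received date.
The underlying data looks like this: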

I want to present it in the following format:

I tried this:
library("reshape2")
dcast(w, Recruiter ~ channel)
This works fine, but I don't know how to add "Survey Sent", "Survey Received", and "Survey Sent - Survey Received".
A dplyr solution...
> head(data)
Name Channel Sent Recd
1 A Internal 2014-07-10 2014-07-12
2 A Internal 2014-07-16 <NA>
3 A External 2014-08-04 2014-08-10
4 A Internal 2014-08-16 2014-08-18
5 A Internal 2014-07-29 <NA>
6 A External 2014-08-05 2014-08-14
Then:
library(dplyr)
data %>%
    group_by(Name) %>%
    summarise(
        # surveys per channel
        External = sum(Channel == "External"),
        Internal = sum(Channel == "Internal"),
        Total = n(),
        # count the non-missing dates
        Sent = sum(!is.na(Sent)),
        Recd = sum(!is.na(Recd))
    ) %>%
    # sent but not yet received
    mutate(Pending = Sent - Recd)
which gives:
Name External Internal Total Sent Recd Pending
1 A 6 4 10 10 8 2
2 B 2 7 9 9 6 3
3 C 4 5 9 9 4 5
Note that I've used real Date objects for the dates and NA for missing data.
The data was generated like this:
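If the dates arrive as text from the CSV, they need converting before the summary above will count missing values correctly. Here is a minimal sketch of that step. The inline CSV text, the `raw` name, and the day/month/year format are all assumptions for illustration; the real file and its date format are not shown in the question.

```r
# Hypothetical CSV content; an empty Recd field means "not yet received"
csv_text <- "Name,Channel,Sent,Recd
A,Internal,10/07/2014,12/07/2014
A,Internal,16/07/2014,
B,External,04/08/2014,10/08/2014"

# Read empty fields as NA and keep the date columns as plain strings
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

# Convert to Date objects; the format string is an assumption about the file
raw$Sent <- as.Date(raw$Sent, format = "%d/%m/%Y")
raw$Recd <- as.Date(raw$Recd, format = "%d/%m/%Y")

str(raw)
```

With a real file, `read.csv(text = csv_text, ...)` would become `read.csv("yourfile.csv", ...)`, and the format string would have to match however the dates are actually written.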
data =
structure(list(Name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
Channel = c("Internal", "Internal", "External", "Internal",
"Internal", "External", "External", "External", "External",
"External", "Internal", "External", "Internal", "Internal",
"Internal", "External", "Internal", "Internal", "Internal",
"Internal", "Internal", "External", "Internal", "External",
"External", "External", "Internal", "Internal"), Sent = structure(c(16261,
16267, 16286, 16298, 16280, 16287, 16294, 16292, 16291, 16282,
16304, 16297, 16262, 16274, 16264, 16270, 16252, 16276, 16279,
16275, 16277, 16293, 16253, 16272, 16288, 16283, 16281, 16296
), class = "Date"), Recd = structure(c(16263.5024573486,
NA, 16292.4899729695, 16300.3446546271, NA, 16296.9054549634,
16301.318120582, 16301.4672047794, 16295.238142278, 16286.8117301762,
NA, 16306.6499495078, NA, 16282.0412430186, 16272.4275530744,
16273.9005153924, 16255.7532094959, NA, 16284.9287535194,
NA, 16279.182732366, 16302.4864703286, NA, NA, 16296.6838856321,
NA, 16290.3657759354, NA), class = "Date")), .Names = c("Name",
"Channel", "Sent", "Recd"), row.names = c(NA, -28L), class = "data.frame")
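For anyone who would rather not depend on dplyr, the same kind of summary can be sketched in base R with `table()` and `tapply()`. This is only one possible approach, not part of the answer above, and the small `surveys` data frame is made up for illustration.

```r
# Hypothetical miniature of the survey data (values are illustrative only)
surveys <- data.frame(
  Name    = c("A", "A", "A", "B", "B"),
  Channel = c("Internal", "External", "Internal", "External", "External"),
  Sent    = as.Date(c("2014-07-10", "2014-07-16", "2014-08-04",
                      "2014-08-16", "2014-07-29")),
  Recd    = as.Date(c("2014-07-12", NA, "2014-08-10", NA, NA)),
  stringsAsFactors = FALSE
)

# Cross-tabulate Name by Channel: one row per name, one column per channel
channel_counts <- as.data.frame.matrix(table(surveys$Name, surveys$Channel))
people <- rownames(channel_counts)

# Count the non-missing dates per name
sent <- tapply(!is.na(surveys$Sent), surveys$Name, sum)
recd <- tapply(!is.na(surveys$Recd), surveys$Name, sum)

summary_tbl <- data.frame(
  Name  = people,
  channel_counts,
  Total = as.vector(rowSums(channel_counts)),
  Sent  = as.vector(sent[people]),
  Recd  = as.vector(recd[people]),
  stringsAsFactors = FALSE
)

# Surveys sent but not yet received
summary_tbl$Pending <- summary_tbl$Sent - summary_tbl$Recd
rownames(summary_tbl) <- NULL
print(summary_tbl)
```

The dplyr pipeline is shorter and easier to extend; the base version mainly shows that the summary is a cross-tabulation plus two counts of non-missing values.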