R programming: how to use plyr's ddply to count values in a column

Ria*_*iad 7 r plyr

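I would like to summarize the pass/fail status of my data, as shown below. In other words, I want to report the number of pass and fail cases for each product/type combination.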

library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)

The following command returns the total number of pass + fail cases, but I want separate columns for pass and fail:

dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))

The result is:

        product type N
 1      p1      t1   6
 2      p1      t2   6
 3      p2      t1   6
 4      p2      t2   6

The desired result is:

         product type Pass Fail
 1       p1      t1   5    1
 2       p1      t2   3    3
 3       p2      t1   4    2
 4       p2      t2   3    3

I tried something like this:

 dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )

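But this is obviously wrong, because every row shows the grand totals of passes and failures instead of the per-group counts.

Thanks in advance for your advice! Regards, Riad.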

ial*_*alm 11

Try:

dfSummary <- ddply(df, c("product", "type"), summarise, 
                   Pass=sum(result=="pass"), Fail=sum(result=="fail") )

This gives me the result:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3
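For comparison, the attempt in the question indexes `df$result`, which refers to the full data frame rather than to the piece currently being summarised, so it counts over all 24 rows every time. The following is a minimal base-R sketch of that difference, assuming the same data as in the question (rebuilt compactly here); `piece` is a hypothetical name for one product/type subset.

```r
# Same example data as in the question, built compactly
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail"),
  stringsAsFactors = FALSE
)

# One piece, comparable to what a grouped summary sees for product p1, type t1
piece <- subset(df, product == "p1" & type == "t1")

# Counting on the whole data frame ignores the grouping
total_pass <- sum(df$result == "pass")
total_fail <- sum(df$result == "fail")

# Counting on the piece gives the per-group numbers
piece_pass <- sum(piece$result == "pass")
piece_fail <- sum(piece$result == "fail")

print(c(total_pass = total_pass, total_fail = total_fail))
print(c(piece_pass = piece_pass, piece_fail = piece_fail))
```

Writing the bare column name `result` inside `summarise`, as in the call above, is what makes the count apply to each piece.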

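Explanation:

  1. You pass the data set df to the ddply function.
  2. ddply splits it on the variables "product" and "type".
    • This yields length(unique(product)) * length(unique(type)) pieces (i.e. subsets of df, here 2 * 2 = 4), one for each combination of the two variables.
  3. To each piece, ddply applies the function you supply. In this case, you count the rows where result=="pass" and the rows where result=="fail".
  4. ddply now holds a result for each piece: the variables you split on (product and type) and the values you requested (Pass and Fail).
  5. It combines the results from all pieces into one data frame and returns it.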
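
The steps above can be approximated by hand in base R. This is only an illustrative sketch of the split-apply-combine idea, not how plyr is implemented internally; the names `pieces`, `summaries` and `manual` are made up for the example, and the data is the same as in the question, rebuilt compactly.

```r
# Same example data as in the question, built compactly
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail"),
  stringsAsFactors = FALSE
)

# Split: one piece per product/type combination
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Apply: for each piece, keep the grouping values and count passes and fails
summaries <- lapply(pieces, function(piece) {
  data.frame(product = piece$product[1],
             type    = piece$type[1],
             Pass    = sum(piece$result == "pass"),
             Fail    = sum(piece$result == "fail"),
             stringsAsFactors = FALSE)
})

# Combine: stack the per-piece results into a single data frame
manual <- do.call(rbind, summaries)
manual <- manual[order(manual$product, manual$type), ]
rownames(manual) <- NULL

print(manual)
```

The `ddply` call does the same work in one step and takes care of the bookkeeping (carrying the grouping columns along and ordering the rows).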