I want to use mutate to compute a column drawn from a binomial distribution.
I have the following example:
library("dplyr")
d = data.frame(ref = rbinom(100,100,0.5))
d$coverage = 100
d$prob = 0.5
d$eprob= d$ref / d$coverage
d = tbl_df(d)
mutate(d,
ref1= ref,
cov1 = coverage,
eprob1 = eprob,
ref2=rbinom(1, coverage, eprob),
ref3=rbinom(1, cov1, eprob1)
)
The result looks like this:
Source: local data frame [100 x 9]
ref coverage prob eprob ref1 cov1 eprob1 ref2 ref3
1 52 100 0.5 0.52 52 100 0.52 45 44
2 50 100 0.5 0.50 50 100 0.50 45 44
3 45 100 0.5 0.45 45 100 0.45 45 44
4 45 100 0.5 0.45 45 100 0.45 45 44
5 47 100 0.5 0.47 47 100 0.47 45 44
6 46 100 0.5 0.46 46 100 0.46 45 44
7 50 100 0.5 0.50 50 100 0.50 45 44
8 53 100 0.5 0.53 53 100 0.53 45 44
9 44 100 0.5 0.44 44 100 0.44 45 44
10 56 100 0.5 0.56 56 100 0.56 45 44
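I don't understand this. I expected mutate to return, for each row, a random number drawn from the binomial distribution given by ref and coverage ("ref2")...
mutate reads the columns correctly, but something odd happens when rbinom is called...
Any help is appreciated.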
Ale*_*lex 15
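Try changing the n argument of rbinom. With n = 1, rbinom returns a single value, and mutate recycles that one value down the whole column, which is why every row of ref2 (and of ref3) is identical: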
mutate(d,
ref1= ref,
cov1 = coverage,
eprob1 = eprob,
ref2=rbinom(100, coverage, eprob),
ref3=rbinom(100, cov1, eprob1)
)
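The effect of n can be seen without dplyr at all. The following is a minimal base R sketch, not part of the original answer; the vectors size and prob are hypothetical stand-ins for the coverage and eprob columns.

```r
set.seed(1)                        # only to make the sketch reproducible

size <- c(100, 100, 100, 100)      # hypothetical stand-in for the coverage column
prob <- c(0.52, 0.50, 0.45, 0.47)  # hypothetical stand-in for the eprob column

# n = 1: a single draw is returned, no matter how long size and prob are
one <- rbinom(1, size, prob)

# n = length(size): one draw per (size, prob) pair
many <- rbinom(length(size), size, prob)

length(one)   # 1
length(many)  # 4
```

Inside mutate, a length-1 result like one is repeated for every row, whereas a result like many already has one value per row.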
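More generally, pass n() to rbinom instead of hard-coding 100; n() gives the number of rows in the current group: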
mutate(d,
ref1= ref,
cov1 = coverage,
eprob1 = eprob,
ref2=rbinom(n(), coverage, eprob),
ref3=rbinom(n(), cov1, eprob1)
)
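Because n() is evaluated per group, the same call should also work on grouped data, where a fixed row count would not match the group sizes. The sketch below illustrates this under assumptions: the grouping column grp and the small data frame are made up for the example and do not come from the question.

```r
library(dplyr)

set.seed(42)  # only to make the sketch reproducible

# Hypothetical data: two groups of different sizes
d2 <- data.frame(
  grp      = rep(c("a", "b"), times = c(3, 5)),
  coverage = 100,
  eprob    = runif(8, 0.4, 0.6)
)

res <- d2 %>%
  group_by(grp) %>%
  # n() is 3 in group "a" and 5 in group "b", so each row gets its own draw
  mutate(ref2 = rbinom(n(), coverage, eprob)) %>%
  ungroup()

res
```

Each value of ref2 is then an independent draw using that row's own coverage and eprob.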