Posts by use*_*545

Violin plot of a data frame

I have a data.frame, for example:

df = data.frame(AAA=rnorm(100,1,1),BBB=rnorm(100,2,1.5),CCC=rnorm(100,1.5,1.2))

And I'd like to plot each column in a joint violin plot.

This is where I've got to so far:

names(df)[1] = 'x'
do.call('vioplot', c(df,col="red",drawRect=FALSE))

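What I'd like to do next is use the colnames of df as the x-axis labels, instead of vioplot's default x-axis labels, and in a way that they don't run into each other. I imagine this can be achieved either by spreading out the columns of df in the plot or by slanting the x-axis labels, but I can't figure it out.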

plot r

6
votes
3
answers
3957
views

Fine-tuning ggplot2's geom_boxplot

I have this data.frame:

my.df = data.frame(mean = c(0.045729661,0.030416531,0.043202944,0.025600973,0.040526913,0.046167044,0.029352414,0.021477789,0.027580529,0.017614864,0.020324659,0.027547972,0.0268722,0.030804717,0.021502093,0.008342398,0.02295506,0.022386184,0.030849534,0.017291356,0.030957321,0.01871551,0.016945678,0.014143042,0.026686185,0.020877973,0.028612298,0.013227244,0.010710895,0.024460647,0.03704981,0.019832982,0.031858501,0.022194059,0.030575241,0.024632496,0.040815748,0.025595652,0.023839083,0.026474704,0.033000706,0.044125751,0.02714219,0.025724641,0.020767752,0.026480009,0.016794441,0.00709195), std.dev = c(0.007455271,0.006120299,0.008243454,0.005552582,0.006871527,0.008920899,0.007137174,0.00582671,0.007439398,0.005265133,0.006180637,0.008312494,0.006628951,0.005956211,0.008532386,0.00613411,0.005741645,0.005876588,0.006640122,0.005339993,0.008842722,0.006246828,0.005532832,0.005594483,0.007268493,0.006634795,0.008287031,0.00588119,0.004479003,0.006333063,0.00803285,0.006226441,0.009681048,0.006457784,0.006045368,0.006293256,0.008062195,0.00857954,0.008160441,0.006830088,0.008095485,0.006665062,0.007437581,0.008599525,0.008242957,0.006379928,0.007168385,0.004643819), parent.origin = c("paternal","paternal","paternal","paternal","paternal","paternal","maternal","maternal","maternal","maternal","maternal","maternal","paternal","paternal","paternal","paternal","paternal","paternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","paternal","paternal","paternal","paternal","paternal","paternal","maternal","maternal","maternal","maternal","maternal","maternal","paternal","paternal","paternal","paternal","paternal","paternal"), group = 
c("F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F"), replicate = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

And I have this code for plotting a ggplot2 geom_boxplot:

p1 = ggplot(data = my.df, aes(factor(replicate), color = factor(parent.origin)))
p1 = p1 + geom_boxplot(aes(fill = factor(parent.origin), width = 0.3, lower = mean - std.dev, upper = mean + std.dev, middle = mean, ymin = mean - 3*std.dev, ymax = mean + 3*std.dev), stat="identity") + facet_wrap(~group, ncol = 4)+scale_color_manual(values = c("red","blue"),labels = c("maternal","paternal"),name = "parental allele")+scale_fill_manual(values = c("red","blue"))

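which produces this plot.

My questions are:
1. I'd like to add a center line to each box (say, in black).
2. I'd like to make the boxes narrower, so that the blue and red boxes within any given replicate don't overlap each other.
3. I'd like to make the whisker lines dashed or dotted.

Any ideas?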

r ggplot2

6
votes
2
answers
20k
views

Sampling equally spaced points from a numeric vector

I have a numeric vector:

vec = c(1464.556644,552.6007169,155.4249747,1855.360016,1315.874155,2047.980206,2361.475519,4130.530507,1609.572131,4298.980363,697.6034771,312.080866,2790.738644,1116.406288,989.6391649,2683.393338,3032.080837,2462.137352,2964.362507,1182.894473,1268.968128,4495.503015,576.1063996,232.4996213,1355.256694,1336.607876,2506.458008,1242.918255,3645.587384)

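And I'd like to sample n=5 points from it that are as equally spaced as possible. In other words, I'd like to get the points in vec that are closest to these points: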

seq(min(vec),max(vec),(max(vec)-min(vec))/(n-1))

What's the fastest way to achieve this?
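To make the goal concrete, here is a minimal base R sketch of the straightforward (not necessarily the fastest) approach: for each equally spaced target, pick the element of the vector closest to it. The helper name nearest_to_grid and the toy vector are made up for illustration and are not part of the original data.

```r
# Hypothetical helper: for each of n equally spaced targets between
# min(vec) and max(vec), return the element of vec closest to that target.
nearest_to_grid <- function(vec, n) {
  # Same grid as seq(min(vec), max(vec), (max(vec) - min(vec)) / (n - 1))
  targets <- seq(min(vec), max(vec), length.out = n)
  # Index of the closest element for each target
  idx <- vapply(targets, function(t) which.min(abs(vec - t)), integer(1))
  vec[idx]
}

# Small made-up vector for illustration (not the data from the question)
toy <- c(1, 2, 4, 7, 10, 3.4)
picked <- nearest_to_grid(toy, 4)
print(picked)
```

Note that the same element can be picked more than once when the vector is sparse around neighbouring targets; this sketch does not try to avoid that.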

optimization r sample

6
votes
1
answer
1239
views

Adding a bar plot to a ggplot2 legend

I have the following data.frame:

my.df = data.frame(mean = c(0.045729661,0.030416531,0.043202944,0.025600973,0.040526913,0.046167044,0.029352414,0.021477789,0.027580529,0.017614864,0.020324659,0.027547972,0.0268722,0.030804717,0.021502093,0.008342398,0.02295506,0.022386184,0.030849534,0.017291356,0.030957321,0.01871551,0.016945678,0.014143042,0.026686185,0.020877973,0.028612298,0.013227244,0.010710895,0.024460647,0.03704981,0.019832982,0.031858501,0.022194059,0.030575241,0.024632496,0.040815748,0.025595652,0.023839083,0.026474704,0.033000706,0.044125751,0.02714219,0.025724641,0.020767752,0.026480009,0.016794441,0.00709195), std.dev = c(0.007455271,0.006120299,0.008243454,0.005552582,0.006871527,0.008920899,0.007137174,0.00582671,0.007439398,0.005265133,0.006180637,0.008312494,0.006628951,0.005956211,0.008532386,0.00613411,0.005741645,0.005876588,0.006640122,0.005339993,0.008842722,0.006246828,0.005532832,0.005594483,0.007268493,0.006634795,0.008287031,0.00588119,0.004479003,0.006333063,0.00803285,0.006226441,0.009681048,0.006457784,0.006045368,0.006293256,0.008062195,0.00857954,0.008160441,0.006830088,0.008095485,0.006665062,0.007437581,0.008599525,0.008242957,0.006379928,0.007168385,0.004643819), parent.origin = c("paternal","paternal","paternal","paternal","paternal","paternal","maternal","maternal","maternal","maternal","maternal","maternal","paternal","paternal","paternal","paternal","paternal","paternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","maternal","paternal","paternal","paternal","paternal","paternal","paternal","maternal","maternal","maternal","maternal","maternal","maternal","paternal","paternal","paternal","paternal","paternal","paternal"), group = 
c("F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:M","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1r:F","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:M","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F","F1i:F"), replicate = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

for which I generate a ggplot with this code:

p1 = ggplot(data = my.df, aes(factor(replicate), color = factor(parent.origin)))
p1 = p1 + geom_boxplot(aes(fill = factor(parent.origin),lower = mean - std.dev, upper = mean + std.dev, middle = mean, ymin = mean - 3*std.dev, ymax = mean + 3*std.dev), position = position_dodge(width = 0), width = 0.5, alpha = 0.5, stat="identity") + facet_wrap(~group, ncol = 4)+scale_fill_manual(values = c("red","blue"),labels = c("maternal","paternal"),name …

r ggplot2

5
votes
2
answers
341
views

User-supplied parameters for sorting a data.frame with arrange

Suppose I have this data.frame:

df = data.frame(strain=c("I","I","R","R"),sex=c("M","F","M","F"),age=c("8d","8d","64d","64d"))

and a user-supplied data.frame that defines how to order df. For example:

order.df = data.frame(column = c("age","sex","strain"),order = c("+","-","+"))

I'd like to use arrange from the plyr package to order df according to the order defined by order.df. If it weren't user-supplied, I would do this:

arrange(df, age, desc(sex), strain)

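So my question is: how can I achieve this with order.df? Or, what kind of data structure would be suitable for storing a user-supplied order definition so that it can be used with arrange?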
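For comparison, here is a sketch of one way to get the same ordering in base R with order() rather than arrange: each row of order.df is turned into a numeric sort key, negated when the direction is "-". The helper name sort_by_spec is hypothetical, and the sketch assumes the order column contains only "+" and "-".

```r
df <- data.frame(strain = c("I", "I", "R", "R"),
                 sex    = c("M", "F", "M", "F"),
                 age    = c("8d", "8d", "64d", "64d"),
                 stringsAsFactors = FALSE)
order.df <- data.frame(column = c("age", "sex", "strain"),
                       order  = c("+", "-", "+"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: sort `data` by the columns listed in `spec`,
# ascending for "+" and descending for "-".
sort_by_spec <- function(data, spec) {
  keys <- Map(function(col, dir) {
    key <- xtfrm(data[[col]])      # numeric key, also works for character columns
    if (dir == "-") -key else key  # negating the key reverses the order
  }, spec$column, spec$order)
  # unname() so the keys are passed to order() positionally
  data[do.call(order, unname(keys)), , drop = FALSE]
}

sorted <- sort_by_spec(df, order.df)
print(sorted)
```

Because age is stored as text here, "64d" sorts before "8d", just as it would when sorting the same column with arrange.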

r plyr

5
votes
1
answer
176
views

Efficiently drawing random samples from a multivariate normal distribution

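Just wondering whether anyone has run into a problem where they need to draw randomly from a very high-dimensional multivariate normal distribution (say, dimension = 10,000), for which the rmvnorm function of the mvtnorm package is impractical.

I know this article has an Rcpp implementation of the dmvnorm function of mvtnorm, so I was wondering whether something equivalent exists for rmvnorm.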
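For reference, the textbook way to draw from a multivariate normal is to multiply standard normal draws by a Cholesky factor of the covariance matrix. The base R sketch below illustrates that technique only; the function name rmvnorm_chol is made up, it assumes sigma is positive definite, and whether it is fast enough at dimension 10,000 is untested here (the factorisation alone costs on the order of d^3 operations).

```r
# Hypothetical helper: n draws from N(mu, sigma) via a Cholesky factor.
# Assumes sigma is a symmetric positive definite matrix.
rmvnorm_chol <- function(n, mu, sigma) {
  d <- length(mu)
  R <- chol(sigma)                               # upper triangular, t(R) %*% R == sigma
  z <- matrix(rnorm(n * d), nrow = n, ncol = d)  # independent standard normals
  sweep(z %*% R, 2, mu, "+")                     # rows have covariance sigma, then shift by mu
}

# Small made-up example
set.seed(1)
sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
draws <- rmvnorm_chol(1000, c(0, 3), sigma)
print(dim(draws))
```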

r rcpp

5
votes
1
answer
1298
views

Sum of the interval lengths of an integer vector

Suppose I have this integer vector:

> int.vec
 [1]  1  2  3  5  6  7 10 11 12 13

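(created with int.vec <- c(1:3,5:7,10:13))

I'm looking for a function that returns the sum of the lengths of all the intervals in this vector.

So basically, for int.vec this function would return: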

3+3+4 = 10
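To spell out what "interval" means here, the following base R sketch splits the vector into runs of consecutive integers and counts the elements in each run. It assumes the vector is sorted and has no duplicates, and that an interval's length is its number of elements, as in the 3+3+4 example.

```r
int.vec <- c(1:3, 5:7, 10:13)

# A new run starts wherever the step from the previous element is not 1
run_id <- cumsum(c(TRUE, diff(int.vec) != 1))

# Number of elements in each run of consecutive integers
run_lengths <- tabulate(run_id)
total <- sum(run_lengths)

print(run_lengths)
print(total)
```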

r intervals

5
votes
3
answers
550
views

Unix: looping over the output of cut

I have many slurm jobs that generate standard output and error files in the following format:

<string>.<string>.<string>.<job_id>.ERR

where job_id is the job ID assigned by slurm.

So, to get these job IDs, I can do:

cut -f 4 -d "." *.ERR

I'd like to pipe the output of this command into a loop that runs sacct -j <job_id> and greps for the jobs that failed, using:

sacct -j <job_id> | grep "FAILED"

Can this be done in a single command?

unix bash grep loops

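5
votes
1
answer
3952
views

Computing pairwise distances between a set of intervals

Suppose I have a set of closed linear intervals represented by this matrix: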

interval.mat = matrix(c(1,2,3,5,4,6,8,9), byrow = TRUE, ncol = 2)

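where interval.mat[,1] holds the interval start points and interval.mat[,2] holds their corresponding end points.

I'm looking for an efficient way (this example matrix is a toy; in reality my matrix contains several thousand intervals) to produce a matrix that holds all the pairwise positive distances between the intervals. The distance between a pair of intervals should be the start of the interval with the larger end of the two, minus the end of the interval with the smaller end of the two. For example, the distance between the intervals c(1,2) and c(3,5) should be 3 - 2 = 1, since the second interval ends after the first one. If the intervals overlap, the distance should be 0. So, for example, for c(3,5) and c(4,6) the distance would be 0.

So the pairwise distance matrix for the intervals above would be: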

> matrix(c(0,1,2,6,1,0,0,3,2,0,0,2,6,3,2,0), byrow = TRUE, nrow = 4, ncol = 4)
     [,1] [,2] [,3] [,4]
[1,]    0    1    2    6
[2,]    1    0    0    3
[3,]    2    0    0    2
[4,]    6    3    2    0
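The definition above can be written as max(start_i - end_j, start_j - end_i, 0), which suggests a vectorised sketch in base R using outer(). This is only one possible formulation; it builds full n-by-n matrices, so memory grows quadratically with the number of intervals.

```r
interval.mat <- matrix(c(1, 2, 3, 5, 4, 6, 8, 9), byrow = TRUE, ncol = 2)
starts <- interval.mat[, 1]
ends   <- interval.mat[, 2]

# gap[i, j] = start of interval i minus end of interval j
gap <- outer(starts, ends, "-")

# Positive only when one interval lies entirely after the other;
# overlapping pairs (and the diagonal) are clamped to 0
dist.mat <- pmax(gap, t(gap), 0)
print(dist.mat)
```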

r intervals

4
votes
1
answer
395
views

Intersecting multiple columns between two data frames

I have two data frames, each with 2 columns. For example:

df.1 = data.frame(col.1 = c("a","a","a","a","b","b","b","c","c","d"), col.2 = c("b","c","d","e","c","d","e","d","e","e"))
df.2 = data.frame(col.1 = c("b","b","b","a","a","e"), col.2 = c("a","c","e","c","e","c"))

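I'm looking for an efficient way to find, for each col.1/col.2 row pair in df.1, its row index in df.2. Note that a row pair in df.1 may appear in df.2 in reverse order (e.g. df.1[1,], which is "a","b", appears in df.2[1,] as "b","a"). That doesn't matter to me. In other words, as long as a row pair in df.1 appears in df.2 in either order, I want its row index in df.2; otherwise it should return NA. One more note: the row pairs in both data frames are unique, meaning each row pair appears only once.

So for these two data frames, the returned vector would be: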

c(1,4,NA,5,2,NA,3,NA,6,NA)
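One way to make the lookup order-insensitive, sketched in base R: build a key for each row with the smaller value first, then use match(). The helper name pair_key is hypothetical, and the sketch assumes the tab character used as separator never occurs in the values.

```r
df.1 <- data.frame(col.1 = c("a","a","a","a","b","b","b","c","c","d"),
                   col.2 = c("b","c","d","e","c","d","e","d","e","e"))
df.2 <- data.frame(col.1 = c("b","b","b","a","a","e"),
                   col.2 = c("a","c","e","c","e","c"))

# Hypothetical helper: one key per row, with the two values put in a
# fixed order so that ("a","b") and ("b","a") produce the same key.
pair_key <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  paste(pmin(a, b), pmax(a, b), sep = "\t")
}

# Row index in df.2 for every row pair of df.1, NA when there is no match
idx <- match(pair_key(df.1$col.1, df.1$col.2),
             pair_key(df.2$col.1, df.2$col.2))
print(idx)
```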

r dataframe

4
votes
1
answer
1602
views

Tag statistics

r ×9

ggplot2 ×2

intervals ×2

bash ×1

dataframe ×1

grep ×1

loops ×1

optimization ×1

plot ×1

plyr ×1

rcpp ×1

sample ×1

unix ×1