Eli*_*eth 3 aggregate r summary boxplot
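I am building diurnal wind-speed cycles from a data frame (ball) that holds several years of hourly data. I want to plot them by season, so I subset the dates I need and combine them like this: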
b8 = subset(ball, as.Date(date)>="2008-09-01 00:00:00, GMT" & as.Date(date)<= "2008-11-30 23:00:00, GMT" )
b9 = subset(ball, as.Date(date)>="2009-09-01 00:00:00, GMT" & as.Date(date)<= "2009-11-30 23:00:00, GMT" )
b10 = subset(ball, as.Date(date)>="2010-09-01 00:00:00, GMT" & as.Date(date)<= "2010-11-30 23:00:00, GMT")
ballspr = rbind(b8,b9,b10)
Then I get a diurnal cycle with this:
sprwsdiurnal <- aggregate(ballspr["ws"], format(ballspr["date"],"%H"),summary, na.rm=T)
For three of the four seasons, this produces an object with this structure:
date ws
1 00 0.200, 1.000, 1.600, 2.021, 2.500, 8.000, 5.000
2 01 0.100, 1.000, 1.600, 1.988, 2.500, 8.600, 1.000
3 02 0.100, 1.000, 1.700, 1.982, 2.600, 8.900, 1.000
... and so on for all 24 hours ...
23 22 0.100, 1.200, 1.800, 2.222, 2.950, 9.100, 1.000
24 23 0.100, 1.000, 1.600, 2.072, 2.700, 8.800, 1.000
This is what I want, because boxplot works with it:
par( mar = c(5, 5, 2, 2))
boxplot(sprwsdiurnal$ws, col="dodger blue",pch=16,font.lab=2,cex.lab=1.5,cex.axis=2,xlab="Hour",range=0, ylab=quote(Windspeed ~ "(" * m ~ s ^-1 * ")"),xaxt="n",main="Spring")
axis(1, at=seq(1,24, by=1),labels=seq(1,24, by=1),cex.axis=1.5, cex.lab=1.5, font.lab=2)
The problem is that one season comes out like this:
date ws.Min. ws.1st Qu. ws.Median ws.Mean ws.3rd Qu. ws.Max. ws.NA's
1 00 0.000 1.300 2.100 2.539 3.200 10.500 2.000
2 01 0.100 1.275 2.100 2.499 3.200 9.800 2.000
3 02 0.200 1.200 2.000 2.514 3.400 9.000 2.000
... and so on for all 24 hours ...
23 22 0.100 1.200 1.950 2.582 3.325 11.900 2.000
24 23 0.100 1.300 2.000 2.585 3.400 11.200 2.000
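boxplot does not work with this format. I can't explain why this happens, since the code for every season is identical and all the seasons are subset from the same data frame. Why does one come out differently? Any ideas appreciated.
Edit: here is the data. I have checked these two seasons, and they still give the two different formats shown above.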
https://www.dropbox.com/s/v5kss0bgjyhrtw1/ball.csv
ball=read.csv("ball.csv", header=T)
ball$date = as.POSIXct(strptime(ball$date, format = "%Y-%m-%d %H:%M:%S", "GMT"))
win9 = subset(ball, as.Date(date)>="2009-06-01 00:00:00, GMT" & as.Date(date)<= "2009-08-31 23:00:00, GMT" )
aut9 = subset(ball, as.Date(date)>="2009-03-01 00:00:00, GMT" & as.Date(date)<= "2009-05-31 23:00:00, GMT" )
spr9 = subset(ball, as.Date(date)>="2009-09-01 00:00:00, GMT" & as.Date(date)<= "2009-11-30 23:00:00, GMT" )
sum9 = subset(ball, as.Date(date)>="2008-12-01 00:00:00, GMT" & as.Date(date)<= "2009-02-28 23:00:00, GMT" )
sprdiurnal <- aggregate(spr9["ws"], format(spr9["date"],"%H"),summary, na.rm=T)
par( mar = c(5, 5, 4, 2))
boxplot(sprdiurnal$ws, col=colours()[109],pch=16,cex.lab=1.5,cex.axis=1.5,xlab="Hour",range=0, ylab=quote(Wind ~ speed ~ "(" * m * "s" ^-1 * ")"),xaxt="n",main="")
axis(1, at=seq(1,24, by=1),labels=seq(1,24, by=1),cex.axis=1.5, cex.lab=1.5)
windiurnal <- aggregate(win9["ws"], format(win9["date"],"%H"),summary, na.rm=T)
par( mar = c(5, 5, 4, 2))
boxplot(windiurnal$ws, col=colours()[109],pch=16,cex.lab=1.5,cex.axis=1.5,xlab="Hour",range=0, ylab=quote(Wind ~ speed ~ "(" * m * "s" ^-1 * ")"),xaxt="n",main="")
axis(1, at=seq(1,24, by=1),labels=seq(1,24, by=1),cex.axis=1.5, cex.lab=1.5)
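As far as I can tell, the "problem" is that the result of summary in your aggregate call for "sprdiurnal" is a rectangular dataset, which R stores as a matrix, whereas for your other subsets some hours include NA values and others do not, so the result is not rectangular and R stores the summaries as a list. summary returns six values for a group with no missing data and seven (the extra one being the NA's count) for a group with missing data, and aggregate only simplifies to a matrix when every group returns the same number of values.
I'll use the "iris" dataset to demonstrate, but first I'll also create an "iris_2" dataset with one NA value.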
iris_2 <- iris
iris_2$Sepal.Length[10] <- NA
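Let's compare the aggregated output, which in these cases is just the second column. You'll see that the "iris" dataset, which has no missing values, returns a rectangular matrix as the second "column" of your data.frame. Because of our one NA value, the "iris_2" result is stored as a list, which is what you want for this particular purpose.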
(irisagg <- aggregate(iris["Sepal.Length"], iris["Species"], summary))[[2]]
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# [1,] 4.3 4.800 5.0 5.006 5.2 5.8
# [2,] 4.9 5.600 5.9 5.936 6.3 7.0
# [3,] 4.9 6.225 6.5 6.588 6.9 7.9
(iris_2agg <- aggregate(iris_2["Sepal.Length"], iris_2["Species"], summary))[[2]]
# $`0`
# Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
# 4.300 4.800 5.000 5.008 5.200 5.800 1
#
# $`1`
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.900 5.600 5.900 5.936 6.300 7.000
#
# $`2`
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.900 6.225 6.500 6.588 6.900 7.900
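The difference between the two results can also be checked programmatically. The following is a minimal sketch, assuming only base R and the built-in iris data; the names len_iris and len_iris_2 are introduced here for illustration and are not part of the original code.

```r
# Rebuild the two example datasets so this block runs on its own
iris_2 <- iris
iris_2$Sepal.Length[10] <- NA   # row 10 belongs to the first species (setosa)

# Number of values summary() returns for each species
len_iris   <- lengths(lapply(split(iris$Sepal.Length, iris$Species), summary))
len_iris_2 <- lengths(lapply(split(iris_2$Sepal.Length, iris_2$Species), summary))
len_iris     # the same length for every group, so aggregate() can build a matrix
len_iris_2   # one group is longer (it has an NA's entry), so the result stays a list

irisagg   <- aggregate(iris["Sepal.Length"], iris["Species"], summary)
iris_2agg <- aggregate(iris_2["Sepal.Length"], iris_2["Species"], summary)

is.matrix(irisagg[[2]])   # TRUE: simplified to a matrix
is.list(iris_2agg[[2]])   # TRUE: left as a list
```

Running is.matrix on the ws column of each seasonal result should show in the same way which seasons were simplified.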
Here is how we can turn the matrix back into a list.
irisagg$Summary <- unlist(apply(irisagg[[2]], 1, list), recursive = FALSE)
irisagg$Summary
# [[1]]
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.300 4.800 5.000 5.006 5.200 5.800
#
# [[2]]
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.900 5.600 5.900 5.936 6.300 7.000
#
# [[3]]
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.900 6.225 6.500 6.588 6.900 7.900
Of course, a more direct approach is to use the simplify argument of aggregate and do:
(iris_3agg <- aggregate(iris["Sepal.Length"],
                        iris["Species"], summary,
                        simplify = FALSE))[[2]]
# $`0`
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.300 4.800 5.000 5.006 5.200 5.800
#
# $`1`
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.900 5.600 5.900 5.936 6.300 7.000
#
# $`2`
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.900 6.225 6.500 6.588 6.900 7.900
Applying this to your example, "sprdiurnal" is the subset that is giving you trouble. Look at sprdiurnal$ws on its own and verify that it is a matrix. We'll convert it to a list.
sprdiurnal$ws2 <- unlist(apply(sprdiurnal$ws, 1, list), recursive=FALSE)
Now you can proceed with boxplot as you did for the other seasons.
boxplot(sprdiurnal$ws2)  # add the same graphical arguments you used for the other seasons
Alternatively, re-create the sprdiurnal object using:
sprdiurnal <- aggregate(spr9["ws"],
                        format(spr9["date"], "%H"),
                        summary, na.rm = TRUE,
                        simplify = FALSE)
And proceed as before.
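To get the same structure for every season regardless of where the NA values fall, one option is to wrap the simplify = FALSE call in a small function. This is only a sketch under stated assumptions: diurnal_summary is a hypothetical helper name, the toy data frames are synthetic stand-ins for the real ball.csv data, and the grouping is written as list(hour = format(df$date, "%H")) rather than the format(spr9["date"], "%H") form used above.

```r
# Hypothetical helper (not from the original post): always keep the summaries as a list
diurnal_summary <- function(df) {
  aggregate(df["ws"],
            list(hour = format(df$date, "%H")),
            summary,
            simplify = FALSE)
}

# Synthetic stand-in for one season of hourly data; the values are made up
set.seed(1)
toy <- data.frame(
  date = seq(as.POSIXct("2009-09-01 00:00:00", tz = "GMT"),
             by = "hour", length.out = 24 * 10),
  ws   = runif(24 * 10, min = 0, max = 10)
)

# Same data, but with one missing value in every hour of the day
toy_na <- toy
toy_na$ws[seq_len(24)] <- NA

res    <- diurnal_summary(toy)
res_na <- diurnal_summary(toy_na)

is.list(res$ws)      # list column when no hour has missing values
is.list(res_na$ws)   # list column when every hour has missing values
```

Either result can then be passed to boxplot in the same way as the list-type seasons, for example boxplot(res$ws, range = 0).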