I want to transpose a data set like my.data below and then sum the rows.
my.data <- "landuse units year county.a county.b county.c county.d
apple acres 2010 0 2 4 6
pear acres 2010 10 20 30 40
peach acres 2010 500 400 300 200"
my.data2 <- read.table(textConnection(my.data), header = TRUE)
my.data2
The desired output is:
counties all.fruit
county.a 510
county.b 422
county.c 334
county.d 246
I can do this with the code below. However, it seems like enormous overkill, and I am hoping there is a simpler solution.
# transpose the data set
tmy.data2 <- t(my.data2)
tmy.data2 <- as.data.frame(tmy.data2)
# assign row names to the data set
my.rows <- row.names(tmy.data2)
transposed.data <- cbind(my.rows, tmy.data2)
transposed.data
# extract numbers to obtain row sums
fruit.data <- as.data.frame(transposed.data[4:dim(transposed.data)[1], 2:dim(transposed.data)[2]])
fruit.data2 <- as.matrix(fruit.data)
fruit.data3 <- matrix(as.numeric(fruit.data2), nrow=( dim(fruit.data2)[1] ), byrow=F)
# sum fruit by county
all.fruit <- rowSums(fruit.data3, na.rm=T)
# create row names for summed fruit data
counties <- my.rows[4:length(my.rows)]
almost.final.data <- cbind(counties, all.fruit)
really.final.data <- as.data.frame(almost.final.data)
really.final.data[,2] <- as.numeric(as.character(really.final.data[,2]))
really.final.data
str(really.final.data)
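Thank you for any suggestions. I can use the code above, but I see this request as a chance to improve my programming considerably.
Why can't you just sum the columns?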
colSums(my.data2[, 4:7])
or
library(plyr)
numcolwise(sum)(my.data2)
year county.a county.b county.c county.d
1 6030 510 422 334 246
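As a quick check on the suggestion above, here is a minimal sketch showing that summing the county columns gives the same totals as transposing first and then summing the rows. The helper names county.cols, by.column, and by.row are illustrative and not from the original posts, and the columns are selected by name rather than by position as an assumption about what is more robust.

```r
# Rebuild the example data from the question.
my.data <- "landuse units year county.a county.b county.c county.d
apple acres 2010 0 2 4 6
pear acres 2010 10 20 30 40
peach acres 2010 500 400 300 200"
my.data2 <- read.table(text = my.data, header = TRUE)

# Pick the county columns by name (illustrative; the answer uses 4:7).
county.cols <- grep("^county", names(my.data2), value = TRUE)

# Sum each county column directly.
by.column <- colSums(my.data2[, county.cols])

# Transpose the same columns, then sum each row.
by.row <- rowSums(t(my.data2[, county.cols]))

# Compare the two named vectors of totals.
all.equal(by.column, by.row)
```

Both routes produce one total per county, so the explicit transpose is not needed for this task.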
That said, if you do want to reshape the data, there are plenty of options. The reshape2 package offers a pleasant syntax:
library(reshape2)
> my.data.melt <- melt(my.data2, id.vars=c('units', 'year', 'landuse'))
> my.data.melt
units year landuse variable value
1 acres 2010 apple county.a 0
2 acres 2010 pear county.a 10
3 acres 2010 peach county.a 500
4 acres 2010 apple county.b 2
5 acres 2010 pear county.b 20
6 acres 2010 peach county.b 400
7 acres 2010 apple county.c 4
8 acres 2010 pear county.c 30
9 acres 2010 peach county.c 300
10 acres 2010 apple county.d 6
11 acres 2010 pear county.d 40
12 acres 2010 peach county.d 200
Then I would use plyr:
> library(plyr)
> ddply(my.data.melt, .(variable), summarise, all.fruit=sum(value))
variable all.fruit
1 county.a 510
2 county.b 422
3 county.c 334
4 county.d 246
You can also do this with base R aggregate or with the data.table package.
> library(data.table)
> my.data.melt <- as.data.table(melt(my.data2, id.vars=c('units', 'year', 'landuse')))
> my.data.melt[,list(all.fruit = sum(value)), by = variable]
variable all.fruit
1: county.a 510
2: county.b 422
3: county.c 334
4: county.d 246
Or, if you want it to stay in wide format:
> DT <- as.data.table(my.data2)
> DT[, lapply(.SD, sum, na.rm=TRUE), .SDcols = grep("county",names(DT))]
county.a county.b county.c county.d
1: 510 422 334 246
# NB: This needs v1.8.3. Before that, an as.data.table() call was required as
# the lapply(.SD,...) used to return a named list in this no grouping case.
With base R aggregate, using the melted data from above:
> aggregate(value~variable, my.data.melt, sum)
variable value
1 county.a 510
2 county.b 422
3 county.c 334
4 county.d 246
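The aggregate call above relies on the melted data from reshape2. If you would rather stay entirely in base R, one possible sketch is to build the long format with stack() instead. This swaps in stack() for melt(), which is not what the answer used, and the names long and result are illustrative.

```r
# Rebuild the example data from the question.
my.data <- "landuse units year county.a county.b county.c county.d
apple acres 2010 0 2 4 6
pear acres 2010 10 20 30 40
peach acres 2010 500 400 300 200"
my.data2 <- read.table(text = my.data, header = TRUE)

# stack() turns the county columns into two columns: values and ind.
long <- stack(my.data2[, grep("^county", names(my.data2))])

# Rename to match the column names in the desired output.
names(long) <- c("all.fruit", "counties")

# Sum the values within each county.
result <- aggregate(all.fruit ~ counties, data = long, FUN = sum)
result
```

Note that counties comes back as a factor here; wrap it in as.character() if you need plain strings.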
I would just subset the "county" columns, sum them, and build a data.frame from the result:
out <- colSums(my.data2[,grepl("county",colnames(my.data2))])
out2 <- data.frame(counties=names(out), all.fruit=out,
row.names=NULL, stringsAsFactors=FALSE)