Rap*_*ter 5 excel transpose r xlsx dataframe
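How can I transpose a data frame whose column vectors have different classes (one column is character, the next is numeric, another is logical, and so on) in a way that preserves the class/data type information?
Example data: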
mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep character columns as character in R < 4.0
)
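There is also a small xlsx file containing examples of both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx
A simple t(), or a slightly more elaborate routine like the one suggested in this post, is nice, but it does not preserve the class information of the original data frame's columns. I am also aware that class data.frame was never meant to store mixed class information within its columns.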
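To make the loss of class information concrete, here is a minimal sketch using the example data from above (stringsAsFactors = FALSE is added as an assumption so that it behaves the same across R versions). t() converts the data frame to a matrix first, and a matrix can hold only one type, so every value ends up as character:

```r
mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # assumption: same behavior in R < 4.0 and >= 4.0
)

# Per-column classes before transposing
before <- vapply(mydata, function(v) class(v)[1], character(1))
print(before)

# t() coerces the data frame to a single-typed matrix
m <- t(mydata)
print(is.matrix(m))
print(typeof(m))
```

Comparing before with typeof(m) shows that the numeric and logical columns do not survive the round trip.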
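However, I would at least like to "reverse" the intention of data.frames as simply as possible: focus on a row perspective instead of a column perspective. That is, all elements of a row vector must be of the same class, while classes may differ from row to row (and hence within a column).
I often work on projects where people are used to representing time series data in a horizontal ("variables in rows") orientation rather than the vertical ("variables in columns") orientation that we are all used to in R (and that, IMHO, also makes much more sense).
More importantly, they use MS Excel extensively. I need to both read the data in "wide" format and update existing Excel files by writing formulas to them directly from R via XLConnect and/or openxlsx (as opposed to doing my calculations in R and then simply dumping the final results into an Excel file).
Although I keep telling them that using such an orientation goes against established standards across languages/tools (at least for R and MS Excel), they are unlikely to switch. So I have to deal with it somehow.
So I thought about keeping a list as the underlying structure while making it "look and feel" as much like a data.frame as possible. It works, but it is fairly involved. I suspect there is a smarter solution.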
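As a rough illustration of the "list underneath" idea (not the implementation shown in this question, just a minimal sketch): each variable becomes one named vector in a plain list, so every "row" keeps its own class. The names rows and classes are made up for this example, and stringsAsFactors = FALSE is an assumption to keep the character columns as character.

```r
mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # assumption: avoid factors in R < 4.0
)

# Every column except the anchoring column "col0" becomes one "row":
# a vector named by the values of col0, with its class left untouched.
value_cols <- setdiff(names(mydata), "col0")
rows <- lapply(mydata[value_cols], function(v) setNames(v, mydata$col0))

# One class per "row"
classes <- vapply(rows, function(v) class(v)[1], character(1))
print(classes)

# A single cell is addressed as rows[[variable]][[observation]]
print(rows[["col2"]][["row2"]])
```

A plain list like this preserves the types but has no tabular printing or two-dimensional indexing, which is what the class-based approach in this question tries to add on top.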
Function definition:
transpose <- function(
  x,
  col = character(),
  rnames_or_col = c("col", "rnames")
) {
  rnames_or_col <- match.arg(rnames_or_col)
  ## Buffering column names //
  cnames <- if (length(col)) {
    as.character(x[[col]])
  } else {
    make.names(seq_len(nrow(x)))
  }
  ## Removing anchoring column //
  if (inherits(x, "data.table")) {
    x <- as.data.frame(x, stringsAsFactors = FALSE)
  }
  ## I don't like this part. Any suggestions on a) how to build on top of
  ## existing data.table functionality, or b) the easiest way to make a
  ## data.table behave like a data.frame when indexing? (The remove operation
  ## below would yield "undesired" results from a data.frame perspective; it's
  ## fine from data.table's perspective/paradigm, of course.)
  if (length(col)) {
    ## drop = FALSE keeps a data.frame even if only one column remains
    x <- x[, names(x) != col, drop = FALSE]
  }
  ## Buffer classes //
  classes <- lapply(x, class)
  ## Buffer row names //
  rnames <- names(x)
  ## Listify: one single-row data.frame per original column, with the
  ## original column name stored as its row name //
  x <- Map(function(row, rname) {
    df <- do.call(data.frame, list(as.list(row), stringsAsFactors = FALSE))
    names(df) <- cnames
    row.names(df) <- rname
    df
  }, as.list(x), rnames)
  names(x) <- rnames
  ## Actual row names or row names as first column //
  if (rnames_or_col == "col") {
    x <- lapply(x, function(ii) {
      data.frame(variable = row.names(ii), ii,
        stringsAsFactors = FALSE, row.names = NULL, check.names = FALSE)
    })
  }
  ## Class //
  class(x) <- c("df_transposed", class(x))
  x
}
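Print method:
print.df_transposed <- function(x, ...) {
  cat("df_transposed: \n")
  out <- do.call(rbind, unclass(x))
  ## Reset row names only when the variable names live in their own column
  if ("variable" %in% names(out)) {
    rownames(out) <- NULL
  }
  print(out, ...)
  invisible(x)
}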
Getter and setter methods:
## Setter: i selects the variable (list element), j the column(s) within it
"[<-.df_transposed" <- function(x, i, j, value) {
  x[[i]][, j] <- value
  x
}
## Getter: i selects variables (list elements), j selects columns in each
"[.df_transposed" <- function(x, i, j, drop = FALSE) {
  has_i <- !missing(i)
  has_j <- !missing(j)
  cls <- class(x)
  scope <- if (has_i) {
    i
  } else {
    seq_along(x)
  }
  out <- lapply(unclass(x)[scope], function(ii) {
    nms <- names(ii)
    if (has_j) {
      tmp <- ii[, j, drop = drop]
      ## Restore the original names; character indices are the names themselves
      names(tmp) <- if (is.character(j)) j else nms[j]
      ## --> necessary due to `check.names` missing for `[.data.frame` :-/
      tmp
    } else {
      ii
    }
  })
  class(out) <- cls
  out
}
Class function:
class2 <- function(x) {
  sapply(unclass(x), function(ii) {
    ## Ignore the "variable" column, if present, before querying the class
    value <- unlist(ii[, names(ii) != "variable", drop = FALSE])
    class(value)
  })
}
Example data:
mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep character columns as character in R < 4.0
)
Actual transposition and print method:
> (df_t <- transpose(mydata, col = "col0"))
df_transposed:
variable row1 row2 row3
1 col1 1 2 3
2 col2 a b c
3 col3 TRUE FALSE TRUE
> (df_t2 <- transpose(mydata, col = "col0", rnames_or_col = "rnames"))
df_transposed:
row1 row2 row3
col1 1 2 3
col2 a b c
col3 TRUE FALSE TRUE
Printing the unclassed object:
> unclass(df_t)
$col1
variable row1 row2 row3
1 col1 1 2 3
$col2
variable row1 row2 row3
1 col2 a b c
$col3
variable row1 row2 row3
1 col3 TRUE FALSE TRUE
> unclass(df_t2)
$col1
row1 row2 row3
col1 1 2 3
$col2
row1 row2 row3
col2 a b c
$col3
row1 row2 row3
col3 TRUE FALSE TRUE
Class query:
> class2(df_t)
col1 col2 col3
"numeric" "character" "logical"
Indexing:
> dat_t[1, ]
df_transposed:
variable row1 row2 row3
1 1 1 2 3
> dat_t[, 1]
df_transposed:
variable
1 1
2 1
3 1
>
> dat_t[1, 2]
df_transposed:
row1
col1 1
> dat_t[2, 3]
df_transposed:
row2
col2 b
>
> dat_t[1:2, ]
df_transposed:
variable row1 row2 row3
1 1 1 2 3
2 1 a b c
> dat_t[, 1:3]
df_transposed:
variable row1 row2
1 1 1 2
2 1 a b
3 1 TRUE FALSE
>
> dat_t[c(1, 3), 2:4]
df_transposed:
row1 row2 row3
col1 1 2 3
col3 1 0 1
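A note on the last result, where col3 prints as 1 0 1: this is presumably a display effect rather than a change in the stored data. The print method stacks the single-row data frames with rbind(), and rbind() has to bring each column to one common type, whereas the individual list elements should keep their classes. A minimal sketch with two hypothetical one-row frames (a, b and stacked are made-up names):

```r
a <- data.frame(row1 = 1, row2 = 2)         # a numeric "row"
b <- data.frame(row1 = TRUE, row2 = FALSE)  # a logical "row"

# Stacking forces each column into a single type
stacked <- rbind(col1 = a, col3 = b)
print(stacked)
print(vapply(stacked, function(v) class(v)[1], character(1)))

# The individual frames themselves are untouched
print(class(a$row1))
print(class(b$row1))
```

If that is what happens here, the class query function is the more reliable way to check what is actually stored than the printed table.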