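I have some badly formatted data that I have to work with. It contains two identifiers in the first two rows, followed by the data. The data looks like: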
V1 V2 V3
1 Date 12/16/18 12/17/18
2 Equip a b
3 x1 1 2
4 x2 3 4
5 x3 5 6
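I'd like to gather the data to make it tidy, but gather only works when there is a single row of column names. I have also tried spread. The only solutions I have come up with are very hacky and don't feel right. Is there an elegant way to deal with this?
This is what I want: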
Date Equip metric value
1 12/16/18 a x1 1
2 12/16/18 a x2 3
3 12/16/18 a x3 5
4 12/17/18 b x1 2
5 12/17/18 b x2 4
6 12/17/18 b x3 6
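This approach gets me close, but I don't know how to deal with the bad formatting (no header, no row names). It should be easy to gather once the format is right.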
> as.data.frame(t(df))
V1 V2 V3 V4 V5
V1 Date Equip x1 x2 x3
V2 12/16/18 a 1 3 5
V3 12/17/18 b 2 4 6
Here is the dput:
structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c("12/16/18",
"a", "1", "3", "5"), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c("V1",
"V2", "V3"), row.names = c(NA, -5L))
Thanks for posting a nicely reproducible question. Here is some gentle tidyr/dplyr massaging.
library(tidyverse)
df <- structure(
list(
V1 = c("Date", "Equip", "x1", "x2", "x3"),
V2 = c("12/16/18", "a", "1", "3", "5"),
V3 = c("12/17/18", "b", "2", "4", "6")
),
class = "data.frame",
.Names = c("V1", "V2", "V3"),
row.names = c(NA, -5L)
)
df %>%
gather(key = measure, value = value, -V1) %>%
spread(key = V1, value = value) %>%
select(-measure) %>%
gather(key = metric, value = value, x1:x3) %>%
arrange(Date, Equip, metric)
#> Date Equip metric value
#> 1 12/16/18 a x1 1
#> 2 12/16/18 a x2 3
#> 3 12/16/18 a x3 5
#> 4 12/17/18 b x1 2
#> 5 12/17/18 b x2 4
#> 6 12/17/18 b x3 6
Created on 2018-04-20 by the reprex package (v0.2.0).
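As a side note, gather() and spread() have been superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same three-step reshape with the newer functions, assuming tidyr >= 1.0.0 is installed. The intermediate column name `measure` and the final `as.numeric()` conversion are illustrative choices, not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt without dput() for readability
df <- data.frame(
  V1 = c("Date", "Equip", "x1", "x2", "x3"),
  V2 = c("12/16/18", "a", "1", "3", "5"),
  V3 = c("12/17/18", "b", "2", "4", "6"),
  stringsAsFactors = FALSE
)

tidy <- df %>%
  # Step 1: stack V2 and V3, keeping the row labels in V1
  pivot_longer(cols = -V1, names_to = "measure", values_to = "value") %>%
  # Step 2: turn the labels in V1 (Date, Equip, x1, x2, x3) into columns
  pivot_wider(names_from = V1, values_from = value) %>%
  select(-measure) %>%
  # Step 3: stack the metric columns, keeping Date and Equip as identifiers
  pivot_longer(cols = -c(Date, Equip), names_to = "metric", values_to = "value") %>%
  # Every cell started out as a character string, so convert explicitly
  mutate(value = as.numeric(value)) %>%
  arrange(Date, Equip, metric)

print(tidy)
```

This should give the same six rows as the gather/spread version, returned as a tibble and with `value` stored as a number rather than a string. Selecting the metric columns with `-c(Date, Equip)` instead of `x1:x3` is a small design choice that keeps the pipeline working if more metric rows are added later.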