While reading another thread, I learned about the reshape function from the stats package. I had no problem using it on a 'dummy' dataset and managed to convert it from long to wide format. However, I don't know why it isn't working on my own data: it is pretty much the same kind of object, and the data types are similar. I'd appreciate any help figuring out why it's behaving this way.
Anyway, this gives no trouble:
df <- data.frame(
  year = c(rep(2000, 12), rep(2001, 12)),
  month = rep(1:12, 2),
  values = rnorm(24)
)
# year month values
1 2000 1 1.52435428
2 2000 2 -0.89394797
3 2000 3 0.75965499
4 2000 4 1.21497443
Converted to wide:
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide")
# year values.1 values.2 values.3 values.4 values.5 values.6 values.7 values.8 values.9 values.10 values.11 values.12
1 2000 1.524354 -0.893948 0.759655 1.2149744 -1.3237634 -0.08681768 0.5208436 -0.2602807 0.6378904 -0.9852600 -1.128048 -0.1466028
2 2001 1.913969 -1.966720 -0.947688 0.8375891 -0.1015944 1.11812723 -1.5164472 -0.7089485 0.5975851 0.2514546 -1.578210 -0.9044418
But when using my data, which looks like this:
my_df <- dput(head(experiment, 30))
structure(list(transcript = c("TR100743-c0_g1_i3", "TR100743-c0_g1_i3",
"TR100743-c0_g1_i3", "TR100743-c0_g1_i3", "TR100743-c0_g1_i3",
"TR100987-c0_g1_i2", "TR100987-c0_g1_i2", "TR100987-c0_g1_i2",
"TR100987-c0_g1_i2", "TR100987-c0_g1_i2", "TR101301-c4_g1_i16",
"TR101301-c4_g1_i16", "TR101301-c4_g1_i16", "TR101301-c4_g1_i16",
"TR101301-c4_g1_i16", "TR102190-c1_g1_i1", "TR102190-c1_g1_i1",
"TR102190-c1_g1_i1", "TR102190-c1_g1_i1", "TR102190-c1_g1_i1",
"TR102346-c0_g2_i1", "TR102346-c0_g2_i1", "TR102346-c0_g2_i1",
"TR102346-c0_g2_i1", "TR102346-c0_g2_i1", "TR102352-c4_g2_i5",
"TR102352-c4_g2_i5", "TR102352-c4_g2_i5", "TR102352-c4_g2_i5",
"TR102352-c4_g2_i5"), hours = c(0, 2, 8, 24, 48, 0, 2, 8, 24,
48, 0, 2, 8, 24, 48, 0, 2, 8, 24, 48, 0, 2, 8, 24, 48, 0, 2,
8, 24, 48), exp.change = c(NA, -43.1958273184645, -61.3014008509066,
964.925115099619, -52.7060728326392, NA, -46.2563848585369, 3.29396898799807,
-99.9994681489801, 106710484.025972, NA, -29.6341333478577, 522.224859380388,
40.4737694947169, -1.34388206141046, NA, -18.7670826937756, 5.49472822880452,
55.1072690537026, 33.5824607349752, NA, -99.999962131178, 789697313.24393,
18.6337471833012, 52.4442959208125, NA, -31.3334122297108, 9.64745757892995,
28.48552519881, 70.5808772231999), response = c("Primary", "Primary",
"Primary", "Primary", "Primary", "Primary", "Primary", "Primary",
"Primary", "Primary", "Primary", "Primary", "Primary", "Primary",
"Primary", "Tertiary", "Tertiary", "Tertiary", "Tertiary", "Tertiary",
"Primary", "Primary", "Primary", "Primary", "Primary", "Primary",
"Primary", "Primary", "Primary", "Primary")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -30L))
# transcript hours exp.change response
1 TR100743-c0_g1_i3 0 NA Primary
2 TR100743-c0_g1_i3 2 -43.2 Primary
3 TR100743-c0_g1_i3 8 -61.3 Primary
4 TR100743-c0_g1_i3 24 965. Primary
5 TR100743-c0_g1_i3 48 -52.7 Primary
6 TR100987-c0_g1_i2 0 NA Primary
7 TR100987-c0_g1_i2 2 -46.3 Primary
8 TR100987-c0_g1_i2 8 3.29 Primary
9 TR100987-c0_g1_i2 24 -100.0 Primary
10 TR100987-c0_g1_i2 48 106710484. Primary
Gives this when I attempt to 'reshape' it:
my_df_wide <- reshape(my_df, idvar = c("transcript", "response"), timevar = "hours", v.names="exp.change", direction = "wide")
# transcript response `exp.change.c(0, 2, 8, 24, 48)`
1 TR100743-c0_g1_i3 Primary NA
2 TR100987-c0_g1_i2 Primary NA
3 TR101301-c4_g1_i16 Primary NA
4 TR102190-c1_g1_i1 Tertiary NA
5 TR102346-c0_g2_i1 Primary NA
6 TR102352-c4_g2_i5 Primary NA
7 TR10396-c0_g1_i6 Primary NA
8 TR11844-c0_g2_i1 Secondary NA
9 TR12672-c1_g2_i1 Primary NA
10 TR12672-c1_g2_i2 Primary NA
Is it because of the NAs? I honestly don't know why it is behaving like that. Any help is greatly appreciated.
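Reshaping data with stats::reshape can be tedious. Hadley Wickham and his team have spent quite some time creating a comprehensive solution. First came the reshape2 package, then tidyr with spread() and gather(), which have now been superseded by pivot_wider() and pivot_longer().
This is how you can use tidyr::pivot_wider() to achieve the result you seem to be looking for.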
library(tidyr)
pivot_wider(
my_df,
id_cols = c(transcript, response),
names_from = hours,
values_from = exp.change,
names_prefix = "exp.change_"
)
#> # A tibble: 6 x 7
#> transcript response exp.change_0 exp.change_2 exp.change_8 exp.change_24
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TR100743-… Primary NA -43.2 -61.3 965.
#> 2 TR100987-… Primary NA -46.3 3.29 -100.
#> 3 TR101301-… Primary NA -29.6 522. 40.5
#> 4 TR102190-… Tertiary NA -18.8 5.49 55.1
#> 5 TR102346-… Primary NA -100. 789697313. 18.6
#> 6 TR102352-… Primary NA -31.3 9.65 28.5
#> # … with 1 more variable: exp.change_48 <dbl>
I find tidyr easier to work with than stats::reshape().
Edit:
stats::reshape() gives the weird result because it seems to have trouble with my_df being a tibble. Other than that, your command is fine. Just add as.data.frame() and you are good to go.
reshape(
as.data.frame(my_df),
idvar = c("transcript", "response"),
timevar = "hours",
v.names = "exp.change",
direction = "wide"
)
#> transcript response exp.change.0 exp.change.2 exp.change.8
#> 1 TR100743-c0_g1_i3 Primary NA -43.19583 -6.130140e+01
#> 6 TR100987-c0_g1_i2 Primary NA -46.25638 3.293969e+00
#> 11 TR101301-c4_g1_i16 Primary NA -29.63413 5.222249e+02
#> 16 TR102190-c1_g1_i1 Tertiary NA -18.76708 5.494728e+00
#> 21 TR102346-c0_g2_i1 Primary NA -99.99996 7.896973e+08
#> 26 TR102352-c4_g2_i5 Primary NA -31.33341 9.647458e+00
#> exp.change.24 exp.change.48
#> 1 964.92512 -5.270607e+01
#> 6 -99.99947 1.067105e+08
#> 11 40.47377 -1.343882e+00
#> 16 55.10727 3.358246e+01
#> 21 18.63375 5.244430e+01
#> 26 28.48553 7.058088e+01
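As for why the tibble trips up reshape(): my best guess is that it comes down to single-column subsetting. A base data.frame drops `x[, "col"]` to a plain vector, while a tibble keeps it as a one-column tibble, so the distinct time values most likely reach the name-building step as one list-like object instead of five numbers. The sketch below only illustrates that difference; the toy columns id, time and value are made up for the example, and it does not reproduce the internals of reshape().

```r
library(tibble)

# Toy long-format data (hypothetical names, not from the question)
df <- data.frame(
  id    = c("a", "a", "b", "b"),
  time  = c(0, 2, 0, 2),
  value = c(NA, 1.5, NA, 2.5)
)
tbl <- as_tibble(df)

# Base data frame: selecting one column with [ , ] drops to a numeric vector
times_df <- unique(df[, "time"])

# Tibble: the same call keeps a one-column tibble
times_tbl <- unique(tbl[, "time"])

is.numeric(times_df)      # plain vector of distinct times
is.data.frame(times_tbl)  # still a (one-column) data frame

# Building column names from each: one name per time value for the vector,
# but a single name for the tibble, because the whole column is pasted as one element
names_df  <- paste("value", times_df, sep = ".")
names_tbl <- paste("value", times_tbl, sep = ".")
length(names_df)
length(names_tbl)
```

This would be consistent with the single `exp.change.c(0, 2, 8, 24, 48)` column in the question, and with as.data.frame() fixing it.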
But since you already seem to be working with the tidyverse, tidyr::pivot_wider() seems the most appropriate choice.