Reshaping a data frame in R

Vin*_*nce 16 r reshape dataframe

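I'm having trouble reshaping a large data frame. I've been lucky enough to avoid reshaping problems in the past, which also means I'm terrible at it.

My current data frame looks something like this: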

unique_id    seq   response    detailed.name    treatment 
a            N1     123.23     descr. of N1     T1
a            N2     231.12     descr. of N2     T1
a            N3     231.23     descr. of N3     T1
...
b            N1     343.23     descr. of N1     T2
b            N2     281.13     descr. of N2     T2
b            N3     901.23     descr. of N3     T2
...

And I would like to get to:

seq    detailed.name   T1           T2
N1     descr. of N1    123.23       343.23
N2     descr. of N2    231.12       281.13
N3     descr. of N3    231.23       901.23

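I've looked at the reshape package, but I'm not sure how to turn the levels of the treatment factor into separate column names.

Thanks!

Edit: I tried running this on my local machine (a dual-core 3.06GHz iMac with 4GB of RAM) and it keeps failing: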

> d.tmp.2 <- cast(d.tmp, `SEQ_ID` + `GENE_INFO` ~ treatments)
Aggregation requires fun.aggregate: length used as default
R(5751) malloc: *** mmap(size=647168) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

I'll try running it on one of our bigger machines when I get the chance.

Har*_*lan 20

Reshape always seems tricky to me too, but a little trial and error usually gets there. Here's what I ended up with:

> x
  unique_id seq response detailed.name treatment
1         a  N1   123.23           dN1        T1
2         a  N2   231.12           dN2        T1
3         a  N3   231.23           dN3        T1
4         b  N1   343.23           dN1        T2
5         b  N2   281.13           dN2        T2
6         b  N3   901.23           dN3        T2

> x2 <- melt(x, c("seq", "detailed.name", "treatment"), "response")
> x2
  seq detailed.name treatment variable  value
1  N1           dN1        T1 response 123.23
2  N2           dN2        T1 response 231.12
3  N3           dN3        T1 response 231.23
4  N1           dN1        T2 response 343.23
5  N2           dN2        T2 response 281.13
6  N3           dN3        T2 response 901.23

> cast(x2, seq + detailed.name ~ treatment)
  seq detailed.name     T1     T2
1  N1           dN1 123.23 343.23
2  N2           dN2 231.12 281.13
3  N3           dN3 231.23 901.23

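Your original data is already in a long format, but not the long format that melt/cast use, so I re-melted it. The second argument (id.vars) is the list of columns that are not melted. The third argument (measure.vars) is the list of columns that vary.

Then cast uses a formula. The left side of the tilde lists the columns that stay as they are, and the right side lists the columns whose values become the new column names that the value column is spread across.

More or less...!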
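
To make the two sides of the formula concrete, here is a minimal sketch of the same long-to-wide step using base R's reshape() instead of the reshape package (a substitution for illustration, not what the melt/cast calls above do internally). idvar plays the role of the left side of the tilde, timevar the right side, and v.names the value column. The data frame is rebuilt by hand from the example, and unique_id is dropped because it varies within each seq.

```r
# Example data rebuilt from the printed data frame above
x <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(c("dN1", "dN2", "dN3"), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = FALSE
)

# Keep only the columns that take part in the reshape
long <- x[, c("seq", "detailed.name", "treatment", "response")]

wide <- reshape(
  long,
  idvar     = c("seq", "detailed.name"),  # left side of the cast formula
  timevar   = "treatment",                # right side of the cast formula
  v.names   = "response",                 # the value column
  direction = "wide"
)

# reshape() names the new columns response.T1, response.T2; strip the prefix
names(wide) <- sub("^response\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

The result has one row per seq/detailed.name pair and one column per treatment, the same shape as the cast output above.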


lea*_*rnr 6

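Building on Harlan's answer: if the data is already in long format, you can skip the re-melting step by naming the column that holds the values in the cast call.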

> x <- read.table(textConnection("  unique_id seq response detailed.name treatment
+ 1         a  N1   123.23           dN1        T1
+ 2         a  N2   231.12           dN2        T1
+ 3         a  N3   231.23           dN3        T1
+ 4         b  N1   343.23           dN1        T2
+ 5         b  N2   281.13           dN2        T2
+ 6         b  N3   901.23           dN3        T2"))
> 
> cast(x, seq + detailed.name ~ treatment, value = "response")
  seq detailed.name     T1     T2
1  N1           dN1 123.23 343.23
2  N2           dN2 231.12 281.13
3  N3           dN3 231.23 901.23
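
One thing to check on the real data: the message "Aggregation requires fun.aggregate: length used as default" in the question's edit usually means that some combination of the formula variables occurs more than once, so cast counts rows instead of placing values. Below is a minimal base-R sketch for finding such rows before casting. The seventh row is made up purely to trigger the situation, and the column names follow the toy example rather than the real SEQ_ID/GENE_INFO data.

```r
# Toy data with one deliberately duplicated seq/detailed.name/treatment cell
x <- data.frame(
  seq           = c("N1", "N2", "N3", "N1", "N2", "N3", "N1"),
  detailed.name = c("dN1", "dN2", "dN3", "dN1", "dN2", "dN3", "dN1"),
  treatment     = c("T1", "T1", "T1", "T2", "T2", "T2", "T1"),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23, 999.99),
  stringsAsFactors = FALSE
)

# The columns that appear in the cast formula
key <- x[, c("seq", "detailed.name", "treatment")]

# Flag every row whose key occurs more than once (first and later copies)
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dups <- x[is_dup, ]
print(dups)
```

If dups is empty, each cell of the wide table gets exactly one value. If not, either the duplicates need cleaning up or the cast call needs an explicit fun.aggregate such as mean.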