Reshaping data in R when fixed-effect information is stored in the columns

Jua*_*uan 6 stack r reshape

I was given some very awkwardly formatted data in Excel, and I need to reshape it so that it is suitable for running a survival analysis in R.

I uploaded an extract of the data to Google Drive: https://drive.google.com/open?id=1ret3bCDCYPDALQ16YBloaeopfl2-qVbp
The original data frame contains about 2100 observations and 950 variables.

Here is the basic data frame:

my.data<-data.frame(
  ID=c( "", "","C8477","C5273","C5566"),
  LR=c("2012Y","State:FL",5,6,8),
  LR=c("2012Y","State:AZ",5,8,10),
  LR=c("2011Y","State:FL",7,2,1)
)

my.data

#     ID       LR     LR.1     LR.2
# 1          2012Y    2012Y    2011Y
# 2       State:FL State:AZ State:FL
# 3 C8477        5        5        7
# 4 C5273        6        8        2
# 5 C5566        8       10        1

All of the value columns have the same name, "LR". I don't know whether that will cause problems later...
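On the duplicate names: with its default `check.names = TRUE`, `data.frame()` runs the names through `make.names(..., unique = TRUE)`, which is why the printout above shows `LR`, `LR.1`, `LR.2`. A minimal check, using only the names from the example (a file imported from Excel may be renamed differently, depending on the reader used):

```r
# Column names exactly as written in the data.frame() call
raw_names <- c("ID", "LR", "LR", "LR")

# data.frame(check.names = TRUE) makes names syntactically valid and unique
fixed_names <- make.names(raw_names, unique = TRUE)
print(fixed_names)
```

So the duplicates are not lost, but the suffixes carry no meaning; the year and state still have to come from the first two rows.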

Row 1 gives the year, and row 2 gives the corresponding state for each observation.
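As a small sketch of what those two header rows encode, the labels can be parsed into one row of metadata per value column. This assumes every year cell ends in "Y" and every state cell starts with "State:", as in the example; the names `year_cells`, `state_cells` and `col_info` are made up for illustration.

```r
# Header cells copied from rows 1 and 2 of the example
year_cells  <- c("2012Y", "2012Y", "2011Y")
state_cells <- c("State:FL", "State:AZ", "State:FL")

# Strip the trailing "Y" and the leading "State:" prefix
year  <- as.integer(sub("Y$", "", year_cells))
state <- sub("^State:", "", state_cells)

# One row of metadata per value column
col_info <- data.frame(Year = year, State = state, stringsAsFactors = FALSE)
print(col_info)
```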

As output, I need panel data that I can use later for the survival analysis.

my.data <- data.frame(
  ID = c("C8477", "C5273", "C5566"),
  Year = c("2012", "2012", "2011"),
  State = c("FL", "AZ", "FL"),
  LR = c(5, 8, 1)
)

my.data

#     ID Year State LR
# 1 C8477 2012    FL  5
# 2 C5273 2012    AZ  8
# 3 C5566 2011    FL  1

I tried the reshape function and the seq function, but neither moved me in the right direction because the data frame is arranged so oddly.

Zhi*_*ang 2

Here is a tidyverse approach:

my.data <- data.frame(
  ID=c( "", "","C8477","C5273","C5566"),
  LR=c("2012Y","State:FL",5,6,8),
  LR=c("2012Y","State:AZ",5,8,10),
  LR=c("2011Y","State:FL",7,2,1)
)

My code:

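library(tidyverse)

# Row 1 holds the years ("2012Y"); keep the part before the "Y"
year <- as.matrix(my.data[1, -1])
year <- str_split(year, "Y", simplify = TRUE)[, 1]
# Row 2 holds the states ("State:FL")
state <- as.matrix(my.data[2, -1])
# Combine both into column names such as "State:FL_2012"
both <- paste(state, year, sep = "_")

# Drop the two header rows and rename the value columns
mydata1 <- my.data[-c(1, 2), ]
colnames(mydata1) <- c("ID", both)

# starts_with() ignores case by default, so "state" matches "State:..."
long <- pivot_longer(mydata1,
                     cols = starts_with("state"),
                     names_to = "State_year",
                     values_to = "LR")

# Split the combined name back into state and year
long %>%
  transmute(
    ID, LR,
    state = str_split(State_year, "_", simplify = TRUE)[, 1],
    state = str_split(state, ":", simplify = TRUE)[, 2],
    year = str_split(State_year, "_", simplify = TRUE)[, 2]
  )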


We get:

  ID    LR    state year 
1 C8477 5     FL    2012 
2 C8477 5     AZ    2012 
3 C8477 7     FL    2011 
4 C5273 6     FL    2012 
5 C5273 8     AZ    2012 
6 C5273 2     FL    2011 
7 C5566 8     FL    2012 
8 C5566 10    AZ    2012 
9 C5566 1     FL    2011  
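For comparison, the same long format can be sketched in base R without any packages. This is an alternative under stated assumptions, not part of the answer above: it assumes the first two rows are always the year and state headers, that year cells end in "Y" and state cells start with "State:", and it additionally converts `LR` to numeric. The names `header`, `body` and `panel` are hypothetical.

```r
my.data <- data.frame(
  ID = c("", "", "C8477", "C5273", "C5566"),
  LR = c("2012Y", "State:FL", 5, 6, 8),
  LR = c("2012Y", "State:AZ", 5, 8, 10),
  LR = c("2011Y", "State:FL", 7, 2, 1),
  stringsAsFactors = FALSE
)

# Split the two header rows from the actual observations
header <- my.data[1:2, -1]
body   <- my.data[-(1:2), ]

# Parse one year and one state per value column
year  <- as.integer(sub("Y$", "", unname(unlist(header[1, ]))))
state <- sub("^State:", "", unname(unlist(header[2, ])))

n_id  <- nrow(body)      # number of IDs
n_col <- ncol(body) - 1  # number of value columns

# unlist() stacks the value columns one after another, so each column's
# year and state are repeated once per ID, and the IDs once per column
panel <- data.frame(
  ID    = rep(body$ID, times = n_col),
  Year  = rep(year, each = n_id),
  State = rep(state, each = n_id),
  LR    = as.numeric(unname(unlist(body[, -1]))),
  stringsAsFactors = FALSE
)
print(panel)
```

Because the header rows are parsed separately rather than folded into the column names, this version does not require each state and year combination to be unique across columns.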