Max*_*x M 5 r melt data.table tidyr tidyverse
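This is admittedly a duplicate of the question r-split-string-using-tidyrseparate, but I cannot use the MWE from there for my purposes because I do not know how to adapt the regular expression. I basically want the same thing, but with the variable split at the last underscore.
Reason: in my data, some columns occur several times for the same factor/type. I thought I could melt the data, split the variable names just before the type string, and then spread the result back into a wide format with fewer columns. My problem is that my variable names contain a varying number of underscores, and I would like to learn how to separate at the last underscore, which I added beforehand.
MWE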
library(tidyr)
library(data.table)
dt <- data.table(Name = c("A", "B", "C"), Var_1_EVU = c(2, NA, NA), Var_1_BdS = c(NA, 3, 4), Var_2_BdS = c(NA, 3, 4))
dt.long <- melt(dt, id.vars = c("Name"))
dt.long <- separate(dt.long, variable, c("test", "type"), sep = '/[^_]*$/')
dt.wide <- spread(dt.long, key = Name, value = value)
I would like something like this:
   Name type Var1 Var2
1:    A  BdS   NA   NA
2:    A  EVU    2   NA
3:    B  BdS    3    3
4:    B  EVU   NA   NA
5:    C  BdS    4    4
6:    C  EVU   NA   NA
library(tidyr)
df <- data.frame(Name = c("A", "B", "C"),
                 Var_1_EVU = c(2, NA, NA),
                 Var_1_BdS = c(NA, 3, 4),
                 Var_2_BdS = c(NA, 3, 4))
df %>%
  gather("type", "value", -Name) %>%
  separate(type, into = c("type", "type_num", "var")) %>%
  unite(type, type, type_num, sep = "") %>%
  spread(type, value)
#   Name var Var1 Var2
# 1    A BdS   NA   NA
# 2    A EVU    2   NA
# 3    B BdS    3    3
# 4    B EVU   NA   NA
# 5    C BdS    4    4
# 6    C EVU   NA   NA
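Note that separate() with its default sep splits at every run of non-alphanumeric characters, so the code above relies on each column name containing exactly two underscores. The following is only a sketch of how the split could be restricted to the last underscore, assuming a tidyr version whose sep argument accepts a Perl-style lookahead; the column names with extra underscores are made up for illustration.

```r
library(tidyr)

# Illustrative data: column names with a varying number of underscores
df <- data.frame(Name = c("A", "B", "C"),
                 Var_x_1_EVU = c(2, NA, NA),
                 Var_x_1_BdS = c(NA, 3, 4),
                 Var_x_y_2_BdS = c(NA, 3, 4))

# Wide to long: one row per Name / original column name
long <- gather(df, "col_name", "value", -Name)

# "_(?=[^_]+$)" matches an underscore only when no further underscore
# follows it, i.e. the last underscore in the name
long <- separate(long, col_name, into = c("var", "type"), sep = "_(?=[^_]+$)")

# Long to wide: one column per variable prefix
wide <- spread(long, var, value)
print(wide)
```

Everything before the last underscore ends up in var and the trailing type string in type, regardless of how many underscores the prefix contains.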
An example using tidyr::extract that handles variable names with an arbitrary number of underscores...
library(dplyr)
library(tidyr)
df <- data.frame(Name = c("A", "B", "C"),
                 Var_x_1_EVU = c(2, NA, NA),
                 Var_x_1_BdS = c(NA, 3, 4),
                 Var_x_y_2_BdS = c(NA, 3, 4))
df %>%
  gather("col_name", "value", -Name) %>%
  extract(col_name, c("var", "type"), "(.*)_(.*)") %>%
  spread(var, value)
#   Name type Var_x_1 Var_x_y_2
# 1    A  BdS      NA        NA
# 2    A  EVU       2        NA
# 3    B  BdS       3         3
# 4    B  EVU      NA        NA
# 5    C  BdS       4         4
# 6    C  EVU      NA        NA
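Why the pattern "(.*)_(.*)" splits at the last underscore can be checked in isolation. This is a small base R sketch, not part of the answer itself, using the same column names: the first (.*) is greedy, so it takes as much of the string as it can while still leaving one underscore for the literal "_" to match.

```r
# Column names with a varying number of underscores
x <- c("Var_x_1_EVU", "Var_x_1_BdS", "Var_x_y_2_BdS")

# The first group is greedy and grabs everything up to the LAST underscore;
# the second group receives whatever follows it.
var  <- sub("(.*)_(.*)", "\\1", x)
type <- sub("(.*)_(.*)", "\\2", x)

print(data.frame(x, var, type))
```

If the split were wanted at the first underscore instead, the first group would have to exclude underscores, for example "([^_]*)_(.*)".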
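You can avoid the potential problem of duplicate observations by first adding a row-number column/variable with mutate(n = row_number()), which makes each observation unique, and you can avoid tidyr::extract being masked by magrittr by calling it explicitly as tidyr::extract...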
library(dplyr)
library(tidyr)
library(data.table)
library(magrittr)
dt <- data.table(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 Var_1_BdS = c(1, NA, 3, 4),
                 Var_x_2_BdS = c(1, NA, 3, 4))
dt %>%
  mutate(n = row_number()) %>%
  gather("col_name", "value", -n, -Name) %>%
  tidyr::extract(col_name, c("var", "type"), "(.*)_(.*)") %>%
  spread(var, value)
#   Name n type Var_1 Var_x_2
# 1    A 1  BdS     1       1
# 2    A 1  EVU     1      NA
# 3    A 2  BdS    NA      NA
# 4    A 2  EVU     2      NA
# 5    B 3  BdS     3       3
# 6    B 3  EVU    NA      NA
# 7    C 4  BdS     4       4
# 8    C 4  EVU    NA      NA