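I'm using patterns() in the measure.vars argument of data.table::melt() to melt data whose columns follow a few easily defined patterns. It works, but I can't see how to create a character index variable instead of the default numeric one.
For example, in A the dog and cat columns have numeric suffixes; look at the "variable" column: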
A = data.table(idcol = c(1:5),
dog_1 = c(1:5), cat_1 = c(101:105),
dog_2 = c(6:10), cat_2 = c(106:110),
dog_3 = c(11:15), cat_3 = c(111:115))
head(melt(A, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat")))
idcol variable dog cat
1: 1 1 1 101
2: 2 1 2 102
3: 3 1 3 103
4: 4 1 4 104
5: 5 1 5 105
6: 1 2 6 106
In B, however, the dog and cat columns have text suffixes, but the "variable" column is still numeric:
B = data.table(idcol = c(1:5),
dog_one = c(1:5), cat_one = c(101:105),
dog_two = c(6:10), cat_two = c(106:110),
dog_three = c(11:15), cat_three = c(111:115))
head(melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat")))
idcol variable dog cat
1: 1 1 1 101
2: 2 1 2 102
3: 3 1 3 103
4: 4 1 4 104
5: 5 1 5 105
6: 1 2 6 106
How can I fill the "variable" column with one/two/three instead of 1/2/3?
Hen*_*rik 11
There may be a simpler way, but this seems to work:
# grab suffixes of 'variable' names
suff <- unique(sub('^.*_', '', names(B[ , -1])))
# suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])
# melt
B2 <- melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat"))
# replace factor levels in 'variable' with the suffixes
setattr(B2$variable, "levels", suff)
B2
# idcol variable dog cat
# 1: 1 one 1 101
# 2: 2 one 2 102
# 3: 3 one 3 103
# 4: 4 one 4 104
# 5: 5 one 5 105
# 6: 1 two 6 106
# 7: 2 two 7 107
# 8: 3 two 8 108
# 9: 4 two 9 109
# 10: 5 two 10 110
# 11: 1 three 11 111
# 12: 2 three 12 112
# 13: 3 three 13 113
# 14: 4 three 14 114
# 15: 5 three 15 115
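Overwriting the levels assumes that the unique suffixes come out in the same order in which patterns() matched the columns. Below is a slightly more defensive sketch. It assumes data.table is installed, and the helper names dog_cols and suff are illustrative only. It takes the suffixes from the columns matched by the first pattern ("^dog") and maps the integer codes of the variable factor onto them, which turns variable into a character column.

```r
library(data.table)

B <- data.table(idcol = 1:5,
                dog_one = 1:5, cat_one = 101:105,
                dog_two = 6:10, cat_two = 106:110,
                dog_three = 11:15, cat_three = 111:115)

# Columns matched by the first pattern, in column order (the order melt() stacks them)
dog_cols <- grep("^dog", names(B), value = TRUE)
# Drop the "dog_" prefix: "one", "two", "three"
suff <- sub("^dog_", "", dog_cols)

B2 <- melt(B, measure.vars = patterns("^dog", "^cat"),
           value.name = c("dog", "cat"))

# 'variable' is a factor with levels "1", "2", "3"; its integer codes index suff
B2[, variable := suff[as.integer(variable)]]
B2
```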
Note that there is an open issue on this topic, which lists some other alternatives: "FR: Extend melt functionality for handling names of output".
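Newer data.table releases added a measure() helper for measure.vars. Older versions do not have it, so check ?measure in your installed version. It splits the column names on a separator, and the value.name keyword makes it build the value columns and a character grouping column in one step. This is a sketch under that assumption. The group name suffix is arbitrary.

```r
library(data.table)

B <- data.table(idcol = 1:5,
                dog_one = 1:5, cat_one = 101:105,
                dog_two = 6:10, cat_two = 106:110,
                dog_three = 11:15, cat_three = 111:115)

# Split each column name on "_": the first piece ("dog"/"cat") names the value
# column, the second piece ("one"/"two"/"three") goes into the 'suffix' column.
# 'idcol' contains no "_", so it is not treated as a measure variable.
B3 <- melt(B, measure.vars = measure(value.name, suffix, sep = "_"))
B3
```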
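This is one of the (rare) cases where I think good ol' reshape (base R) is cleaner. Its sep argument comes in handy here: the names of the 'value' columns and the values of the 'time' column (the counterpart of melt's 'variable') are both generated in one go: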
reshape(data = B,
varying = names(B[ , -1]),
sep = "_",
direction = "long")
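The output can also line up with the melt() result in the question, with a column actually called variable and no extra id column. For that, base reshape accepts timevar and idvar. Here is a minimal base-R-only sketch. It uses a plain data.frame copy of B (named B_df here for illustration), so no packages are needed.

```r
# Same data as B, as a plain data.frame (no packages required)
B_df <- data.frame(idcol = 1:5,
                   dog_one = 1:5, cat_one = 101:105,
                   dog_two = 6:10, cat_two = 106:110,
                   dog_three = 11:15, cat_three = 111:115)

long <- reshape(data = B_df,
                varying = names(B_df)[-1],  # every column except idcol
                sep = "_",                  # "dog_one" -> value column "dog", time "one"
                timevar = "variable",       # column holding "one"/"two"/"three"
                idvar = "idcol",            # reuse the existing id instead of adding "id"
                direction = "long")
long
```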