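The csv below is from a longer data table, call it temp. I want to convert it to temp.wide, with the region_code values as the columns and the vertical order of region_code (SAS, SSA, EUR, ...) as the order of the columns. I just noticed that dcast alphabetizes the new columns.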
scenario region_code region_name value
1: 2010 SAS South Asia 61.17716
2: 2010 SSA Africa south of the Sahara 62.08588
3: 2010 EUR Europe 63.76123
4: 2010 LAC Latin America and Caribbean 68.84806
5: 2010 FSU Former Soviet Union 59.04499
6: 2010 EAP East Asia and Pacific 64.00579
7: 2010 NAM North America 66.18235
8: 2010 MEN Middle East and North Africa 58.03167
9: SSP2-NoCC-REF SAS South Asia 57.29973
10: SSP2-NoCC-REF SSA Africa south of the Sahara 65.14987
11: SSP2-NoCC-REF EUR Europe 63.99204
12: SSP2-NoCC-REF LAC Latin America and Caribbean 68.21118
13: SSP2-NoCC-REF FSU Former Soviet Union 60.10807
14: SSP2-NoCC-REF EAP East Asia and Pacific 63.86103
15: SSP2-NoCC-REF NAM North America 65.97859
16: SSP2-NoCC-REF MEN Middle East and North Africa 58.98356
temp = setDT(structure(list(scenario = c("2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2010", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF",
"SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF",
"SSP2-NoCC-REF"), region_code = c("SAS", "SSA", "EUR", "LAC",
"FSU", "EAP", "NAM", "MEN", "SAS", "SSA", "EUR", "LAC", "FSU",
"EAP", "NAM", "MEN"), region_name = c("South Asia", "Africa south of the Sahara",
"Europe", "Latin America and Caribbean", "Former Soviet Union",
"East Asia and Pacific", "North America", "Middle East and North Africa",
"South Asia", "Africa south of the Sahara", "Europe", "Latin America and Caribbean",
"Former Soviet Union", "East Asia and Pacific", "North America",
"Middle East and North Africa"), value = c(61.1771623260257,
62.0858809906661, 63.7612306428217, 68.84805628195, 59.0449875464304,
64.0057851485101, 66.182351351389, 58.0316719859857, 57.299725759211,
65.1498720847705, 63.9920412193261, 68.2111842947542, 60.1080745513644,
63.86103368494, 65.9785850777114, 58.9835574681585)), .Names = c("scenario",
"region_code", "region_name", "value"), row.names = c(NA, -16L
), class = "data.frame"))
Here is the code I used.
formula.wide <- "scenario ~ region_code"
temp.wide <- data.table::dcast(
data = temp,
formula = formula.wide,
value.var = "value")
scenario EAP EUR FSU LAC MEN NAM SAS SSA
1: 2010 64.00579 63.76123 59.04499 68.84806 58.03167 66.18235 61.17716 62.08588
2: SSP2-NoCC-REF 63.86103 63.99204 60.10807 68.21118 58.98356 65.97859 57.29973 65.14987
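The new column names are scenario, EAP, EUR, FSU, LAC, MEN, NAM, SAS, SSA.
I can get the correct order from temp and then use setcolorder to give temp.wide the correct column order. But I'm wondering whether there is some way to keep dcast from alphabetizing the new columns in the first place.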
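For reference, the setcolorder workaround described above could look roughly like the sketch below. It uses a small hypothetical stand-in for temp (three regions, rounded values) rather than the full table, and region.order is a name introduced only for this illustration.

```r
library(data.table)

# Hypothetical miniature of `temp`: three regions, rounded values
temp <- data.table(
  scenario    = rep(c("2010", "SSP2-NoCC-REF"), each = 3),
  region_code = rep(c("SAS", "SSA", "EUR"), times = 2),
  value       = c(61.2, 62.1, 63.8, 57.3, 65.1, 64.0)
)

# region_code is a character column, so dcast sorts the new columns alphabetically
temp.wide <- dcast(temp, scenario ~ region_code, value.var = "value")

# Order in which the codes first appear in the long table
region.order <- unique(temp$region_code)

# Reorder the columns by reference, keeping "scenario" first
setcolorder(temp.wide, c("scenario", region.order))
names(temp.wide)
```

This works, but it needs a second step after every dcast call, which is what the question is trying to avoid.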
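Also, the help text for dcast says
Names for columns that are being cast are generated in the same order (separated by an underscore, _) from the (unique) values in each column mentioned in the formula RHS.
If I understand this correctly, I don't think it describes what dcast actually does. And I don't understand what the parenthetical phrase (separated by an underscore, _) means.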
the vertical order of the region_code (SAS, SSA, EUR, ...) as the order of the columns
Just pass a factor with the appropriate levels:
dcast(temp, scenario ~ factor(region_code, levels=unique(region_code)))
scenario SAS SSA EUR LAC FSU EAP NAM MEN
1: 2010 61.17716 62.08588 63.76123 68.84806 59.04499 64.00579 66.18235 58.03167
2: SSP2-NoCC-REF 57.29973 65.14987 63.99204 68.21118 60.10807 63.86103 65.97859 58.98356
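The factor approach works because dcast orders the new columns by factor level, and unique() returns values in order of first appearance. A variant, sketched here on a small hypothetical stand-in for temp, is to convert the column once by reference so that later dcast calls can use the plain formula:

```r
library(data.table)

# Hypothetical miniature of `temp`: three regions, rounded values
temp <- data.table(
  scenario    = rep(c("2010", "SSP2-NoCC-REF"), each = 3),
  region_code = rep(c("SAS", "SSA", "EUR"), times = 2),
  value       = c(61.2, 62.1, 63.8, 57.3, 65.1, 64.0)
)

# Turn region_code into a factor whose levels follow first appearance
temp[, region_code := factor(region_code, levels = unique(region_code))]

# The new columns now follow the factor levels instead of alphabetical order
temp.wide <- dcast(temp, scenario ~ region_code, value.var = "value")
names(temp.wide)
```

Note that this changes the type of region_code in temp itself, which may or may not be wanted elsewhere in the code.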
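The documentation quoted in the OP looks correct to me: in z ~ x + y, the unique values of x come before the unique values of y in each resulting column name, and the two parts are joined by an underscore.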
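A small sketch of what that sentence describes, using hypothetical toy data (dt, z, x, y and v are made up for the illustration): with two variables on the right-hand side, each new column name is a value of x followed by a value of y, joined by an underscore.

```r
library(data.table)

# Hypothetical toy data: z is the id column, x and y both go on the RHS
dt <- data.table(
  z = c(1, 1, 2, 2),
  x = c("a", "b", "a", "b"),
  y = c("p", "q", "q", "p"),
  v = 1:4
)

# Each cast column is named <x value>_<y value>
wide <- dcast(dt, z ~ x + y, value.var = "v")
names(wide)
```

With a single variable on the right-hand side, as in scenario ~ region_code, there is nothing to join, so no underscore appears in the names.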