R data.table dcast: order of the new columns

Jer*_*ryN 1 r data.table

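The data below comes from a much longer data table; call it temp. I want to convert it to temp.wide, with region_code as the columns, and use the vertical order of region_code (SAS, SSA, EUR, ...) as the column order. I just noticed that dcast sorts the new columns alphabetically.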

         scenario region_code                  region_name    value
 1:          2010         SAS                   South Asia 61.17716
 2:          2010         SSA   Africa south of the Sahara 62.08588
 3:          2010         EUR                       Europe 63.76123
 4:          2010         LAC  Latin America and Caribbean 68.84806
 5:          2010         FSU          Former Soviet Union 59.04499
 6:          2010         EAP        East Asia and Pacific 64.00579
 7:          2010         NAM                North America 66.18235
 8:          2010         MEN Middle East and North Africa 58.03167
 9: SSP2-NoCC-REF         SAS                   South Asia 57.29973
10: SSP2-NoCC-REF         SSA   Africa south of the Sahara 65.14987
11: SSP2-NoCC-REF         EUR                       Europe 63.99204
12: SSP2-NoCC-REF         LAC  Latin America and Caribbean 68.21118
13: SSP2-NoCC-REF         FSU          Former Soviet Union 60.10807
14: SSP2-NoCC-REF         EAP        East Asia and Pacific 63.86103
15: SSP2-NoCC-REF         NAM                North America 65.97859
16: SSP2-NoCC-REF         MEN Middle East and North Africa 58.98356

temp = setDT(structure(list(scenario = c("2010", "2010", "2010", "2010", "2010", 
"2010", "2010", "2010", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", 
"SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", 
"SSP2-NoCC-REF"), region_code = c("SAS", "SSA", "EUR", "LAC", 
"FSU", "EAP", "NAM", "MEN", "SAS", "SSA", "EUR", "LAC", "FSU", 
"EAP", "NAM", "MEN"), region_name = c("South Asia", "Africa south of the Sahara", 
"Europe", "Latin America and Caribbean", "Former Soviet Union", 
"East Asia and Pacific", "North America", "Middle East and North Africa", 
"South Asia", "Africa south of the Sahara", "Europe", "Latin America and Caribbean", 
"Former Soviet Union", "East Asia and Pacific", "North America", 
"Middle East and North Africa"), value = c(61.1771623260257, 
62.0858809906661, 63.7612306428217, 68.84805628195, 59.0449875464304, 
64.0057851485101, 66.182351351389, 58.0316719859857, 57.299725759211, 
65.1498720847705, 63.9920412193261, 68.2111842947542, 60.1080745513644, 
63.86103368494, 65.9785850777114, 58.9835574681585)), .Names = c("scenario", 
"region_code", "region_name", "value"), row.names = c(NA, -16L
), class = "data.frame"))

Here is the code I used:

formula.wide <- "scenario ~ region_code"
temp.wide <- data.table::dcast(
  data = temp,
  formula = formula.wide,
  value.var = "value")


        scenario      EAP      EUR      FSU      LAC      MEN      NAM      SAS      SSA
1:          2010 64.00579 63.76123 59.04499 68.84806 58.03167 66.18235 61.17716 62.08588
2: SSP2-NoCC-REF 63.86103 63.99204 60.10807 68.21118 58.98356 65.97859 57.29973 65.14987

The new column names are scenario, EAP, EUR, FSU, LAC, MEN, NAM, SAS, SSA.

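I can get the correct order from temp and then use setcolorder to give temp.wide the right column order. But I wonder whether there is some way to make dcast not sort the new columns alphabetically in the first place.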
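For reference, the setcolorder workaround described above might look like the sketch below. It uses a hypothetical three-region subset of temp (values rounded from the table above) rather than the full data, and reorders temp.wide in place after casting.

```r
library(data.table)

# Hypothetical three-region subset of temp (values rounded from the question)
temp <- data.table(
  scenario    = rep(c("2010", "SSP2-NoCC-REF"), each = 3),
  region_code = rep(c("SAS", "SSA", "EUR"), times = 2),
  value       = c(61.2, 62.1, 63.8, 57.3, 65.1, 64.0)
)

# dcast sorts the character values, giving: scenario, EUR, SAS, SSA
temp.wide <- dcast(temp, scenario ~ region_code, value.var = "value")

# Reorder by reference to the first-appearance order of region_code
setcolorder(temp.wide, c("scenario", unique(temp$region_code)))
print(names(temp.wide))
```

Because setcolorder works by reference, no copy of temp.wide is made; the cost is one extra step after every cast.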

Also, the help page for dcast says:

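Names of columns that are being cast are generated in the same order (separated by an underscore, _) from the (unique) values in each column mentioned in the formula RHS.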

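If I understand this correctly, I don't think it describes what dcast actually does. I also don't understand what the parenthetical phrase "(separated by an underscore, _)" means.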

Fra*_*ank 5

use the vertical order of region_code (SAS, SSA, EUR, ...) as the column order

Just pass a factor with the appropriate levels:

dcast(temp, scenario ~ factor(region_code, levels=unique(region_code)))

        scenario      SAS      SSA      EUR      LAC      FSU      EAP      NAM      MEN
1:          2010 61.17716 62.08588 63.76123 68.84806 59.04499 64.00579 66.18235 58.03167
2: SSP2-NoCC-REF 57.29973 65.14987 63.99204 68.21118 60.10807 63.86103 65.97859 58.98356
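The same factor trick can also be applied to the column itself before casting, so the level order lives in temp rather than in the formula. A minimal sketch, again on a hypothetical three-region subset of temp; note that := modifies temp by reference.

```r
library(data.table)

# Hypothetical three-region subset of temp (values rounded from the question)
temp <- data.table(
  scenario    = rep(c("2010", "SSP2-NoCC-REF"), each = 3),
  region_code = rep(c("SAS", "SSA", "EUR"), times = 2),
  value       = c(61.2, 62.1, 63.8, 57.3, 65.1, 64.0)
)

# Store the first-appearance order as the factor levels (modifies temp in place)
temp[, region_code := factor(region_code, levels = unique(region_code))]

# The new columns now follow the factor levels instead of alphabetical order
temp.wide <- dcast(temp, scenario ~ region_code, value.var = "value")
print(names(temp.wide))
```

Any later cast of temp on region_code will then keep this order without repeating the factor() call in each formula.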

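The documentation quoted in the question looks correct to me: in z ~ x + y, each resulting column name is the value of x followed by the value of y, joined by an underscore, so the unique values of x come before those of y.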
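To illustrate the "separated by an underscore" part of the help text, here is a small sketch with two variables on the RHS. The table dt and its columns id, x, y and val are made up for illustration; the sep argument of dcast changes the separator.

```r
library(data.table)

# Hypothetical long table: one value per (x, y) combination
dt <- data.table(
  id  = 1L,
  x   = c("b", "b", "a", "a"),
  y   = c("q", "p", "q", "p"),
  val = 1:4
)

# Column names paste the x value, then the y value, joined by "_"
wide <- dcast(dt, id ~ x + y, value.var = "val")
print(names(wide))

# The separator can be changed with sep
wide_dot <- dcast(dt, id ~ x + y, value.var = "val", sep = ".")
print(names(wide_dot))
```

Since x and y are character columns here, the combinations are also sorted alphabetically, x first and then y.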

  • Or `dcast(temp, scenario ~ region_code)[, c(names(temp)[1], unique(temp$region_code)), with = FALSE]`