基于索引向量有效地将值分配给特定数据表行号

sun*_*nny 0 r data.table

我有一个按国家/地区列出数据的数据表,我想添加区域信息.我做了默认的"欧洲",所以我需要纠正那些不在"欧洲"地区的行.这是我需要分配的内容:

    ordered_tally[ , region :="Europe"]
    AsiaRows = c(5, 12, 17, 24, 25, 26, 34, 36, 45, 56, 61, 64, 75, 76, 81, 88, 90, 91)
    NorthAmericaRows = c(7, 18, 42, 47)
    SouthAmericaRows = c(11, 13, 32, 38, 41, 46, 48, 52, 53, 57, 58, 68, 77, 78, 83)
    AfricaRows = c(22, 66, 67, 70, 74, 82, 84, 86, 95)
    OceaniaRows = c(31, 62)
    MiddleEastRows = c(20, 51, 71, 72, 87, 89, 92)
    region = c("Asia", "NorthAmerica", "SouthAmerica", "Africa", "Oceania", "MiddleEast")
    region_indices = c(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows)
Run Code Online (Sandbox Code Playgroud)

这对我有用:

lapply(c(1:6), function(x) ordered_tally[region_indices[[x]], "region":= eval(region[x])])

有一个更好的方法吗?我的意义lapply不应该是必要的,必须有一个data.table我缺少的本地操作.

这是ordered_tally:

    V1              country  freq
 1:  1               Poland 27718
 2:  2                Spain  9800
 3:  3              Denmark  7551
 4:  4       United Kingdom  4910
 5:  5             Thailand  4069
 6:  6               Norway  3995
 7:  7        United States  3837
 8:  8          Netherlands  3075
 9:  9               France  2866
10: 10                Italy  2429
11: 11               Brazil  2109
12: 12                India  1804
13: 13            Argentina  1610
14: 14              Germany  1437
15: 15               Sweden  1409
16: 16            Lithuania  1339
17: 17               Taiwan  1241
18: 18               Mexico  1218
19: 19               Latvia  1152
20: 20               Israel  1076
21: 21       Czech Republic  1064
22: 22         South Africa  1036
23: 23             Portugal   897
24: 24             Malaysia   889
25: 25            Indonesia   851
26: 26            Hong Kong   722
27: 27               Russia   670
28: 28          Switzerland   635
29: 29              Finland   613
30: 30              Ireland   603
31: 31            Australia   588
32: 32                Chile   573
33: 33              Estonia   540
34: 34                China   514
35: 35              Belgium   498
36: 36            Singapore   488
37: 37              Austria   464
38: 38             Colombia   361
39: 39              Belarus   330
40: 40              Romania   330
41: 41            Guatemala   311
42: 42               Canada   295
43: 43             Bulgaria   288
44: 44               Greece   246
45: 45          South Korea   231
46: 46           Costa Rica   218
47: 47          Puerto Rico   218
48: 48            Venezuela   204
49: 49               Serbia   203
50: 50               Turkey   184
51: 51 United Arab Emirates   175
52: 52              Ecuador   172
53: 53                 Peru   161
54: 54              Hungary   136
55: 55           Montenegro   131
56: 56                Japan   127
57: 57          El Salvador    95
58: 58              Uruguay    84
59: 59               Cyprus    70
60: 60              Iceland    59
61: 61           Bangladesh    57
62: 62          New Zealand    47
63: 63              Moldova    42
64: 64          Philippines    41
65: 65             Slovakia    36
66: 66            Mauritius    32
67: 67              Nigeria    30
68: 68              Bolivia    25
69: 69              Croatia    25
70: 70             Zimbabwe    24
71: 71         Saudi Arabia    20
72: 72                 Oman    19
73: 73             Slovenia    19
74: 74                Egypt    15
75: 75              Vietnam    14
76: 76            Sri Lanka    10
77: 77   Dominican Republic     9
78: 78            Nicaragua     9
79: 79    Macedonia (FYROM)     5
80: 80              Ukraine     5
81: 81           Azerbaijan     4
82: 82              Namibia     4
83: 83             Barbados     3
84: 84             Botswana     3
85: 85                Malta     3
86: 86                Ghana     2
87: 87                 Iraq     2
88: 88           Kazakhstan     2
89: 89              Lebanon     2
90: 90                Macau     2
91: 91                Nepal     2
92: 92                Qatar     2
93: 93              Albania     1
94: 94        Faroe Islands     1
95: 95              Morocco     1
Run Code Online (Sandbox Code Playgroud)

这是dump重现的代码:

ordered_tally <-
data.table(V1 = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", 
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", 
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", 
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85", 
"86", "87", "88", "89", "90", "91", "92", "93", "94", "95"), 
    country = c("Poland", "Spain", "Denmark", "United Kingdom", 
    "Thailand", "Norway", "United States", "Netherlands", "France", 
    "Italy", "Brazil", "India", "Argentina", "Germany", "Sweden", 
    "Lithuania", "Taiwan", "Mexico", "Latvia", "Israel", "Czech Republic", 
    "South Africa", "Portugal", "Malaysia", "Indonesia", "Hong Kong", 
    "Russia", "Switzerland", "Finland", "Ireland", "Australia", 
    "Chile", "Estonia", "China", "Belgium", "Singapore", "Austria", 
    "Colombia", "Belarus", "Romania", "Guatemala", "Canada", 
    "Bulgaria", "Greece", "South Korea", "Costa Rica", "Puerto Rico", 
    "Venezuela", "Serbia", "Turkey", "United Arab Emirates", 
    "Ecuador", "Peru", "Hungary", "Montenegro", "Japan", "El Salvador", 
    "Uruguay", "Cyprus", "Iceland", "Bangladesh", "New Zealand", 
    "Moldova", "Philippines", "Slovakia", "Mauritius", "Nigeria", 
    "Bolivia", "Croatia", "Zimbabwe", "Saudi Arabia", "Oman", 
    "Slovenia", "Egypt", "Vietnam", "Sri Lanka", "Dominican Republic", 
    "Nicaragua", "Macedonia (FYROM)", "Ukraine", "Azerbaijan", 
    "Namibia", "Barbados", "Botswana", "Malta", "Ghana", "Iraq", 
    "Kazakhstan", "Lebanon", "Macau", "Nepal", "Qatar", "Albania", 
    "Faroe Islands", "Morocco"), freq = c(27718L, 9800L, 7551L, 
    4910L, 4069L, 3995L, 3837L, 3075L, 2866L, 2429L, 2109L, 1804L, 
    1610L, 1437L, 1409L, 1339L, 1241L, 1218L, 1152L, 1076L, 1064L, 
    1036L, 897L, 889L, 851L, 722L, 670L, 635L, 613L, 603L, 588L, 
    573L, 540L, 514L, 498L, 488L, 464L, 361L, 330L, 330L, 311L, 
    295L, 288L, 246L, 231L, 218L, 218L, 204L, 203L, 184L, 175L, 
    172L, 161L, 136L, 131L, 127L, 95L, 84L, 70L, 59L, 57L, 47L, 
    42L, 41L, 36L, 32L, 30L, 25L, 25L, 24L, 20L, 19L, 19L, 15L, 
    14L, 10L, 9L, 9L, 5L, 5L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L), user_count = c(4126L, 1533L, 
    1312L, 870L, 620L, 867L, 668L, 493L, 448L, 401L, 530L, 314L, 
    246L, 318L, 338L, 227L, 205L, 201L, 170L, 237L, 166L, 165L, 
    173L, 135L, 171L, 116L, 97L, 139L, 123L, 94L, 102L, 100L, 
    162L, 122L, 86L, 73L, 101L, 79L, 49L, 68L, 28L, 37L, 41L, 
    50L, 49L, 28L, 27L, 35L, 47L, 59L, 22L, 35L, 34L, 16L, 29L, 
    30L, 14L, 24L, 16L, 15L, 29L, 2L, 15L, 6L, 10L, 8L, 6L, 3L, 
    3L, 7L, 5L, 6L, 3L, 4L, 6L, 7L, 3L, 1L, 1L, 4L, 2L, 2L, 1L, 
    1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L))
Run Code Online (Sandbox Code Playgroud)

Fra*_*ank 5

我建议命名list而不是c:

region_indices = setNames(list(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows), region)
Run Code Online (Sandbox Code Playgroud)

然后将其变为data.frame

m = setNames(stack(region_indices), c("row", "region"))
Run Code Online (Sandbox Code Playgroud)

然后分配

ordered_tally[m$row, region := as.character(m$region)]
Run Code Online (Sandbox Code Playgroud)

在这一点上,是不包括在行region_indices拥有NAregion.如果您确定那些都是欧洲的,那么请使用

ordered_tally[is.na(region), region := "Europe"]
Run Code Online (Sandbox Code Playgroud)

评论.通常,通过行号存储数据是不安全的.(我的意思是,你可能会在路上给自己造成麻烦.)你可以通过国家名称获得地区名单.