我有一个按国家/地区列出数据的数据表,我想添加区域信息.我做了默认的"欧洲",所以我需要纠正那些不在"欧洲"地区的行.这是我需要分配的内容:
ordered_tally[ , region :="Europe"]
AsiaRows = c(5, 12, 17, 24, 25, 26, 34, 36, 45, 56, 61, 64, 75, 76, 81, 88, 90, 91)
NorthAmericaRows = c(7, 18, 42, 47)
SouthAmericaRows = c(11, 13, 32, 38, 41, 46, 48, 52, 53, 57, 58, 68, 77, 78, 83)
AfricaRows = c(22, 66, 67, 70, 74, 82, 84, 86, 95)
OceaniaRows = c(31, 62)
MiddleEastRows = c(20, 51, 71, 72, 87, 89, 92)
region = c("Asia", "NorthAmerica", "SouthAmerica", "Africa", "Oceania", "MiddleEast")
region_indices = c(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows)
Run Code Online (Sandbox Code Playgroud)
这对我有用:
lapply(c(1:6), function(x) ordered_tally[region_indices[[x]], "region":= eval(region[x])])
有一个更好的方法吗?我的意义lapply不应该是必要的,必须有一个data.table我缺少的本地操作.
这是ordered_tally:
V1 country freq
1: 1 Poland 27718
2: 2 Spain 9800
3: 3 Denmark 7551
4: 4 United Kingdom 4910
5: 5 Thailand 4069
6: 6 Norway 3995
7: 7 United States 3837
8: 8 Netherlands 3075
9: 9 France 2866
10: 10 Italy 2429
11: 11 Brazil 2109
12: 12 India 1804
13: 13 Argentina 1610
14: 14 Germany 1437
15: 15 Sweden 1409
16: 16 Lithuania 1339
17: 17 Taiwan 1241
18: 18 Mexico 1218
19: 19 Latvia 1152
20: 20 Israel 1076
21: 21 Czech Republic 1064
22: 22 South Africa 1036
23: 23 Portugal 897
24: 24 Malaysia 889
25: 25 Indonesia 851
26: 26 Hong Kong 722
27: 27 Russia 670
28: 28 Switzerland 635
29: 29 Finland 613
30: 30 Ireland 603
31: 31 Australia 588
32: 32 Chile 573
33: 33 Estonia 540
34: 34 China 514
35: 35 Belgium 498
36: 36 Singapore 488
37: 37 Austria 464
38: 38 Colombia 361
39: 39 Belarus 330
40: 40 Romania 330
41: 41 Guatemala 311
42: 42 Canada 295
43: 43 Bulgaria 288
44: 44 Greece 246
45: 45 South Korea 231
46: 46 Costa Rica 218
47: 47 Puerto Rico 218
48: 48 Venezuela 204
49: 49 Serbia 203
50: 50 Turkey 184
51: 51 United Arab Emirates 175
52: 52 Ecuador 172
53: 53 Peru 161
54: 54 Hungary 136
55: 55 Montenegro 131
56: 56 Japan 127
57: 57 El Salvador 95
58: 58 Uruguay 84
59: 59 Cyprus 70
60: 60 Iceland 59
61: 61 Bangladesh 57
62: 62 New Zealand 47
63: 63 Moldova 42
64: 64 Philippines 41
65: 65 Slovakia 36
66: 66 Mauritius 32
67: 67 Nigeria 30
68: 68 Bolivia 25
69: 69 Croatia 25
70: 70 Zimbabwe 24
71: 71 Saudi Arabia 20
72: 72 Oman 19
73: 73 Slovenia 19
74: 74 Egypt 15
75: 75 Vietnam 14
76: 76 Sri Lanka 10
77: 77 Dominican Republic 9
78: 78 Nicaragua 9
79: 79 Macedonia (FYROM) 5
80: 80 Ukraine 5
81: 81 Azerbaijan 4
82: 82 Namibia 4
83: 83 Barbados 3
84: 84 Botswana 3
85: 85 Malta 3
86: 86 Ghana 2
87: 87 Iraq 2
88: 88 Kazakhstan 2
89: 89 Lebanon 2
90: 90 Macau 2
91: 91 Nepal 2
92: 92 Qatar 2
93: 93 Albania 1
94: 94 Faroe Islands 1
95: 95 Morocco 1
Run Code Online (Sandbox Code Playgroud)
这是dump重现的代码:
ordered_tally <-
data.table(V1 = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52",
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74",
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85",
"86", "87", "88", "89", "90", "91", "92", "93", "94", "95"),
country = c("Poland", "Spain", "Denmark", "United Kingdom",
"Thailand", "Norway", "United States", "Netherlands", "France",
"Italy", "Brazil", "India", "Argentina", "Germany", "Sweden",
"Lithuania", "Taiwan", "Mexico", "Latvia", "Israel", "Czech Republic",
"South Africa", "Portugal", "Malaysia", "Indonesia", "Hong Kong",
"Russia", "Switzerland", "Finland", "Ireland", "Australia",
"Chile", "Estonia", "China", "Belgium", "Singapore", "Austria",
"Colombia", "Belarus", "Romania", "Guatemala", "Canada",
"Bulgaria", "Greece", "South Korea", "Costa Rica", "Puerto Rico",
"Venezuela", "Serbia", "Turkey", "United Arab Emirates",
"Ecuador", "Peru", "Hungary", "Montenegro", "Japan", "El Salvador",
"Uruguay", "Cyprus", "Iceland", "Bangladesh", "New Zealand",
"Moldova", "Philippines", "Slovakia", "Mauritius", "Nigeria",
"Bolivia", "Croatia", "Zimbabwe", "Saudi Arabia", "Oman",
"Slovenia", "Egypt", "Vietnam", "Sri Lanka", "Dominican Republic",
"Nicaragua", "Macedonia (FYROM)", "Ukraine", "Azerbaijan",
"Namibia", "Barbados", "Botswana", "Malta", "Ghana", "Iraq",
"Kazakhstan", "Lebanon", "Macau", "Nepal", "Qatar", "Albania",
"Faroe Islands", "Morocco"), freq = c(27718L, 9800L, 7551L,
4910L, 4069L, 3995L, 3837L, 3075L, 2866L, 2429L, 2109L, 1804L,
1610L, 1437L, 1409L, 1339L, 1241L, 1218L, 1152L, 1076L, 1064L,
1036L, 897L, 889L, 851L, 722L, 670L, 635L, 613L, 603L, 588L,
573L, 540L, 514L, 498L, 488L, 464L, 361L, 330L, 330L, 311L,
295L, 288L, 246L, 231L, 218L, 218L, 204L, 203L, 184L, 175L,
172L, 161L, 136L, 131L, 127L, 95L, 84L, 70L, 59L, 57L, 47L,
42L, 41L, 36L, 32L, 30L, 25L, 25L, 24L, 20L, 19L, 19L, 15L,
14L, 10L, 9L, 9L, 5L, 5L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L), user_count = c(4126L, 1533L,
1312L, 870L, 620L, 867L, 668L, 493L, 448L, 401L, 530L, 314L,
246L, 318L, 338L, 227L, 205L, 201L, 170L, 237L, 166L, 165L,
173L, 135L, 171L, 116L, 97L, 139L, 123L, 94L, 102L, 100L,
162L, 122L, 86L, 73L, 101L, 79L, 49L, 68L, 28L, 37L, 41L,
50L, 49L, 28L, 27L, 35L, 47L, 59L, 22L, 35L, 34L, 16L, 29L,
30L, 14L, 24L, 16L, 15L, 29L, 2L, 15L, 6L, 10L, 8L, 6L, 3L,
3L, 7L, 5L, 6L, 3L, 4L, 6L, 7L, 3L, 1L, 1L, 4L, 2L, 2L, 1L,
1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L))
Run Code Online (Sandbox Code Playgroud)
我建议命名list而不是c:
region_indices = setNames(list(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows), region)
Run Code Online (Sandbox Code Playgroud)
然后将其变为data.frame
m = setNames(stack(region_indices), c("row", "region"))
Run Code Online (Sandbox Code Playgroud)
然后分配
ordered_tally[m$row, region := as.character(m$region)]
Run Code Online (Sandbox Code Playgroud)
在这一点上,是不包括在行region_indices拥有NA的region.如果您确定那些都是欧洲的,那么请使用
ordered_tally[is.na(region), region := "Europe"]
Run Code Online (Sandbox Code Playgroud)
评论.通常,通过行号存储数据是不安全的.(我的意思是,你可能会在路上给自己造成麻烦.)你可以通过国家名称获得地区名单.