My df2:
League freq
18 England 108
27 Italy 79
20 Germany 74
43 Spain 64
19 France 49
39 Russia 34
31 Mexico 27
47 Turkey 24
32 Netherlands 23
37 Portugal 21
49 United States 18
29 Japan 16
25 Iran 15
7 Brazil 13
22 Greece 13
14 Costa 11
45 Switzerland 11
5 Belgium 10
17 Ecuador 10
23 Honduras 10
42 South Korea 9
2 Argentina 8
48 Ukraine 7
3 Australia 6
11 Chile 6
12 China 6
15 Croatia 6
35 Norway 6
41 Scotland 6
34 Nigeria 5
I tried to select europe:
europe <- subset(df2, nrow(x=18, 27, 20) select=c(1, 2))
What is the most efficient way to select europe, africa, Asia, ... from df2?
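You either need to identify manually which countries are in which continents, or you may be able to scrape this information from somewhere:
(using the basic strategy from "Scraping html tables into R data frames using the XML package")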
library(XML)
theurl <- "http://en.wikipedia.org/wiki/List_of_European_countries_by_area"
tables <- readHTMLTable(theurl)
library(stringr)
europe_names <- str_extract(as.character(tables[[1]]$Country),"[[:alpha:] ]+")
head(sort(europe_names))
## [1] "Albania" "Andorra" "Austria" "Azerbaijan" "Belarus"
## [6] "Belgium"
## there's also a 'Total' entry in here but it's probably harmless ...
subset(df2,League %in% europe_names)
Of course, you will have to work this out again for Asia, America, and so on.
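For the manual route mentioned above, a minimal sketch might look like the following. The `continent_of` lookup vector and the `continent` column are hypothetical names introduced here for illustration, and the small `df2` is only a stand-in built from a few rows of the question's data; a real lookup would need an entry for every country in `df2`.

```r
# Stand-in for df2, using a few rows copied from the question.
df2 <- data.frame(
  League = c("England", "Italy", "Mexico", "Japan", "Brazil", "Nigeria"),
  freq   = c(108, 79, 27, 16, 13, 5),
  stringsAsFactors = FALSE
)

# Hand-written country -> continent lookup (hypothetical; extend as needed).
continent_of <- c(
  England = "Europe",
  Italy   = "Europe",
  Mexico  = "North America",
  Japan   = "Asia",
  Brazil  = "South America",
  Nigeria = "Africa"
)

# Index by name; as.character() guards against League being a factor,
# which would otherwise be indexed by its integer codes.
# Countries missing from the lookup get NA.
df2$continent <- unname(continent_of[as.character(df2$League)])

# One continent at a time ...
europe <- subset(df2, continent == "Europe")

# ... or all continents at once, as a named list of data frames.
by_continent <- split(df2, df2$continent)
```

With the lookup in place, `by_continent$Asia`, `by_continent$Africa`, and so on give each group without repeating the scraping step per continent.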