For the following simple dataset:
row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008
The following code:
df[, .N, by = list(country, year)][,prop := N/sum(N)]
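gives each (country, year) group's share of the total number of observations. What I want instead is the proportion within each country. How should I adjust this code to get the correct proportions?
Desired output: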
row country year prop
1 NLD 2005 0.6
2 NLD 2005 0.6
3 BLG 2006 0.5
4 BLG 2005 0.5
5 GER 2005 1
6 NLD 2007 0.2
7 NLD 2005 0.6
8 NLD 2008 0.2
Using data.table:
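library(data.table)

df <- read.table(header = TRUE, text = "row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")

# total = rows per country; prop = rows per (country, year) divided by total.
# The helper column is named total rather than sum to avoid shadowing base::sum.
setDT(df)[, total := .N, by = country][, prop := .N, by = c("country", "year")][, prop := prop / total][, total := NULL]
df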
row country year prop
1: 1 NLD 2005 0.6
2: 2 NLD 2005 0.6
3: 3 BLG 2006 0.5
4: 4 BLG 2005 0.5
5: 5 GER 2005 1.0
6: 6 NLD 2007 0.2
7: 7 NLD 2005 0.6
8: 8 NLD 2008 0.2
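An alternative that stays closer to the code in the question is sketched below. It assumes the only change needed to the original chain is adding `by = country` to the second step, so that `sum(N)` is computed per country rather than over the whole table. The intermediate table name `counts` is made up for illustration, and the update join at the end is only needed if the proportion should appear on every original row instead of once per (country, year) group.

```r
library(data.table)

df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005, 2005, 2006, 2005, 2005, 2007, 2005, 2008)
)

# Count rows per (country, year), as in the question.
counts <- df[, .N, by = .(country, year)]

# Divide by the per-country total instead of the grand total.
counts[, prop := N / sum(N), by = country]

# Optional: copy the group-level proportion back onto each original row
# with an update join (i.prop refers to the prop column of counts).
df[counts, prop := i.prop, on = .(country, year)]
df
```

If a one-row-per-group summary is enough, `counts` already holds the answer and the join can be skipped.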