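Below is a small part of my original data frame. I need to merge the rows in which an id is repeated within a particular season and the lic and vessel values differ. When merging, I need to sum qtty and grossTon.
Take season 1998 for id 431 as an example (*).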
season lic id qtty vessel grossTon
…
1998 16350 431 40 435 57
1998 16353 431 28 303 22.54
…
The same id, 431, has two different lic values (16350 and 16353) and two different vessels (435 and 303). The expected result in this particular case is:
season lic id qtty vessel grossTon
…
1998 16350 431 68 435 79.54
…
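I don't mind which lic and vessel remain in the resulting row; what I want is to keep season and id and get the sums of qtty and grossTon. In the example above I manually picked lic 16350 and vessel 435.
To be honest, I have no idea how to do this, so I would greatly appreciate any help.
Thanks
Original data (* = rows to be combined)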
season lic id qtty vessel grossTon
1998 15593 411 40 2643 31.5
1999 27271 411 40 2643 31.5
2000 35758 411 40 2643 31.5
2001 45047 411 50 2643 31.5
2002 56291 411 55 2643 31.5
2003 66991 411 55 2643 31.5
2004 80581 411 55 2643 31.5
2005 95058 411 52 NA NA
2006 113379 411 50 10911 4.65
2007 120894 411 50 10911 4.65
2008 130033 411 50 2483 8.5
2009 139201 411 46 2296 50
2010 148833 411 46 2296 50
2011 158395 411 46 2296 50
1998 16350 431 40 435 57 # *
1998 16353 431 28 303 22.54 # *
2000 37491 436 50 2021 19.11
2001 47019 436 50 2021 19.11
2002 57588 436 51 2021 19.11
2003 69128 436 51 2021 19.11
2004 82400 436 52 2021 19.11
2005 95599 436 50 2021 19.11
2006 113126 436 50 2021 19.11
2007 122387 436 50 2021 19.11
2008 131126 436 50 2021 19.11
2009 140417 436 50 2021 19.11
2010 150673 436 50 2021 19.11
2011 159776 436 50 2021 19.11
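In addition, I need to keep the preceding and following rows, in which an id has only one row per season. Like this (* = row produced by the combination):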
season lic id qtty vessel grossTon
1998 15593 411 40 2643 31.5
1999 27271 411 40 2643 31.5
2000 35758 411 40 2643 31.5
2001 45047 411 50 2643 31.5
2002 56291 411 55 2643 31.5
2003 66991 411 55 2643 31.5
2004 80581 411 55 2643 31.5
2005 95058 411 52 NA NA
2006 113379 411 50 10911 4.65
2007 120894 411 50 10911 4.65
2008 130033 411 50 2483 8.5
2009 139201 411 46 2296 50
2010 148833 411 46 2296 50
2011 158395 411 46 2296 50
1998 16350 431 68 435 79.54 #*
2000 37491 436 50 2021 19.11
2001 47019 436 50 2021 19.11
2002 57588 436 51 2021 19.11
2003 69128 436 51 2021 19.11
2004 82400 436 52 2021 19.11
2005 95599 436 50 2021 19.11
2006 113126 436 50 2021 19.11
2007 122387 436 50 2021 19.11
2008 131126 436 50 2021 19.11
2009 140417 436 50 2021 19.11
2010 150673 436 50 2021 19.11
2011 159776 436 50 2021 19.11
Ric*_*rta 12
If you convert your data.frame to a data.table, you can make good use of the by argument:
library(data.table)
DT <- data.table(DF) # DF is your original data
Then it is just a one-liner:
DT[, lapply(.SD, sum), by=list(season, lic, id, vessel)]
We can filter down to the 1998 season if we like:
DT[, lapply(.SD, sum), by=list(season, lic, id, vessel)][season==1998]
season lic id vessel qtty grossTon
1: 1998 15593 411 2643 40 31.50
2: 1998 16350 431 435 68 114.00
3: 1998 16353 431 303 68 45.08
The full output looks like this:
season lic id vessel qtty grossTon
1: 1998 15593 411 2643 40 31.50
2: 1999 27271 411 2643 40 31.50
3: 2000 35758 411 2643 40 31.50
4: 2001 45047 411 2643 50 31.50
5: 2002 56291 411 2643 55 31.50
6: 2003 66991 411 2643 55 31.50
7: 2004 80581 411 2643 55 31.50
8: 2005 95058 411 NA 52 NA
9: 2006 113379 411 10911 50 4.65
10: 2007 120894 411 10911 50 4.65
11: 2008 130033 411 2483 50 8.50
12: 2009 139201 411 2296 46 50.00
13: 2010 148833 411 2296 46 50.00
14: 2011 158395 411 2296 46 50.00
15: 1998 16350 431 435 68 114.00
16: 1998 16353 431 303 68 45.08
17: 1999 28641 431 303 68 45.08
18: 1999 28644 431 435 68 114.00
19: 2000 37491 436 2021 50 19.11
20: 2001 47019 436 2021 50 19.11
21: 2002 57588 436 2021 51 19.11
22: 2003 69128 436 2021 51 19.11
23: 2004 82400 436 2021 52 19.11
24: 2005 95599 436 2021 50 19.11
25: 2006 113126 436 2021 50 19.11
26: 2007 122387 436 2021 50 19.11
27: 2008 131126 436 2021 50 19.11
28: 2009 140417 436 2021 50 19.11
29: 2010 150673 436 2021 50 19.11
30: 2011 159776 436 2021 50 19.11
season lic id vessel qtty grossTon
Here is a one-line base R solution using aggregate, as suggested by Frank:
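Because lic and vessel are part of by in the call above, rows that differ in either column are never combined. If the aim is one row per season and id, as the question describes, a minimal sketch is to group by season and id only and keep the first lic and vessel of each group. The rows below are copied from the question's excerpt, not the full data, and keeping the first lic/vessel is an assumption based on the asker saying either value is acceptable.

```r
library(data.table)

# Three rows copied from the question's excerpt (illustrative subset only)
DT <- data.table(
  season   = c(1998, 1998, 1998),
  lic      = c(15593, 16350, 16353),
  id       = c(411, 431, 431),
  qtty     = c(40, 40, 28),
  vessel   = c(2643, 435, 303),
  grossTon = c(31.5, 57, 22.54)
)

# Group by season and id only: keep the first lic and vessel (assumed to be
# acceptable), and sum qtty and grossTon within each group
res <- DT[, .(lic      = lic[1],
              qtty     = sum(qtty),
              vessel   = vessel[1],
              grossTon = sum(grossTon)),
          by = .(season, id)]

# Restore the column order used in the question
setcolorder(res, c("season", "lic", "id", "qtty", "vessel", "grossTon"))
print(res)
```

On these rows the two entries for id 431 collapse into a single row, while the id 411 row passes through unchanged.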
Df_agg <- aggregate(. ~ season + lic + id + vessel, data = DF, sum)
# DF is your data
# we use season + lic + id + vessel as the grouping elements
Check the output:
Df_agg[with(Df_agg, order(lic)), ]
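# check the output (sorted for convenience); it matches Ricardo Saporta's output,
# except that the 2005 row containing NA is missing: the formula method of
# aggregate uses na.action = na.omit by default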
season lic id vessel qtty grossTon
21 1998 15593 411 2643 40 31.50
3 1998 16350 431 435 68 114.00
1 1998 16353 431 303 68 45.08
22 1999 27271 411 2643 40 31.50
2 1999 28641 431 303 68 45.08
4 1999 28644 431 435 68 114.00
23 2000 35758 411 2643 40 31.50
5 2000 37491 436 2021 50 19.11
24 2001 45047 411 2643 50 31.50
6 2001 47019 436 2021 50 19.11
25 2002 56291 411 2643 55 31.50
7 2002 57588 436 2021 51 19.11
26 2003 66991 411 2643 55 31.50
8 2003 69128 436 2021 51 19.11
27 2004 80581 411 2643 55 31.50
9 2004 82400 436 2021 52 19.11
10 2005 95599 436 2021 50 19.11
11 2006 113126 436 2021 50 19.11
28 2006 113379 411 10911 50 4.65
29 2007 120894 411 10911 50 4.65
12 2007 122387 436 2021 50 19.11
20 2008 130033 411 2483 50 8.50
13 2008 131126 436 2021 50 19.11
17 2009 139201 411 2296 46 50.00
14 2009 140417 436 2021 50 19.11
18 2010 148833 411 2296 46 50.00
15 2010 150673 436 2021 50 19.11
19 2011 158395 411 2296 46 50.00
16 2011 159776 436 2021 50 19.11
Checking 1998 gives the same result as RS. There seems to be an error in the OP's expected output: 57 + 57 != 79.54, but = 114
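In the sorted output above, id 431 still occupies more than one row per season. A quick way to list such season/id pairs is to count rows per pair. This is a minimal base R sketch on a few rows copied from the question's excerpt; the column name n_rows is made up for illustration.

```r
# Rows copied from the question's excerpt (not the full data set)
DF <- data.frame(
  season   = c(1998, 1999, 1998, 1998),
  lic      = c(15593, 27271, 16350, 16353),
  id       = c(411, 411, 431, 431),
  qtty     = c(40, 40, 40, 28),
  vessel   = c(2643, 2643, 435, 303),
  grossTon = c(31.5, 31.5, 57, 22.54)
)

# Count rows per season/id pair; lic has no NA here, so no row is dropped
counts <- aggregate(lic ~ season + id, data = DF, FUN = length)
names(counts)[names(counts) == "lic"] <- "n_rows"  # hypothetical column name

# Pairs with more than one row are the ones that still need merging
dups <- counts[counts$n_rows > 1, ]
print(dups)
```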
Df_agg[Df_agg$season == 1998,]
season lic id vessel qtty grossTon
21 1998 15593 411 2643 40 31.50
3 1998 16350 431 435 68 114.00
1 1998 16353 431 303 68 45.08
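For comparison, with the rows exactly as printed in the question (grossTon 57 and 22.54), grouping by season and id alone gives 68 and 79.54, the values the asker expected. The 114 above equals 57 + 57, so the outputs were presumably produced from input that differs from the excerpt; that is an inference, since the input is not shown. The sketch below is base R, keeps the first lic and vessel per group (an arbitrary choice the asker said was fine), and passes na.action = na.pass so that a row with NA, like the 2005 one, is kept rather than dropped.

```r
# Rows copied from the question's excerpt, including the 2005 row with NA
DF <- data.frame(
  season   = c(1998, 1998, 1998, 2005),
  lic      = c(15593, 16350, 16353, 95058),
  id       = c(411, 431, 431, 411),
  qtty     = c(40, 40, 28, 52),
  vessel   = c(2643, 435, 303, NA),
  grossTon = c(31.5, 57, 22.54, NA)
)

# Sum qtty and grossTon per season/id; na.pass keeps rows that contain NA
sums <- aggregate(cbind(qtty, grossTon) ~ season + id, data = DF,
                  FUN = sum, na.action = na.pass)

# First lic and vessel seen for each season/id pair
keys <- DF[!duplicated(DF[c("season", "id")]),
           c("season", "id", "lic", "vessel")]

# Reattach lic and vessel, then restore the question's column order
res <- merge(keys, sums, by = c("season", "id"))
res <- res[, c("season", "lic", "id", "qtty", "vessel", "grossTon")]
print(res)
```

Here a group whose grossTon contains NA sums to NA; whether that or na.rm = TRUE is preferable depends on how missing tonnage should be treated.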