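I'm struggling with something very basic: sorting a data frame on a time format (month, or in this case "%B-%y"). My goal is to calculate various monthly statistics, starting with sums.
The relevant part of the data frame looks like this *(this part is fine and in line with my goal; I include it here to show where the problem may originate)*: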
> tmp09
Instrument AccountValue monthYear ExitTime
1 JPM 6997 april-07 2007-04-10
2 JPM 7261 mei-07 2007-05-29
3 JPM 7545 juli-07 2007-07-18
4 JPM 7614 juli-07 2007-07-19
5 JPM 7897 augustus-07 2007-08-22
10 JPM 7423 november-07 2007-11-02
11 KFT 6992 mei-07 2007-05-14
12 KFT 6944 mei-07 2007-05-21
13 KFT 7069 juli-07 2007-07-09
14 KFT 6919 juli-07 2007-07-16
# Order on the exit time, which corresponds with 'monthYear'
> tmp09.sorted <- tmp09[order(tmp09$ExitTime),]
> tmp09.sorted
Instrument AccountValue monthYear ExitTime
1 JPM 6997 april-07 2007-04-10
11 KFT 6992 mei-07 2007-05-14
12 KFT 6944 mei-07 2007-05-21
2 JPM 7261 mei-07 2007-05-29
13 KFT 7069 juli-07 2007-07-09
14 KFT 6919 juli-07 2007-07-16
3 JPM 7545 juli-07 2007-07-18
4 JPM 7614 juli-07 2007-07-19
5 JPM 7897 augustus-07 2007-08-22
10 JPM 7423 november-07 2007-11-02
So far, so good: sorting on ExitTime works. The trouble starts when I calculate the totals per month and then try to sort that output:
# Calculate the total results per month
> Tmp09Totals <- tapply(tmp09.sorted$AccountValue, tmp09.sorted$monthYear, sum)
> Tmp09Totals <- data.frame(Tmp09Totals)
> Tmp09Totals
Tmp09Totals
april-07 6997
augustus-07 7897
juli-07 29147
mei-07 21197
november-07 7423
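How can I sort this output chronologically?
I have already tried (besides various attempts to convert monthYear to another date format): order, sort, sort.list, sort_df, reshape, and computing the sums with tapply, lapply, sapply, and aggregate. Even rewriting the rownames (by giving them a number from 1 to length(tmp09.sorted2$AccountValue)) didn't work. I also tried giving each month a different ID, based on what I learned in another question, but R had trouble distinguishing the various month values there as well.
The correct order for this output would be april-07, mei-07, juli-07, augustus-07, november-07: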
apr-07 6997
mei-07 21197
jul-07 29147
aug-07 7897
nov-07 7423
It would be easier to have separate Month and Year factors, with their levels in the correct order, and to use tapply on the combination of both variables, e.g.:
## The Month factor (levels in calendar order; month.name is English-only)
tmp09 <- within(tmp09,
                Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
                                           levels = month.name)))
## For @Jura25's locale, we can't use the built-in English constant;
## instead, we can use this solution, from ?month.name:
##   format(ISOdate(2000, 1:12, 1), "%B")
tmp09 <- within(tmp09,
                Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
                                           levels = format(ISOdate(2000, 1:12, 1), "%B"))))
##
## And the Year factor
tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y")))
This gives us (in my locale):
> head(tmp09)
Instrument AccountValue monthYear ExitTime Month Year
1 JPM 6997 april-07 2007-04-10 April 2007
2 JPM 7261 mei-07 2007-05-29 May 2007
3 JPM 7545 juli-07 2007-07-18 July 2007
4 JPM 7614 juli-07 2007-07-19 July 2007
5 JPM 7897 augustus-07 2007-08-22 August 2007
10 JPM 7423 november-07 2007-11-02 November 2007
Then use tapply with both factors:
> with(tmp09, tapply(AccountValue, list(Month, Year), sum))
2007
April 6997
May 21197
July 29147
August 7897
November 7423
Or via aggregate:
> with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum))
Month Year x
1 April 2007 6997
2 May 2007 21197
3 July 2007 29147
4 August 2007 7897
5 November 2007 7423
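If you would rather keep the original monthYear labels, a possible variation (a sketch, not part of the approach above) is to turn monthYear into a factor whose levels follow ExitTime. tapply returns its groups in level order, so the totals then come out chronologically without depending on the locale's month names. The data frame below is a hand-typed reconstruction of the rows shown in the question, and the names lev and totals are made up for this illustration.

```r
## Hypothetical reconstruction of the question's data, for illustration only
tmp09 <- data.frame(
  Instrument   = c("JPM", "JPM", "JPM", "JPM", "JPM", "JPM",
                   "KFT", "KFT", "KFT", "KFT"),
  AccountValue = c(6997, 7261, 7545, 7614, 7897, 7423,
                   6992, 6944, 7069, 6919),
  monthYear    = c("april-07", "mei-07", "juli-07", "juli-07", "augustus-07",
                   "november-07", "mei-07", "mei-07", "juli-07", "juli-07"),
  ExitTime     = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18",
                           "2007-07-19", "2007-08-22", "2007-11-02",
                           "2007-05-14", "2007-05-21", "2007-07-09",
                           "2007-07-16")),
  stringsAsFactors = FALSE
)

## Take the labels in order of first appearance after sorting on ExitTime
lev <- unique(tmp09$monthYear[order(tmp09$ExitTime)])

## Use that order as the factor levels instead of the alphabetical default
tmp09$monthYear <- factor(tmp09$monthYear, levels = lev)

## Groups are returned in level order, i.e. chronologically
totals <- tapply(tmp09$AccountValue, tmp09$monthYear, sum)
totals
```

This relies on each monthYear label belonging to exactly one calendar month, which holds here because the label includes the year.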