Sorting a data frame by a month-year time format

Jur*_*ura 4 sorting time r

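I'm struggling with something very basic: sorting a data frame by a time format (month-year, in this case "%B-%y"). My goal is to calculate various monthly statistics, starting with the sum.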

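The relevant part of the data frame looks like this *(this works fine and matches my goal; I include it here to show where the problem might originate)*: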

> tmp09
   Instrument AccountValue   monthYear   ExitTime
1         JPM         6997    april-07 2007-04-10
2         JPM         7261      mei-07 2007-05-29
3         JPM         7545     juli-07 2007-07-18
4         JPM         7614     juli-07 2007-07-19
5         JPM         7897 augustus-07 2007-08-22
10        JPM         7423 november-07 2007-11-02
11        KFT         6992      mei-07 2007-05-14
12        KFT         6944      mei-07 2007-05-21
13        KFT         7069     juli-07 2007-07-09
14        KFT         6919     juli-07 2007-07-16
# Order on the exit time, which corresponds with 'monthYear'
> tmp09.sorted <- tmp09[order(tmp09$ExitTime),]
> tmp09.sorted
   Instrument AccountValue   monthYear   ExitTime
1         JPM         6997    april-07 2007-04-10
11        KFT         6992      mei-07 2007-05-14
12        KFT         6944      mei-07 2007-05-21
2         JPM         7261      mei-07 2007-05-29
13        KFT         7069     juli-07 2007-07-09
14        KFT         6919     juli-07 2007-07-16
3         JPM         7545     juli-07 2007-07-18
4         JPM         7614     juli-07 2007-07-19
5         JPM         7897 augustus-07 2007-08-22
10        JPM         7423 november-07 2007-11-02

So far so good: sorting on ExitTime works. The trouble starts when I calculate the totals per month and then try to sort that output:

# Calculate the total results per month
> Tmp09Totals <- tapply(tmp09.sorted$AccountValue, tmp09.sorted$monthYear, sum)
> Tmp09Totals <- data.frame(Tmp09Totals)
> Tmp09Totals
            Tmp09Totals
april-07           6997
augustus-07        7897
juli-07           29147
mei-07            21197
november-07        7423

How can I sort the output in chronological order?

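I have already tried (besides various attempts to convert monthYear to another date format): order, sort, sort.list, sort_df, reshape, and computing the sums with tapply, lapply, sapply, and aggregate. Even rewriting the rownames (by giving them a number from 1 to length(tmp09.sorted2$AccountValue)) did not work. I also tried giving each month-year a different ID, based on what I learned in another question, but R also had trouble distinguishing the various monthYear values.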

The correct order for this output is april-07, mei-07, juli-07, augustus-07, november-07:

apr-07  6997
mei-07  21197
jul-07  29147
aug-07  7897
nov-07  7423

Rei*_*son 9

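It is easier to use separate Month and Year factors, with their levels in the correct order, and to call tapply on the combination of both variables, e.g.: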

## The Month factor
tmp09 <- within(tmp09,
                Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
                                                    levels = month.name)))
## for @Jura25's locale, we can't use the built-in English constant;
## instead, we can use this solution, from ?month.name:
## format(ISOdate(2000, 1:12, 1), "%B")
tmp09 <- within(tmp09,
                Month <- droplevels(factor(strftime(ExitTime, format = "%B"),
                                                    levels = format(ISOdate(2000, 1:12, 1), "%B"))))
##
## And the Year factor
tmp09 <- within(tmp09, Year <- factor(strftime(ExitTime, format = "%Y")))

This gives us (in my locale):

> head(tmp09)
   Instrument AccountValue   monthYear   ExitTime    Month Year
1         JPM         6997    april-07 2007-04-10    April 2007
2         JPM         7261      mei-07 2007-05-29      May 2007
3         JPM         7545     juli-07 2007-07-18     July 2007
4         JPM         7614     juli-07 2007-07-19     July 2007
5         JPM         7897 augustus-07 2007-08-22   August 2007
10        JPM         7423 november-07 2007-11-02 November 2007
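The level order matters because tapply groups by the levels of a factor and reports the groups in level order. A character vector is converted to a factor with sorted levels, which is why the totals in the question came out alphabetically. A minimal sketch with made-up toy values (`value`, `label`, and `label_f` are hypothetical names, not the asker's data):

```r
## Hypothetical toy data, for illustration only
value <- c(10, 20, 30, 40)
label <- c("mei-07", "april-07", "juli-07", "mei-07")

## Grouping by a character vector: levels are sorted alphabetically,
## so the groups come out as april-07, juli-07, mei-07
by_alpha <- tapply(value, label, sum)
print(names(by_alpha))

## Grouping by a factor whose levels are given in chronological order:
## the groups follow the level order instead
label_f <- factor(label, levels = c("april-07", "mei-07", "juli-07"))
by_time <- tapply(value, label_f, sum)
print(names(by_time))
```

The sums are identical in both cases; only the order in which the groups are reported changes.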

Then use tapply with both factors:

> with(tmp09, tapply(AccountValue, list(Month, Year), sum))
          2007
April     6997
May      21197
July     29147
August    7897
November  7423

Or via aggregate:

> with(tmp09, aggregate(AccountValue, list(Month = Month, Year = Year), sum))
     Month Year     x
1    April 2007  6997
2      May 2007 21197
3     July 2007 29147
4   August 2007  7897
5 November 2007  7423
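If the original monthYear labels should be kept as they are, a variation on the same idea (not part of the code above) is to take the factor levels from the labels in ExitTime order, which avoids depending on locale month names. A rough sketch, assuming ExitTime is a Date and that every label includes the year so labels from different years never collide; the data frame below rebuilds only a handful of rows from the question:

```r
## A subset of the question's rows, rebuilt for illustration
tmp09 <- data.frame(
  AccountValue = c(6997, 7261, 7545, 7897, 7423, 6992),
  monthYear    = c("april-07", "mei-07", "juli-07",
                   "augustus-07", "november-07", "mei-07"),
  ExitTime     = as.Date(c("2007-04-10", "2007-05-29", "2007-07-18",
                           "2007-08-22", "2007-11-02", "2007-05-14")),
  stringsAsFactors = FALSE
)

## Labels in chronological order of first appearance become the levels
labs <- as.character(tmp09$monthYear)
chrono_levels <- unique(labs[order(tmp09$ExitTime)])
tmp09$monthYear <- factor(labs, levels = chrono_levels)

## tapply now reports the totals in level (chronological) order
totals <- tapply(tmp09$AccountValue, tmp09$monthYear, sum)
print(names(totals))
```

Because the levels are derived from the data, months with no rows never appear, so no droplevels call is needed here.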