merge r panel-data
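I have several data frames in panel-data form, and I want to merge them into a single panel data frame. The data frames share some variables and differ in others, as illustrated below: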
DF1:
Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-05 A 1 2 3 4 5 6
Feb-05 A 2 3 4 5 6 7
Mar-05 A 3 4 5 6 7 8
Apr-05 A 4 5 6 7 8 9
May-05 A 5 6 7 8 9 10
Jun-05 A 6 7 8 9 10 11
Jul-05 A 7 8 9 10 11 12
Aug-05 A 8 9 10 11 12 13
Sep-05 A 9 10 11 12 13 14
Oct-05 A 10 11 12 13 14 15
Nov-05 A 11 12 13 14 15 16
Dec-05 A 12 13 14 15 16 17
Jan-05 B 12 12 12 12 12 12
Feb-05 B 12 12 12 12 12 12
Mar-05 B 12 12 12 12 12 12
Apr-05 B 12 12 12 12 12 12
May-05 B 12 12 12 12 12 12
Jun-05 B 12 12 12 12 12 12
Jul-05 B 12 12 12 12 12 12
Aug-05 B 12 12 12 12 12 12
Sep-05 B 12 12 12 12 12 12
Oct-05 B 12 12 12 12 12 12
Nov-05 B 12 12 12 12 12 12
Dec-05 B 12 12 12 12 12 12
DF2:
Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-06 A 1 2 3 4 5 6
Feb-06 A 2 3 4 5 6 7
Mar-06 A 3 4 5 6 7 8
Apr-06 A 4 5 6 7 8 9
May-06 A 5 6 7 8 9 10
Jun-06 A 6 7 8 9 10 11
Jul-06 A 7 8 9 10 11 12
Aug-06 A 8 9 10 11 12 13
Sep-06 A 9 10 11 12 13 14
Oct-06 A 10 11 12 13 14 15
Nov-06 A 11 12 13 14 15 16
Dec-06 A 12 13 14 15 16 17
Jan-06 C 12 12 12 12 12 12
Feb-06 C 12 12 12 12 12 12
Mar-06 C 12 12 12 12 12 12
Apr-06 C 12 12 12 12 12 12
May-06 C 12 12 12 12 12 12
Jun-06 C 12 12 12 12 12 12
Jul-06 C 12 12 12 12 12 12
Aug-06 C 12 12 12 12 12 12
Sep-06 C 12 12 12 12 12 12
Oct-06 C 12 12 12 12 12 12
Nov-06 C 12 12 12 12 12 12
Dec-06 C 12 12 12 12 12 12
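The desired output is shown below. I want to merge the panel data frames so that each variable is laid out in long format across all months, and wherever a variable has no data for a given year, it gets NA under Beta1, Beta2, and so on.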
Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-05 A 1 2 3 4 5 6
Feb-05 A 2 3 4 5 6 7
Mar-05 A 3 4 5 6 7 8
Apr-05 A 4 5 6 7 8 9
May-05 A 5 6 7 8 9 10
Jun-05 A 6 7 8 9 10 11
Jul-05 A 7 8 9 10 11 12
Aug-05 A 8 9 10 11 12 13
Sep-05 A 9 10 11 12 13 14
Oct-05 A 10 11 12 13 14 15
Nov-05 A 11 12 13 14 15 16
Dec-05 A 12 13 14 15 16 17
Jan-06 A 1 2 3 4 5 6
Feb-06 A 2 3 4 5 6 7
Mar-06 A 3 4 5 6 7 8
Apr-06 A 4 5 6 7 8 9
May-06 A 5 6 7 8 9 10
Jun-06 A 6 7 8 9 10 11
Jul-06 A 7 8 9 10 11 12
Aug-06 A 8 9 10 11 12 13
Sep-06 A 9 10 11 12 13 14
Oct-06 A 10 11 12 13 14 15
Nov-06 A 11 12 13 14 15 16
Dec-06 A 12 13 14 15 16 17
Jan-05 B 12 12 12 12 12 12
Feb-05 B 12 12 12 12 12 12
Mar-05 B 12 12 12 12 12 12
Apr-05 B 12 12 12 12 12 12
May-05 B 12 12 12 12 12 12
Jun-05 B 12 12 12 12 12 12
Jul-05 B 12 12 12 12 12 12
Aug-05 B 12 12 12 12 12 12
Sep-05 B 12 12 12 12 12 12
Oct-05 B 12 12 12 12 12 12
Nov-05 B 12 12 12 12 12 12
Dec-05 B 12 12 12 12 12 12
Jan-06 B NA NA NA NA NA NA
Feb-06 B NA NA NA NA NA NA
Mar-06 B NA NA NA NA NA NA
Apr-06 B NA NA NA NA NA NA
May-06 B NA NA NA NA NA NA
Jun-06 B NA NA NA NA NA NA
Jul-06 B NA NA NA NA NA NA
Aug-06 B NA NA NA NA NA NA
Sep-06 B NA NA NA NA NA NA
Oct-06 B NA NA NA NA NA NA
Nov-06 B NA NA NA NA NA NA
Dec-06 B NA NA NA NA NA NA
Jan-05 C NA NA NA NA NA NA
Feb-05 C NA NA NA NA NA NA
Mar-05 C NA NA NA NA NA NA
Apr-05 C NA NA NA NA NA NA
May-05 C NA NA NA NA NA NA
Jun-05 C NA NA NA NA NA NA
Jul-05 C NA NA NA NA NA NA
Aug-05 C NA NA NA NA NA NA
Sep-05 C NA NA NA NA NA NA
Oct-05 C NA NA NA NA NA NA
Nov-05 C NA NA NA NA NA NA
Dec-05 C NA NA NA NA NA NA
Jan-06 C 12 12 12 12 12 12
Feb-06 C 12 12 12 12 12 12
Mar-06 C 12 12 12 12 12 12
Apr-06 C 12 12 12 12 12 12
May-06 C 12 12 12 12 12 12
Jun-06 C 12 12 12 12 12 12
Jul-06 C 12 12 12 12 12 12
Aug-06 C 12 12 12 12 12 12
Sep-06 C 12 12 12 12 12 12
Oct-06 C 12 12 12 12 12 12
Nov-06 C 12 12 12 12 12 12
Dec-06 C 12 12 12 12 12 12
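As mentioned, I have several data frames to merge, and the result may run to hundreds of thousands of rows, so memory and space are a concern. I would sincerely appreciate your help.
There is a function for that. Combine the data frames with rbind, then use complete() from tidyr. It looks at every combination of Month and variable and adds a row filled with NA for any combination that is missing: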
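library(tidyr)
library(zoo)  # provides as.yearmon()
# Stack the two panels, then add an NA row for every missing Month x variable pair
df3 <- do.call(rbind.data.frame, list(df1, df2))
df3$Month <- as.character(df3$Month)
df4 <- complete(df3, Month, variable)
# Months look like "Jan-05", so parse them with "%b-%y"
df4$Month <- as.yearmon(df4$Month, "%b-%y")
# Sort by variable, then chronologically within each variable
df5 <- df4[order(df4$variable, df4$Month), ]
df5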
# Source: local data frame [72 x 8]
#
# Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
# (yrmn) (fctr) (int) (int) (int) (int) (int) (int)
# 1 Jan 2005 A 1 2 3 4 5 6
# 2 Feb 2005 A 2 3 4 5 6 7
# 3 Mar 2005 A 3 4 5 6 7 8
# 4 Apr 2005 A 4 5 6 7 8 9
# 5 May 2005 A 5 6 7 8 9 10
# 6 Jun 2005 A 6 7 8 9 10 11
# 7 Jul 2005 A 7 8 9 10 11 12
# 8 Aug 2005 A 8 9 10 11 12 13
# 9 Sep 2005 A 9 10 11 12 13 14
# 10 Oct 2005 A 10 11 12 13 14 15
# .. ... ... ... ... ... ... ... ...
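To make the effect of complete() easier to see, here is a minimal sketch on toy data. The two small data frames are hypothetical stand-ins for df1 and df2 (two months each, one Beta column), not the data from the question.

```r
library(tidyr)

# Toy stand-ins for df1 and df2 (hypothetical, far smaller than the real panels)
df1 <- data.frame(Month    = c("Jan-05", "Feb-05", "Jan-05", "Feb-05"),
                  variable = c("A", "A", "B", "B"),
                  Beta1    = c(1, 2, 12, 12),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month    = c("Jan-06", "Feb-06", "Jan-06", "Feb-06"),
                  variable = c("A", "A", "C", "C"),
                  Beta1    = c(1, 2, 12, 12),
                  stringsAsFactors = FALSE)

# Stack the panels: 8 observed rows
stacked <- rbind(df1, df2)

# Expand to every Month x variable pair: 4 months x 3 variables = 12 rows.
# Pairs that were never observed (B in 2006, C in 2005) get NA in Beta1.
filled <- complete(stacked, Month, variable)

nrow(filled)              # 12
sum(is.na(filled$Beta1))  # 4
```

Note that complete() materialises every combination, so the result has (number of distinct months) x (number of distinct variables) rows regardless of how sparse the input is. That is worth keeping in mind for the memory concern raised in the question.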
An alternative implementation using dplyr and tidyr:
library(dplyr)
library(tidyr)
df3 <- bind_rows(df1, df2) %>%
  complete(Month, variable)
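The dplyr pipeline leaves Month as text, so sorting on it would be alphabetical rather than chronological. The sketch below shows one possible way to get the ordering in the desired output. It runs on hypothetical toy data, and month_to_date() is an illustrative helper, not part of any package. It assumes months are written like "Jan-05" with English abbreviations and that all two-digit years fall in 2000 to 2099.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for df1 and df2 (hypothetical)
df1 <- data.frame(Month    = c("Jan-05", "Feb-05", "Jan-05", "Feb-05"),
                  variable = c("A", "A", "B", "B"),
                  Beta1    = c(1, 2, 12, 12),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month    = c("Jan-06", "Feb-06", "Jan-06", "Feb-06"),
                  variable = c("A", "A", "C", "C"),
                  Beta1    = c(1, 2, 12, 12),
                  stringsAsFactors = FALSE)

# Hypothetical helper: map "Jan-05" to the Date 2005-01-01.
# It looks the month up in month.abb instead of parsing with %b,
# so it does not depend on the session locale.
month_to_date <- function(x) {
  mon  <- match(sub("-.*$", "", x), month.abb)
  year <- 2000L + as.integer(sub("^.*-", "", x))
  as.Date(sprintf("%d-%02d-01", year, mon))
}

panel <- bind_rows(df1, df2) %>%
  complete(Month, variable) %>%           # add NA rows for missing pairs
  mutate(Date = month_to_date(Month)) %>% # sortable date for each month
  arrange(variable, Date)                 # variable first, then chronological

panel
```

Keeping the original Month column alongside the new Date column is a design choice for readability. If memory is tight, the text column could be dropped once the dates are in place.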