Stata: merging datasets based on a date range

sed*_*163 1 merge stata

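I realize there is already a Stata forum thread with this exact title, but I did not find its syntax very helpful, especially because my datasets are a little different. I have two datasets. The first records each person's stays in a facility, including the facility name. It looks like this: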

+---+-------------+---------------+-----------------------+
|ID#|Entrance Date|  Exit Date    | Facility Name         |
|1  | 7/22/2009   | 2/24/2010     | Facility 1            |
|1  | 7/10/2010   | 11/21/2010    | Facility 2            |
|2  | 3/31/2010   | 9/23/2010     | Facility 1            |
|3  | 11/24/2010  | 7/5/2011      | Facility 3            |
|4  | 3/7/2007    | 4/19/2010     | Facility 2            |
+---+-------------+---------------+-----------------------+

The second dataset shows the dates on which they were visited. It contains only the ID and the visit date:

+---+-------------+
|ID#|Visit Date   |
| 1 | 08/21/2009  |
| 1 | 09/02/2009  |
| 1 | 09/23/2009  |
| 3 | 04/22/2011  |
| 3 | 05/05/2011  |
+---+-------------+

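I would like to merge these two files on ID#, keeping the rows where Visit Date falls between Entrance Date and Exit Date, so that I can see who had visitors and which facility they were in at the time.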

小智 8

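There is a new user-written program called rangejoin (from SSC) that is tailor-made for this kind of problem. To install it, type the following in Stata's Command window: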

ssc install rangejoin
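
As far as I know, rangejoin relies on the rangestat package (also on SSC), so a fresh setup may need both. The lines below are a minimal sketch under that assumption; the final `which` call only confirms that Stata can find the command.

```stata
* Assumption: rangejoin depends on rangestat, so install both from SSC
ssc install rangestat
ssc install rangejoin

* Confirm that the command is now on the ado-path
which rangejoin
```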

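rangejoin pairs each stay with every visit whose date falls within that stay's interval (from datein to dateout). All dates must be numeric, so in the example below I have converted every date to a Stata date ahead of time.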
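
If your own dates are still stored as month/day/year strings, the conversion can be sketched as follows. This is a small illustration rather than part of the original answer: it reuses the variable names `visit` and `nvisit` from the example and assumes the strings are in month/day/year order, which is what the `"MDY"` mask tells `date()`.

```stata
* Two sample visits with the date stored as a string
clear
input byte id str10 visit
1 "08/21/2009"
3 "04/22/2011"
end

* date() turns the string into a numeric Stata daily date
gen int nvisit = date(visit, "MDY")
format %td nvisit

* The first value should match the 18130 used in the example data
assert nvisit == 18130 in 1
list
```

The same two lines (`gen` with `date()`, then `format %td`) would apply to the entrance and exit dates of the stays.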

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009"  "2/24/2010"  "Facility 1" 18100 18317
1 "7/10/2010"  "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010"  "9/23/2010"  "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011"   "Facility 3" 18590 18813
4 "3/7/2007"   "4/19/2010"  "Facility 2" 17232 18371
end
format %td datein
format %td dateout

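* Pair each stay with the visits for the same id whose nvisit lies between datein and dateout
rangejoin nvisit datein dateout using "visits.dta", by(id)
* Count the matched visits per stay (stays with no visit get 0)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)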

     +-------------------------------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout        visit      nvisit   visit_~t |
     |-------------------------------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   08/21/2009   21aug2009          3 |
  2. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/02/2009   02sep2009          3 |
  3. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/23/2009   23sep2009          3 |
  4. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  5. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  6. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   04/22/2011   22apr2011          2 |
  7. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   05/05/2011   05may2011          2 |
     |-------------------------------------------------------------------------------------------------------|
  8. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010                        .          0 |
     +-------------------------------------------------------------------------------------------------------+

Then, if needed, you can return to the original observations (one per stay) with:

* Keep one row per stay; bysort re-sorts in case the order has changed
bysort id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list

     +------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout   visit_~t |
     |------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010          3 |
  2. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010          0 |
  3. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010          0 |
  4. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011          2 |
  5. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010          0 |
     +------------------------------------------------------------------------------+
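
If installing a user-written command is not an option, roughly the same counts can be obtained with built-in commands. The sketch below is an alternative, not the method of the answer above: it uses `joinby` to form every stay-visit pair within an id and `inrange()` to flag the visits that fall inside the stay. The variable `in_stay` and the temporary file are introduced here for illustration only.

```stata
* Visits, with the date already numeric
clear
input byte id int nvisit
1 18130
1 18142
1 18163
3 18739
3 18752
end
format %td nvisit
tempfile visits
save `visits'

* Stays, with numeric entrance and exit dates
clear
input byte id str10 Name int(datein dateout)
1 "Facility 1" 18100 18317
1 "Facility 2" 18453 18587
2 "Facility 1" 18352 18528
3 "Facility 3" 18590 18813
4 "Facility 2" 17232 18371
end
format %td datein dateout

* All stay-visit pairs within id; unmatched(master) keeps stays with no visits at all
joinby id using `visits', unmatched(master)

* inrange() is 0 when nvisit is missing, so unvisited stays count as 0
gen byte in_stay = inrange(nvisit, datein, dateout)
bysort id datein: egen visit_count = total(in_stay)

* Back to one row per stay
by id datein: keep if _n == 1
keep id Name datein dateout visit_count
list, sepby(id)
```

Because `joinby` builds every combination within an id before any filtering, it can use far more memory than rangejoin on large datasets.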