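I realize there is a Stata forum thread with this exact title, but I didn't find its syntax very helpful, especially since my dataset is a bit different. I have two datasets. The first records each person's stays in facilities, including the facility name. It looks like this: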
+---+-------------+---------------+-----------------------+
|ID#|Entrance Date| Exit Date | Facility Name |
|1 | 7/22/2009 | 2/24/2010 | Facility 1 |
|1 | 7/10/2010 | 11/21/2010 | Facility 2 |
|2 | 3/31/2010 | 9/23/2010 | Facility 1 |
|3 | 11/24/2010 | 7/5/2011 | Facility 3 |
|4 | 3/7/2007 | 4/19/2010 | Facility 2 |
+---+-------------+---------------+-----------------------+
The second dataset records the dates on which they were visited. It contains only the ID and the visit date:
+---+-------------+
|ID#|Visit Date |
| 1 | 08/21/2009 |
| 1 | 09/02/2009 |
| 1 | 09/23/2009 |
| 3 | 04/22/2011 |
| 3 | 05/05/2011 |
+---+-------------+
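I'd like to merge the two files on ID#, matching each visit to the stay whose Entrance Date and Exit Date bracket the Visit Date, so that I can see 1. who had visitors and 2. which facility they were in.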
Answer by 小智 (score 8):
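There is a new user-written command, rangejoin (available from SSC), that is tailored to this kind of problem. To install it, type the following in Stata's Command window: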
ssc install rangejoin
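rangejoin pairs each stay with every visit whose date falls within that stay's interval (from the entrance date to the exit date), matching within each id through the by() option. All dates must be numeric, so in the example below I converted all of them to Stata dates beforehand.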
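The conversion step itself is not shown in the example. A minimal sketch, assuming the dates are stored as month/day/year strings as in the question, uses Stata's built-in date() function with the "MDY" mask; the three observations below are just a subset of the question's data:

```stata
* Hypothetical minimal example: string dates typed as in the question
clear
input byte id str10 visit
1 "08/21/2009"
1 "09/02/2009"
3 "04/22/2011"
end

* date() parses the string with the "MDY" mask and returns days since 01jan1960
generate nvisit = date(visit, "MDY")
* a %td display format shows the numeric values as readable dates
format %td nvisit
list
```

The same call should work for the stay file, for example generate datein = date(Entrance, "MDY"); date() also accepts single-digit months and days such as "7/22/2009".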
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009" "2/24/2010" "Facility 1" 18100 18317
1 "7/10/2010" "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010" "9/23/2010" "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011" "Facility 3" 18590 18813
4 "3/7/2007" "4/19/2010" "Facility 2" 17232 18371
end
format %td datein
format %td dateout
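* pair each stay with the visits of the same id whose nvisit lies in [datein, dateout]
rangejoin nvisit datein dateout using "visits.dta", by(id)
* count matched visits per stay; stays without a visit keep a missing nvisit
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)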
+-------------------------------------------------------------------------------------------------------+
| id Entrance Exit Name datein dateout visit nvisit visit_~t |
|-------------------------------------------------------------------------------------------------------|
1. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 08/21/2009 21aug2009 3 |
2. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 09/02/2009 02sep2009 3 |
3. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 09/23/2009 23sep2009 3 |
4. | 1 7/10/2010 11/21/2010 Facility 2 10jul2010 21nov2010 . 0 |
|-------------------------------------------------------------------------------------------------------|
5. | 2 3/31/2010 9/23/2010 Facility 1 31mar2010 23sep2010 . 0 |
|-------------------------------------------------------------------------------------------------------|
6. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 04/22/2011 22apr2011 2 |
7. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 05/05/2011 05may2011 2 |
|-------------------------------------------------------------------------------------------------------|
8. | 4 3/7/2007 4/19/2010 Facility 2 07mar2007 19apr2010 . 0 |
+-------------------------------------------------------------------------------------------------------+
Then, if needed, you can get back to one observation per stay with:
* keep the first row of each stay; visit_count is constant within a stay
by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
+------------------------------------------------------------------------------+
| id Entrance Exit Name datein dateout visit_~t |
|------------------------------------------------------------------------------|
1. | 1 7/22/2009 2/24/2010 Facility 1 22jul2009 24feb2010 3 |
2. | 1 7/10/2010 11/21/2010 Facility 2 10jul2010 21nov2010 0 |
3. | 2 3/31/2010 9/23/2010 Facility 1 31mar2010 23sep2010 0 |
4. | 3 11/24/2010 7/5/2011 Facility 3 24nov2010 05jul2011 2 |
5. | 4 3/7/2007 4/19/2010 Facility 2 07mar2007 19apr2010 0 |
+------------------------------------------------------------------------------+
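If installing a user-written command is not an option, a rough sketch of the same interval match can be built from official commands only: joinby forms every stay-visit pair within each id, and inrange() then keeps the pairs where the visit falls inside the stay. This is a swapped-in technique, not how rangejoin works internally, and it assumes the same variable names as the example above (only the numeric dates are kept here). A tempfile is used so the answer's visits.dta is not overwritten.

```stata
* Sketch: interval join with built-in commands only (joinby + inrange)
clear
input byte id int nvisit
1 18130
1 18142
1 18163
3 18739
3 18752
end
format %td nvisit
tempfile visits
save `visits'

clear
input byte id str10 Name int(datein dateout)
1 "Facility 1" 18100 18317
1 "Facility 2" 18453 18587
2 "Facility 1" 18352 18528
3 "Facility 3" 18590 18813
4 "Facility 2" 17232 18371
end
format %td datein dateout

* joinby forms every stay-visit pair within each id; ids absent from either file are dropped
joinby id using `visits'
* keep only pairs where the visit falls inside the stay, endpoints included
keep if inrange(nvisit, datein, dateout)
sort id datein nvisit
list, sepby(id)
```

Unlike rangejoin, this keeps only stays that actually had visits (ids 2 and 4, and the Facility 2 stay of id 1, disappear), and joinby materializes all pairs within each id before filtering, which can use a lot of memory on large files. That trade-off is the main reason to prefer rangejoin here.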