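I have two datasets from different years that I am trying to combine using the append command. Both contain a number of variables plus a variable that identifies the person, but the two datasets do not contain exactly the same set of identifiers.
Dataset 1: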
PID Year Car Sex Age
420201 2016 0 Female 70
420202 2016 0 Male 87
420204 2016 0 Female 62
420205 2016 1 Female 34
420207 2016 1 Male 48
Dataset 2:
PID Year Car Sex Age
420202 2014 1 Male 59
420204 2014 0 Female 76
420205 2014 1 Male 37
420207 2014 1 Male 23
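The problem is that when I try to append these datasets, Stata produces a dataset in which values belonging to some identifiers in one dataset are wrongly assigned to identifiers from the other dataset.
Appended dataset: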
PID Year Car Sex Age
420201 2016 0 Female 70
420201 2014 1 Male 59
420202 2016 0 Male 87
420202 2014 0 Female 76
420204 2016 0 Female 62
420204 2014 1 Male 37
420205 2016 1 Female 34
420205 2014 1 Male 23
420207 2016 1 Male 48
420207 2014 1 Male 23
Is there a way to fix this?
Pea*_*cer 11
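I have run into this problem before. It happens because the PID 'values' you see are actually value labels attached to the underlying values 1, 2, 3, 4, 5.
To illustrate this, consider the following example: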
clear
input PID Year Car str6 Sex Age
1 2016 0 Female 70
2 2016 0 Male 87
3 2016 0 Female 62
4 2016 1 Female 34
5 2016 1 Male 48
end
label define PID 1 "420201" 2 "420202" 3 "420204" 4 "420205" 5 "420207"
label values PID PID
* save this dataset so it can be appended later
save data1, replace
list
+------------------------------------+
| PID Year Car Sex Age |
|------------------------------------|
1. | 420201 2016 0 Female 70 |
2. | 420202 2016 0 Male 87 |
3. | 420204 2016 0 Female 62 |
4. | 420205 2016 1 Female 34 |
5. | 420207 2016 1 Male 48 |
+------------------------------------+
list, nolabel
+---------------------------------+
| PID Year Car Sex Age |
|---------------------------------|
1. | 1 2016 0 Female 70 |
2. | 2 2016 0 Male 87 |
3. | 3 2016 0 Female 62 |
4. | 4 2016 1 Female 34 |
5. | 5 2016 1 Male 48 |
+---------------------------------+
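So when you try to append in a situation like the following, the underlying integers are combined and the labels of the dataset in memory are applied to all of them: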
clear
input PID Year Car str6 Sex Age
1 2014 1 Male 59
2 2014 0 Female 76
3 2014 1 Male 37
4 2014 1 Male 23
end
label define PID 1 "420202" 2 "420204" 3 "420205" 4 "420207"
label values PID PID
save data2, replace
append using data1
sort PID
list
+------------------------------------+
| PID Year Car Sex Age |
|------------------------------------|
1. | 420202 2014 1 Male 59 |
2. | 420202 2016 0 Female 70 |
3. | 420204 2016 0 Male 87 |
4. | 420204 2014 0 Female 76 |
5. | 420205 2014 1 Male 37 |
|------------------------------------|
6. | 420205 2016 0 Female 62 |
7. | 420207 2016 1 Female 34 |
8. | 420207 2014 1 Male 23 |
9. | 5 2016 1 Male 48 |
+------------------------------------+
Your value labels may be defined differently, but the idea is the same.
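Before changing anything, it can help to confirm that PID really does carry a value label in your own files. The following is a minimal sketch, assuming a file named data1.dta like the one in the example above; the local macro name lbl is only illustrative.

```stata
* Sketch: inspect whether PID has a value label attached
* (assumes data1.dta exists in the working directory)
use data1, clear
* the "value label" column of describe shows the attached label name, if any
describe PID
* store the name of the attached value label; empty if there is none
local lbl : value label PID
display "value label attached to PID: `lbl'"
* print the integer-to-text mapping behind the label
label list `lbl'
* show the underlying integers instead of the label text
list PID in 1/5, nolabel
```

If the display line prints an empty name, PID has no value label and the cause of the mismatch probably lies elsewhere.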
To append the two datasets correctly, you first need to convert PID to a string in each of them:
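foreach dta in data1 data2 {
    use `dta', clear
    * decode stores the text of each value label in a new string variable
    decode PID, generate(PID2)
    drop PID
    rename PID2 PID
    save `dta', replace
}
* data2 is still in memory after the loop, so append data1 to it
append using data1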
order PID
sort PID
list
+------------------------------------+
| PID Year Car Sex Age |
|------------------------------------|
1. | 420201 2016 0 Female 70 |
2. | 420202 2016 0 Male 87 |
3. | 420202 2014 1 Male 59 |
4. | 420204 2016 0 Female 62 |
5. | 420204 2014 0 Female 76 |
|------------------------------------|
6. | 420205 2016 1 Female 34 |
7. | 420205 2014 1 Male 37 |
8. | 420207 2016 1 Male 48 |
9. | 420207 2014 1 Male 23 |
+------------------------------------+
You may also want to use the destring command to convert the new string PID variable to a numeric variable.
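As a rough sketch of that last step, assuming the appended dataset from above is still in memory and that every PID string consists of digits only:

```stata
* Sketch: turn the decoded string PID back into a numeric variable
* (assumes the appended data are in memory and PID contains only digits)
destring PID, replace
* check that PID and Year together uniquely identify each observation
isid PID Year
sort PID Year
list, nolabel
```

Here isid stops with an error if any PID and Year combination appears more than once, which is a cheap way to catch a mismatch that survived the append.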