我正在尝试使用TraMineR运行最佳匹配分析,但似乎我遇到了数据集大小的问题.我有一个包含就业法术的欧洲国家的大数据集.我有超过57,000个序列,长48个单位,由9个不同的状态组成.为了了解分析,这里是序列对象的头部employdat.sts:
[1] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...
[2] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...
[3] ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-...
[4] ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-...
[5] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...
[6] ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-...
Run Code Online (Sandbox Code Playgroud)
在较短的SPS格式中,其内容如下:
Sequence
[1] "(EF,48)"
[2] "(EF,48)"
[3] "(ST,48)"
[4] "(ST,36)-(MS,3)-(EF,9)"
[5] "(EF,48)"
[6] "(ST,24)-(EF,24)"
Run Code Online (Sandbox Code Playgroud)
将此序列对象传递给seqdist()函数后,我收到以下错误消息:
employdat.om <- seqdist(employdat.sts, method="OM", sm="CONSTANT", indel=4)
[>] creating 9x9 substitution-cost matrix using 2 as constant value
[>] 57160 sequences with 9 distinct events/states
[>] 12626 distinct sequences
[>] min/max sequence length: 48/48
[>] computing distances using OM metric
Error in .Call(TMR_cstringdistance, as.integer(dseq), as.integer(dim(dseq)), : negative length vectors are …Run Code Online (Sandbox Code Playgroud)