use*_*765 4 regex lazy-evaluation greedy stata regex-greedy
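How can I do non-greedy matching with regular expressions in Stata? Or does Stata even have this capability?
I want to extract all the text that appears between the hash sign "#" and the first period "." that follows it.
Sample code: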
clear
set obs 3
generate var1="anything#aaabbbccc.dddeee.fff" in 1
replace var1="anything#aaabbbccc.dddeee" in 2
replace var1="anything#aaabbbccc." in 3
generate var2=regexs(1) if regexm(var1,"#(.*)\.")
list
But in Stata (v.13.1) I can't seem to use the non-greedy pattern #(.*?)\. so the greedy version in the code above gives:
+--------------------------------------------------+
|                          var1               var2 |
|--------------------------------------------------|
| anything#aaabbbccc.dddeee.fff   aaabbbccc.dddeee |
|     anything#aaabbbccc.dddeee          aaabbbccc |
|           anything#aaabbbccc.          aaabbbccc |
+--------------------------------------------------+
But what I want is this:
+--------------------------------------------------+
|                          var1               var2 |
|--------------------------------------------------|
| anything#aaabbbccc.dddeee.fff          aaabbbccc |
|     anything#aaabbbccc.dddeee          aaabbbccc |
|           anything#aaabbbccc.          aaabbbccc |
+--------------------------------------------------+
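One way to get around the lack of #(.*?)\. is to match only non-period characters after the hash sign. Because the character class excludes the period, the match stops just before the first period, which is where a lazy match would stop. That is the following pattern: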
#([^.]*)
Try this code:
clear
set obs 3
generate var1="anything#aaabbbccc.dddeee.fff" in 1
replace var1="anything#aaabbbccc.dddeee" in 2
replace var1="anything#aaabbbccc." in 3
generate var2=regexs(1) if regexm(var1,"#([^.]*)")
list
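For readers on a newer release, here is a minimal sketch that assumes Stata 14 or later, where the Unicode regex functions ustrregexm() and ustrregexs() are available. Those functions are built on the ICU regex engine, which supports lazy quantifiers, so the pattern from the question should be usable as written. It does not help on Stata 13.1, where the negated character class above remains the workaround.

```stata
* Assumption: Stata 14 or later (ustrregexm/ustrregexs do not exist in 13.1)
clear
set obs 3
generate var1 = "anything#aaabbbccc.dddeee.fff" in 1
replace var1 = "anything#aaabbbccc.dddeee" in 2
replace var1 = "anything#aaabbbccc." in 3
* Lazy quantifier: .*? stops at the first period after the hash sign
generate var2 = ustrregexs(1) if ustrregexm(var1, "#(.*?)\.")
list
```

One difference to keep in mind: #(.*?)\. only matches when a period follows the hash sign, while #([^.]*) also matches strings with no period at all. To require the period with regexm() as well, the pattern #([^.]*)\. can be used instead.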