Non-greedy (lazy) matching with regular expressions?

use*_*765 4 regex lazy-evaluation greedy stata regex-greedy

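How can I do non-greedy matching with regular expressions in Stata? Or does Stata not have this capability?

I want to extract all the text that appears between a hash sign "#" and a period ".".

Sample code: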

clear
set obs 3
generate var1="anything#aaabbbccc.dddeee.fff" in 1
replace var1="anything#aaabbbccc.dddeee" in 2
replace var1="anything#aaabbbccc." in 3
generate var2=regexs(1) if regexm(var1,"#(.*)\.")
list

But in Stata (v.13.1) I can't seem to use the non-greedy pattern #(.*?)\. so the code above gives:

+--------------------------------------------------+
|                          var1               var2 |
|--------------------------------------------------|
| anything#aaabbbccc.dddeee.fff   aaabbbccc.dddeee |
|     anything#aaabbbccc.dddeee          aaabbbccc |
|           anything#aaabbbccc.          aaabbbccc |
+--------------------------------------------------+

But what I want is this:

+--------------------------------------------------+
|                          var1               var2 |
|--------------------------------------------------|
| anything#aaabbbccc.dddeee.fff          aaabbbccc |
|     anything#aaabbbccc.dddeee          aaabbbccc |
|           anything#aaabbbccc.          aaabbbccc |
+--------------------------------------------------+

Tim*_*sen 5

One workaround for #(.*?)\. is to match only non-dot characters after the hash sign, that is, the following pattern:

#([^.]*)

Try this code:

clear
set obs 3
generate var1="anything#aaabbbccc.dddeee.fff" in 1
replace var1="anything#aaabbbccc.dddeee" in 2
replace var1="anything#aaabbbccc." in 3
generate var2=regexs(1) if regexm(var1,"#([^.]*)")
list

Demo
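
One caveat about the negated-class pattern: #([^.]*) also matches when no period follows the hash sign, whereas the original #(.*)\. requires one. A minimal sketch, assuming you want to keep that requirement, is to append \. to the negated class. The var3 line is an assumption about your environment: as far as I know, ustrregexm() and ustrregexs() only exist from Stata 14 onward, where the regex engine accepts the lazy quantifier .*? directly, so on 13.1 that line has to be dropped.

```stata
clear
set obs 3
generate var1 = "anything#aaabbbccc.dddeee.fff" in 1
replace var1 = "anything#aaabbbccc.dddeee" in 2
replace var1 = "anything#aaabbbccc." in 3

* Negated class plus a literal period: stops at the first "." and
* leaves var2 empty when no period follows the "#"
generate var2 = regexs(1) if regexm(var1, "#([^.]*)\.")

* Assumption: Stata 14 or later only; remove this line on 13.1.
* The Unicode regex functions accept the lazy quantifier .*?
generate var3 = ustrregexs(1) if ustrregexm(var1, "#(.*?)\.")

list
```

For the three sample rows, both var2 and var3 should come out as aaabbbccc.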