Abb*_*999 1 regex vb.net visual-studio-2010
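I am using VB.NET and trying to extract the year and the country from arbitrary sentences, but only when both are present.
My input looks like this: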
This is just the title and has no year or country:
Preamble with only year 1999 and no country:
I was born in 1990 in Canada, I was born to love, and be loved.
She was born in 2000 in Malaysia and she likes fishing.
My mother was born in South Africa and she love all her sons and daughters, she was born in 1960.
My Dad was born in a small village in France in 1955. He loves my Mom.
and finally thanks from USA, without a year.
From the input above, I would like to get the following output:
***EMPTY
***EMPTY
1990 - Canada
2000 - Malaysia
1960 - South Africa
1955 - France
***EMPTY
I have spent the whole morning reading about regex, and I think it could do the job, but I have given up. Can anyone help? Thanks in advance.
Assuming you can build a list of countries, you can combine it into an alternation, like this:
(Canada|Malaysia|France|South Africa)
A long list would have to be optimized, but that is another story (see below).
Then you can use a regex like this:
^(?=.*(\b\d{4}\b))(?=.*\b(Canada|Malaysia|France|South Africa)\b)
This captures the year into Group 1 and the country into Group 2. In the regex demo, see the captures in the right-hand pane.
Captures:
1990 Canada
2000 Malaysia
1960 South Africa
1955 France
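Since the question is about VB.NET, here is a minimal sketch of how the regex above might be applied line by line. The module name and the helper ExtractYearAndCountry are hypothetical names chosen for illustration, and the country list is just the four-name sample from above. It relies on the fact that .NET keeps groups captured inside a positive lookahead.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module YearCountryDemo
    ' Hypothetical helper (not from the original answer):
    ' returns "year - country" when both are present, otherwise "***EMPTY".
    Function ExtractYearAndCountry(ByVal line As String) As String
        ' Group 1 = year, Group 2 = country. Both lookaheads must succeed
        ' at the start of the line for the match to succeed.
        Dim pattern As String = "^(?=.*(\b\d{4}\b))(?=.*\b(Canada|Malaysia|France|South Africa)\b)"
        Dim m As Match = Regex.Match(line, pattern)
        If m.Success Then
            Return m.Groups(1).Value & " - " & m.Groups(2).Value
        End If
        Return "***EMPTY"
    End Function

    Sub Main()
        ' A few of the sample lines from the question.
        Dim lines() As String = {
            "Preamble with only year 1999 and no country:",
            "I was born in 1990 in Canada, I was born to love, and be loved.",
            "My mother was born in South Africa and she love all her sons and daughters, she was born in 1960.",
            "and finally thanks from USA, without a year."
        }
        For Each line As String In lines
            Console.WriteLine(ExtractYearAndCountry(line))
        Next
    End Sub
End Module
```

Because the `.*` in each lookahead is greedy, a line containing several four-digit numbers or several listed countries yields the last one of each.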
Optimizing the country list
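First, you need to order the list so that when one country name is a substring of another (for example the two Guineas and Guinea-Bissau, Sudan and South Sudan, Dominica and Dominican Republic), the longest one gets the first chance to match.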
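One possible way to get that ordering is to sort the names by length, longest first, before joining them. The sketch below assumes the list is held in a plain string array; BuildCountryPattern is a hypothetical helper name, and Regex.Escape is used so that any regex metacharacters in a name are treated literally.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module CountryAlternationDemo
    ' Hypothetical helper: builds \b(?:A|B|...)\b with longer names first,
    ' so that "Guinea-Bissau" is tried before "Guinea".
    Function BuildCountryPattern(ByVal countries As String()) As String
        Dim ordered As String() = countries.OrderByDescending(Function(c) c.Length).Select(Function(c) Regex.Escape(c)).ToArray()
        Return "\b(?:" & String.Join("|", ordered) & ")\b"
    End Function

    Sub Main()
        Dim countries() As String = {"Guinea", "Guinea-Bissau", "Sudan", "South Sudan"}
        Dim pattern As String = BuildCountryPattern(countries)
        ' With the unsorted list, "Guinea" would match first and stop at the hyphen.
        Console.WriteLine(Regex.Match("He moved to Guinea-Bissau in 1980.", pattern).Value)
    End Sub
End Module
```

One caveat when the alternation sits behind a greedy `.*` in a lookahead: the engine then tries positions from the right, so in South Sudan the shorter Sudan is reached first whatever the order. A lazy `.*?` is one way to let the leftmost, longer name win in that case.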
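You also need to know your input. For example, do you need to account for variants such as USA and United States of America?
In addition, you would want Fairyland and Fantasyland to be written as Fa(?:ir|ntas)yland, which helps the engine match (or fail) faster. With a list of 256 countries, building such an optimized list is a challenge, but there are tools that can help: regex-opt and Regex::Assemble come to mind.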
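To illustrate the factoring with the two made-up names above, the following sketch checks that the plain alternation and the factored form accept and reject the same inputs. The module and function names are hypothetical, and the anchors are added only so that whole strings are compared.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FactoredPatternDemo
    ' Plain alternation and its factored equivalent (shared prefix "Fa", shared suffix "yland").
    Const PlainPattern As String = "^(?:Fairyland|Fantasyland)$"
    Const FactoredPattern As String = "^Fa(?:ir|ntas)yland$"

    ' Hypothetical helper: True when both patterns agree on the input.
    Function PatternsAgree(ByVal input As String) As Boolean
        Return Regex.IsMatch(input, PlainPattern) = Regex.IsMatch(input, FactoredPattern)
    End Function

    Sub Main()
        For Each name As String In New String() {"Fairyland", "Fantasyland", "Faryland", "Finland"}
            Console.WriteLine(name & ": " & Regex.IsMatch(name, FactoredPattern).ToString() & ", agree = " & PatternsAgree(name).ToString())
        Next
    End Sub
End Module
```

A check like this is a cheap safeguard when the factored pattern is produced by hand or by a tool, since a factoring mistake is easy to miss by eye.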