Pra*_*nde 8 ruby regex nlp ruby-on-rails-3
I am trying to extract sentences from a paragraph whose pattern looks like this:
Current. time is six thirty at Scotland. Past. time was five thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.
When I use the regex
/current\..*scotland\./i
it matches the entire string:
Current. time is six thirty at Scotland. Past. time was six thirty at India; Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. Current. time is five ten at Scotland.
Instead, I want each match to stop at the first occurrence of "." after "Scotland", so that I capture all groups like:
Current. time is six thirty at Scotland.
Current. time is five ten at Scotland.
Similarly, for text like
Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India;
when I use the regex
/past\..*india\;/i
it matches the entire string:
Past. time was five thirty at India; Current. time is six thirty at Scotland. Past. time was five thirty at Scotland. Past. time was five ten at India;
Here I want to capture all groups, or just the first group, as shown below. How do I stop at the first occurrence of ";"?
Past. time was five thirty at India;
Past. time was five ten at India;
How can I make the regex stop at "." or ";" in the examples above?
Mik*_*H-R 12
There are a couple of things you should do with your regex. First, as Arnal Murali pointed out, you shouldn't use a greedy quantifier; use the lazy version instead:
/current\..*?scotland\./i
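To see the difference between the two quantifiers, here is a minimal sketch that runs both against the first sample paragraph from the question. The variable names (`text`, `greedy`, `lazy`) are just for illustration, and `String#scan` is used to collect every match.

```ruby
# Sample paragraph from the question (split across lines for readability).
text = "Current. time is six thirty at Scotland. Past. time was five thirty at India; " \
       "Current. time is five thirty at Scotland. Past. time was five thirty at Scotland. " \
       "Current. time is five ten at Scotland."

greedy = /current\..*scotland\./i   # .* grabs as much as possible
lazy   = /current\..*?scotland\./i  # .*? grabs as little as possible

# The greedy pattern runs from the first "Current." to the last "Scotland.",
# so it yields a single match covering the whole paragraph.
greedy_matches = text.scan(greedy)

# The lazy pattern stops at the nearest "Scotland." after each "Current.".
lazy_matches = text.scan(lazy)

puts greedy_matches.length
puts lazy_matches
```

On this input the lazy version returns one match per "Current. ... Scotland." sentence, while the greedy version returns the whole paragraph as one match.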
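As a general rule, I think it's worth reaching for the lazy version first, because it is more often what you want. Second, you don't really want to use "." to match everything here, because this part of your regex should never match "." or ";". You can put those characters in a negated character class to match anything except them: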
/current\.[^.]*?scotland\./i
and
/past\.[^.;]*?india;/i
Or, to cover both:
/(current|past)\.[^.;]*?(india|scotland)[.;]/i
(Obviously this may not be exactly what you want to do; it's only included to demonstrate how to extend the idea.)
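One Ruby-specific detail when using the combined pattern: `String#scan` returns the captured groups rather than the whole match when the regex contains capturing groups. The sketch below, run against the second sample text from the question, swaps in non-capturing groups `(?:...)` to get full sentences back. That swap is my own adjustment for illustration, and the variable names are hypothetical.

```ruby
# Second sample text from the question.
text = "Past. time was five thirty at India; Current. time is six thirty at Scotland. " \
       "Past. time was five thirty at Scotland. Past. time was five ten at India;"

# Non-capturing groups: scan returns each whole matched sentence.
pattern = /(?:current|past)\.[^.;]*?(?:india|scotland)[.;]/i
sentences = text.scan(pattern)

# Capturing groups: scan returns [tense, place] pairs instead.
pairs = text.scan(/(current|past)\.[^.;]*?(india|scotland)[.;]/i)

puts sentences
p pairs
```

Note that this combined pattern is looser than the two separate ones: it would also accept "India." or "Scotland;" as a sentence ending.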
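It's also a good rule of thumb that if you're having trouble with a regex, you should make any wildcards more specific (in this case, changing from ".", which matches any character except a newline by default, to [^.;], which matches any character except "." and ";").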
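As a rough illustration of why the more specific wildcard matters, here is a sketch comparing a lazy "." with the negated class on the second sample text from the question. The names `lazy` and `specific` are only for this example.

```ruby
# Second sample text from the question.
text = "Past. time was five thirty at India; Current. time is six thirty at Scotland. " \
       "Past. time was five thirty at Scotland. Past. time was five ten at India;"

lazy     = /past\..*?india;/i      # lazy, but "." can still cross sentence boundaries
specific = /past\.[^.;]*?india;/i  # cannot cross "." or ";"

lazy_matches     = text.scan(lazy)
specific_matches = text.scan(specific)

# The lazy pattern's second match starts at "Past. ... Scotland." and keeps
# going until it finally reaches "India;", swallowing two sentences.
puts lazy_matches

# The negated class gives up on the "Scotland." sentence and matches
# only the sentences that really end in "India;".
puts specific_matches
```

A lazy quantifier only prefers the shortest match from a given starting point; it does not stop the engine from running past a delimiter when that is the only way to succeed. Excluding the delimiters from the wildcard is what rules that out.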