I have a collection of strings, and I want a regular expression that collects all of the ones starting with http..
href="http://www.test.com/cat/1-one_piece_episodes/" href="http://www.test.com/cat/2-movies_english_subbed/" href="http://www.test.com/cat/3-english_dubbed/" href="http://www.exclude.com"
This is my regex pattern..
href="(.*?)[^#]"
and it returns this:
href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"
What pattern would exclude the last match.. or exclude any match that contains the excluded domain, such as href="http://www.exclude.com"?
Edit: multiple exclusions
href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"
Tim*_*ker 15
@ridgerunner and I would change the regex to:
href="((?:(?!\bexclude\b)[^"])*)[^#]"
It matches all href attributes as long as they don't end in # and don't contain the word exclude.
Explanation:
href="          # Match href="
(               # Capture...
 (?:            # the following group:
  (?!           # Look ahead to check that the next part of the string isn't...
   \b           # the entire word
   exclude      # exclude
   \b           # (\b are word boundary anchors)
  )             # End of lookahead
  [^"]          # If successful, match any character except for a quote
 )*             # Repeat as often as possible
)               # End of capturing group 1
[^#]"           # Match a non-# character and the closing quote.
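To see the pattern in action, here is a minimal sketch in Python (the question does not say which regex engine is in use, so the choice of Python's `re` module is an assumption). The `html` string is built from the four matches listed in the question, and the variable names are only illustrative.

```python
import re

# Sample input assembled from the four matches shown in the question
# (an assumption; the asker's real input may differ).
html = (
    'href="http://www.test.com/cat/1-one_piece_episodes/" '
    'href="http://www.test.com/cat/2-movies_english_subbed/" '
    'href="http://www.xxxx.com/cat/3-english_dubbed/" '
    'href="http://www.exclude.com"'
)

# The pattern from the answer, unchanged.
pattern = re.compile(r'href="((?:(?!\bexclude\b)[^"])*)[^#]"')

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(html)]

for match in matches:
    print(match)
```

With this input the three non-excluded attributes are printed and the `www.exclude.com` one is skipped. Note that `[^#]` sits outside the capturing group, so `group(1)` holds the URL without its final character; `group(0)` is used here to get the whole attribute.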
To allow multiple "forbidden words":
href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
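If the list of forbidden words changes often, the alternation can be generated instead of typed by hand. The following is a rough sketch in Python (again an assumed engine); `build_href_pattern` is a hypothetical helper, not part of any library, and the sample string reuses the matches from the question.

```python
import re

def build_href_pattern(forbidden_words):
    """Hypothetical helper: build the answer's pattern for a list of words."""
    if not forbidden_words:
        raise ValueError("at least one forbidden word is required")
    # Escape each word so regex metacharacters in it are taken literally.
    alternation = "|".join(re.escape(word) for word in forbidden_words)
    return re.compile(
        r'href="((?:(?!\b(?:' + alternation + r')\b)[^"])*)[^#]"'
    )

# Sample input assembled from the matches shown in the question (an assumption).
sample = (
    'href="http://www.test.com/cat/1-one_piece_episodes/" '
    'href="http://www.test.com/cat/2-movies_english_subbed/" '
    'href="http://www.xxxx.com/cat/3-english_dubbed/" '
    'href="http://www.exclude.com"'
)

filtered = [
    m.group(0)
    for m in build_href_pattern(["exclude", "xxxx"]).finditer(sample)
]
```

Here `filtered` keeps only the two `www.test.com` attributes, since both `exclude` and `xxxx` are rejected by the lookahead.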