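I have a text document containing many URLs. The URLs have many different endings, such as .net, .com, .de, and so on. None of the URLs are preceded by http:// or www. The document also contains a lot of other text, and it looks like this: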
2014/05/03 Red V!per M R United States jsugarcia.com/viper.gif Linux mirror
2014/05/03 Red V!per M R United States thepeoplecenter.org/viper.gif Linux mirror
2014/05/03 Red V!per R Netherlands ghijbeek.nl/viper.gif Linux mirror
2014/05/03 Red V!per M R Netherlands straalbedrijfsanders.nl/viper.gif Linux mirror
2014/05/03 Red V!per R European Union serialnastya.com/viper.gif Linux mirror
2014/05/03 Red V!per M R Denmark thueringer-treppenlifte.de/vip... Linux mirror
2014/05/03 Red V!per R United States tapitwater.com/images/viper.gif Linux mirror
2014/05/03 Red V!per R Norway sekureco.no/viper.gif Linux mirror
I now want to filter this in Notepad++ so that only the URLs remain, one per line, like this:
site.com
It seems that every line ends with Linux mirror. If that is always the case, you can do this:
Find what: ^.+\s+([^\s/]+)\S+\s+Linux\s+mirror
Replace with: $1
Explanation:
^ : beginning of line
.+ : 1 or more of any character
\s+ : 1 or more whitespace characters
( : start group 1
[^\s/]+ : 1 or more characters that are neither whitespace nor a slash (the domain)
) : end group 1
\S+ : 1 or more non-whitespace characters
\s+ : 1 or more whitespace characters
Linux : literally Linux
\s+ : 1 or more whitespace characters
mirror : literally mirror
Result for the given example:
jsugarcia.com
thepeoplecenter.org
ghijbeek.nl
straalbedrijfsanders.nl
serialnastya.com
thueringer-treppenlifte.de
tapitwater.com
sekureco.no
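For anyone who would rather script this than use the Notepad++ dialog, the same pattern can be applied line by line in Python. This is only a minimal sketch: the function name `extract_domains` and the `sample` text are made up for illustration, and it assumes every relevant line contains `Linux mirror` after the URL, as in the example data.

```python
import re

# Same pattern as the search expression above; group 1 captures the domain.
PATTERN = re.compile(r"^.+\s+([^\s/]+)\S+\s+Linux\s+mirror")


def extract_domains(text):
    """Return group 1 (the domain) for every line the pattern matches."""
    domains = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:  # lines that do not match are skipped
            domains.append(match.group(1))
    return domains


# Hypothetical input: two lines copied from the example data.
sample = (
    "2014/05/03 Red V!per M R United States jsugarcia.com/viper.gif Linux mirror\n"
    "2014/05/03 Red V!per R Norway sekureco.no/viper.gif Linux mirror\n"
)

print("\n".join(extract_domains(sample)))
```

Note that the `\S+` after the group requires at least one character following the domain, so a line with a bare domain and no path would lose the last character of the domain.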