Notepad++: How do I delete everything except the URLs?

Art*_* DT 0 regex notepad++

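I have a text document containing many URLs. The URLs have many different endings, such as .net, .com, .de, and so on. None of the URLs has a leading http:// or www. The document also contains a lot of other text; it looks like this: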

2014/05/03  Red V!per       M   R   United States       jsugarcia.com/viper.gif Linux   mirror
2014/05/03  Red V!per       M   R   United States       thepeoplecenter.org/viper.gif   Linux   mirror
2014/05/03  Red V!per           R   Netherlands     ghijbeek.nl/viper.gif   Linux   mirror
2014/05/03  Red V!per       M   R   Netherlands     straalbedrijfsanders.nl/viper.gif   Linux   mirror
2014/05/03  Red V!per           R   European Union      serialnastya.com/viper.gif  Linux   mirror
2014/05/03  Red V!per       M   R   Denmark     thueringer-treppenlifte.de/vip...   Linux   mirror
2014/05/03  Red V!per           R   United States       tapitwater.com/images/viper.gif Linux   mirror
2014/05/03  Red V!per           R   Norway      sekureco.no/viper.gif   Linux   mirror

I would now like to filter this in Notepad++ so that only the URLs remain, one per line, like this:

site.com

Tot*_*oto 5

It looks like every line ends with Linux mirror. If that is always the case, you can do this:

  • Ctrl+H
  • Find what: ^.+\s+([^\s/]+)\S+\s+Linux\s+mirror
  • Replace with: $1
  • Search Mode: Regular expression
  • Replace all

Explanation:

^           : beginning of line
  .+        : 1 or more of any character
  \s+       : 1 or more whitespace characters
  (         : start group 1
    [^\s/]+ : 1 or more characters that are neither whitespace nor a slash (the domain)
  )         : end group 1
  \S+       : 1 or more non-whitespace characters (the rest of the URL)
  \s+       : 1 or more whitespace characters
  Linux     : literally Linux
  \s+       : 1 or more whitespace characters
  mirror    : literally mirror

Result for the given example:

jsugarcia.com
thepeoplecenter.org
ghijbeek.nl
straalbedrijfsanders.nl
serialnastya.com
thueringer-treppenlifte.de
tapitwater.com
sekureco.no
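To check the pattern outside Notepad++, it can be run against a few of the sample lines in Python. This is only a sketch under some assumptions: the helper name `keep_domains` is made up for illustration, each line is processed separately to imitate Replace all, and Python's `re` module writes the replacement as `\1` where Notepad++ uses `$1`.

```python
import re

# Same pattern as in the answer above; Python's re module accepts this syntax.
PATTERN = re.compile(r"^.+\s+([^\s/]+)\S+\s+Linux\s+mirror")

# Three lines copied from the sample in the question.
SAMPLE = """\
2014/05/03  Red V!per       M   R   United States       jsugarcia.com/viper.gif Linux   mirror
2014/05/03  Red V!per       M   R   Denmark     thueringer-treppenlifte.de/vip...   Linux   mirror
2014/05/03  Red V!per           R   United States       tapitwater.com/images/viper.gif Linux   mirror
"""


def keep_domains(text):
    """Hypothetical helper: apply the replacement to each line separately.

    Lines that do not match are returned unchanged, just as Replace all
    would leave them untouched.
    """
    return [PATTERN.sub(r"\1", line) for line in text.splitlines()]


print("\n".join(keep_domains(SAMPLE)))
```

One thing this makes easy to try: the `\S+` after the group needs at least one character, so the pattern relies on every URL having a path after the domain, as all the sample lines do. For a bare domain such as site.com followed by Linux mirror, the group would lose its last character; changing `\S+` to `\S*` is one possible way around that.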