Remove everything except the URLs in Notepad++

Art*_*hur 5 notepad++

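After manually scraping Google search results with a legitimate Chrome plugin, I have the following information (shown here for just two search results):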

The History Teacher (@THTjournal) | Twitter
https://twitter.com/thtjournal  https://twitter.com/thtjournal
Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/thtjournal&prev=search
Jim Carroll (@jcarrollhistory) | Twitter
https://twitter.com/jcarrollhistory https://twitter.com/jcarrollhistory
Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/jcarrollhistory&prev=search

My goal is to create a list containing only the Twitter URLs, like this:

https://twitter.com/thtjournal

https://twitter.com/jcarrollhistory

I have Notepad++, so how can I use it to get a list containing only the URLs? Everything else should be removed.

Tot*_*oto 3

  • Ctrl+H
  • Find what: ^.*?(\bhttps://twitter\.com/\w+)?.*$
  • Replace with: (?1$1:)
  • Check Wrap around
  • Check Regular expression
  • Do NOT check . matches newline
  • Replace all

Explanation:

^                           # beginning of line
  .*?                       # 0 or more any character but newline, not greedy
  (                         # start group 1
    \b                      # word boundary
    https://twitter\.com/   # literally
    \w+                     # 1 or more word character
  )?                        # end group, optional
  .*                        # 0 or more any character but newline
$                           # end of line
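
To try the pattern outside Notepad++, here is a minimal sketch that tests it one line at a time with Python's `re` module. The helper name `captured_url` is made up for illustration, and Python's engine is not the same as the one Notepad++ uses, so treat it as an approximation. One thing it makes visible: because group 1 is optional, the lazy `.*?` never has to expand, so (in Python at least) the group is only captured when the Twitter URL sits at the very start of the line. That happens to be exactly what the sample data needs, since the translate.google.nl lines are then left without a capture.

```python
import re

# The answer's pattern, laid out with re.VERBOSE so the comments can live inside it.
LINE_PATTERN = re.compile(
    r"""
    ^                          # beginning of line
      .*?                      # as few characters as possible (lazy)
      (                        # start group 1
        \b                     # word boundary
        https://twitter\.com/  # literal prefix
        \w+                    # 1 or more word characters (the handle)
      )?                       # end group 1, optional
      .*                       # rest of the line
    $                          # end of line
    """,
    re.VERBOSE,
)


def captured_url(line):
    """Return group 1 for a single line, or None if the group did not take part."""
    match = LINE_PATTERN.match(line)
    return match.group(1) if match else None


print(captured_url("https://twitter.com/thtjournal  https://twitter.com/thtjournal"))
# https://twitter.com/thtjournal
print(captured_url("The History Teacher (@THTjournal) | Twitter"))
# None
```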

Replacement:

(?1$1:)         # if group 1 exists, then use it as replacement, else replace with nothing
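
Python's `re` module has no equivalent of the conditional replacement `(?1$1:)`, so the sketch below emulates it with a replacement function: emit group 1 if it matched, otherwise an empty string. The names `keep_twitter_urls` and `extract_urls` are hypothetical helpers, and `extract_urls` goes one step beyond the Notepad++ recipe by dropping the leftover empty lines.

```python
import re

# Same pattern as in "Find what"; re.MULTILINE makes ^ and $ apply to each line.
PATTERN = re.compile(r"^.*?(\bhttps://twitter\.com/\w+)?.*$", re.MULTILINE)


def keep_twitter_urls(text):
    """Emulate (?1$1:): replace each line with group 1 if it matched, else with nothing."""
    return PATTERN.sub(lambda m: m.group(1) or "", text)


def extract_urls(text):
    """Hypothetical extra step: return only the non-empty lines as a list."""
    return [line for line in keep_twitter_urls(text).splitlines() if line]


sample = "\n".join([
    "The History Teacher (@THTjournal) | Twitter",
    "https://twitter.com/thtjournal  https://twitter.com/thtjournal",
    "Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/thtjournal&prev=search",
    "Jim Carroll (@jcarrollhistory) | Twitter",
    "https://twitter.com/jcarrollhistory https://twitter.com/jcarrollhistory",
    "Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/jcarrollhistory&prev=search",
])

print(extract_urls(sample))
# ['https://twitter.com/thtjournal', 'https://twitter.com/jcarrollhistory']
```

Lines without a captured URL are emptied rather than deleted, which is why blank lines remain after the replacement itself.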

Result for the given example:

https://twitter.com/thtjournal


https://twitter.com/jcarrollhistory