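After manually scraping Google search results with a legitimate Chrome extension, I have the following information (shown here for just two search results):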
The History Teacher (@THTjournal) | Twitter
https://twitter.com/thtjournal https://twitter.com/thtjournal
Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/thtjournal&prev=search
Jim Carroll (@jcarrollhistory) | Twitter
https://twitter.com/jcarrollhistory https://twitter.com/jcarrollhistory
Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/jcarrollhistory&prev=search
My goal is to create a list containing only the Twitter URLs, like this:
https://twitter.com/thtjournal
https://twitter.com/jcarrollhistory
I have Notepad++, so how can I use it to get a list with only the URLs? Everything else should be removed.
Find what: ^.*?(\bhttps://twitter\.com/\w+)?.*$
Replace with: (?1$1:)
Use the Regular expression search mode and leave ". matches newline" unchecked.
Explanation:
^ # beginning of line
.*? # 0 or more of any character except newline, non-greedy
( # start group 1
\b # word boundary
https://twitter\.com/ # literal text
\w+ # 1 or more word characters
)? # end group 1, optional
.* # 0 or more of any character except newline
$ # end of line
Replacement:
(?1$1:) # if group 1 exists, then use it as replacement, else replace with nothing
Result for the given example:
https://twitter.com/thtjournal
https://twitter.com/jcarrollhistory
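If the same cleanup has to run outside Notepad++, the idea can be sketched in Python. This is only an illustration, not part of the Notepad++ answer: the names SAMPLE, PATTERN and extract_urls are made up here, and because Python's re module has no conditional replacement like (?1$1:), the sketch collects group 1 from each line instead of replacing text.

```python
import re

# Sample input copied from the question (two search results).
SAMPLE = """The History Teacher (@THTjournal) | Twitter
https://twitter.com/thtjournal https://twitter.com/thtjournal
Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/thtjournal&prev=search
Jim Carroll (@jcarrollhistory) | Twitter
https://twitter.com/jcarrollhistory https://twitter.com/jcarrollhistory
Vertaal deze pagina https://translate.google.nl/translate?hl=nl&sl=en&u=https://twitter.com/jcarrollhistory&prev=search"""

# Same pattern as above. re.MULTILINE makes ^ and $ match at every line,
# and "." does not match newlines by default in Python.
PATTERN = re.compile(r"^.*?(\bhttps://twitter\.com/\w+)?.*$", re.MULTILINE)


def extract_urls(text):
    """Return group 1 for every line where the optional group matched."""
    urls = []
    for match in PATTERN.finditer(text):
        if match.group(1):  # None when the optional group did not take part
            urls.append(match.group(1))
    return urls


if __name__ == "__main__":
    for url in extract_urls(SAMPLE):
        print(url)
```

One thing the sketch makes visible: since .*? is lazy and the group is optional, the group is only attempted at the very start of each line. That suits this input, where the wanted lines begin with the URL and the translate.google.nl lines (which merely contain a Twitter URL) are dropped, but a line with leading text before the URL would yield nothing.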