Regular expressions and capture groups

Question

I'm trying to extract file names from a long text.

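The file names are always inside a path
The path is always prefixed with the text Page source
They can appear anywhere in the text
The text spans multiple lines
All file names end with .html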

Given the following text:

Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text
Lorem Ipsum ...
Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html

I want a list of all the file names:

mysource.html
anothersource.html

I've been trying the following regular expressions:

// this only gets the last one (because of the greedy .*)
Page source.*\/(.*\.html)

// This gets all occurrences, but the value in my capture group is the 
// complete path starting after the first occurrence of /
Page source.*?\/(.*?\.html)
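
To see the two attempts side by side, here is a minimal sketch using Python's `re` module (Python is an assumption; the question does not name a language). It runs both patterns over the sample text with `re.DOTALL` set, so that `.` can cross line breaks and the greedy pattern reproduces the "only the last one" result described in the comment. Without that flag, Python's `.` stops at each newline and every match stays on a single line.

```python
import re

# Sample text from the question (spans several lines)
text = """Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text
Lorem Ipsum ...
Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html"""

# Attempt 1: the greedy .* runs to the end of the whole text (DOTALL lets it
# cross newlines), then backtracks only as far as the LAST "/" before ".html",
# so the first "Page source" swallows everything and there is a single match.
greedy = re.findall(r"Page source.*\/(.*\.html)", text, re.DOTALL)

# Attempt 2: the lazy .*? stops at the FIRST "/" after "Page source"
# (the one in "file:///"), so the group captures the rest of the path.
lazy = re.findall(r"Page source.*?\/(.*?\.html)", text, re.DOTALL)

print(greedy)  # ['anothersource.html']
print(lazy)    # ['//somedir/subdir/subdir/mysource.html', '//anotherdir/sub/dir/anothersource.html']
```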

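How can I tell the regex engine to be non-greedy for the outer expression, but still greedy enough to reach the last / right before the .html part?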

Answer 1

Dmi*_*rov 7

Page source.*?([^\/]+?\.html)

Demo: https://regex101.com/r/uX6fY2/2
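
The pattern works because `[^\/]+?` cannot match a `/`. The lazy `.*?` therefore has to keep expanding until the group can start right after the last `/` that precedes `.html`, and only the bare file name ends up in the capture group. A minimal sketch, assuming Python's `re` module (the variable names are illustrative), applied to the sample text from the question:

```python
import re

# Sample text from the question
text = """Page source file:///somedir/subdir/subdir/mysource.html lorem ipsum more text
Lorem Ipsum ...
Lorem Ipsum Page source file:///anotherdir/sub/dir/anothersource.html"""

# [^\/]+? excludes "/", so the group can only begin after the last "/"
# before ".html"; the lazy .*? in front of it skips over the rest of the path.
pattern = re.compile(r"Page source.*?([^\/]+?\.html)")

# With exactly one capture group, findall returns just the group's text.
names = pattern.findall(text)
print(names)  # ['mysource.html', 'anothersource.html']
```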

Archived:	10 years, 7 months ago
Views:	64
Last activity:	10 years, 7 months ago