Gra*_*it 3 php regex pattern-matching html-parsing
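I am looking for a regular expression in PHP that would match anchors with specific text in them. For example, I would like to get anchors with the text mylink, like: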
<a href="blabla" ... >mylink</a>
So it should match all anchors, but only if they contain the specific text. It should match these strings:
<a href="blabla" ... >mylink</a>
<a href="blabla" ... >blabla mylink</a>
<a href="blabla" ... >mylink bla bla</a>
<a href="blabla" ... >bla bla mylink bla bla</a>
But not this one:
<a href="blabla" ... >bla bla bla bla</a>
because it does not contain the word mylink.
This one should not match either: "mylink is string", because it is not an anchor.
Does anybody have an idea?
Thanks, Granit
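Since the question asks specifically for a regular expression, here is a minimal sketch of one. It assumes well-formed, non-nested <a> elements and attribute values that contain no ">" character; the helper name findAnchorsContaining() is hypothetical. It matches the search text as a whole word (\b), and the i flag makes both the tag name and the word case-insensitive. Regular expressions break easily on real-world HTML (comments, ">" inside attributes, nested markup), which is why a parser is usually the safer choice.

```php
<?php
// Sketch only: assumes simple, well-formed, non-nested <a> elements.
// findAnchorsContaining() is a hypothetical helper, not a library function.
function findAnchorsContaining(string $html, string $word): array
{
    // <a\b[^>]*>         opening <a> tag with any attributes
    // (?:(?!</a>).)*?    anchor text that never runs past a closing </a>
    // \bWORD\b           the required word, matched as a whole word
    // .*?</a>            the rest of the anchor text up to the first </a>
    // Flags: i = case-insensitive, s = "." also matches newlines
    $pattern = '~<a\b[^>]*>(?:(?!</a>).)*?\b'
        . preg_quote($word, '~')
        . '\b.*?</a>~is';

    preg_match_all($pattern, $html, $matches);
    return $matches[0]; // full text of each matching anchor
}
```

For example, calling findAnchorsContaining($data, 'mylink') on a string like the one in the answer returns the four mylink anchors and skips both the "bla bla bla bla" anchor and the plain-text "mylink is string".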
Try using a parser:
require_once "simple_html_dom.php";
$data = 'Hi, I am looking for a regular expression in PHP which would match the anchor with a
specific text on it. E.g I would like to get anchors with text mylink like:
<a href="blabla" ... >mylink</a>
So it should match all anchors but only if they contain specific text So it should match
these string:
<a href="blabla" ... >mylink</a>
<a href="blabla" ... >blabla mylink</a>
<a href="blabla" ... >mylink bla bla</a>
<a href="blabla" ... >bla bla mylink bla bla</a>
but not this one:
<a href="blabla" ... >bla bla bla bla</a> Because this one does not contain word mylink.
Also this one should not match: "mylink is string" because it is not an anchor.
Anybody any Idea? Thanx Granit';
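$html = str_get_html($data); // parse the string into a queryable DOM
// Check the inner HTML of every <a> element for the search text.
// strpos() is case-sensitive; use stripos() to ignore case.
foreach ($html->find('a') as $element) {
    if (strpos($element->innertext, 'mylink') === false) {
        echo 'Ignored: ' . $element->innertext . "\n";
    } else {
        echo 'Matched: ' . $element->innertext . "\n";
    }
}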
This produces the following output:
Matched: mylink
Matched: mylink
Matched: blabla mylink
Matched: mylink bla bla
Matched: bla bla mylink bla bla
Ignored: bla bla bla bla
Download simple_html_dom.php from: http://simplehtmldom.sourceforge.net/
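If you would rather not download a third-party library, PHP's bundled DOM extension can do the same job. The following is a sketch that swaps simple_html_dom for DOMDocument; anchorsContaining() is a hypothetical helper name. Unlike simple_html_dom's innertext, it checks and returns each anchor's textContent, i.e. its text with any nested tags stripped.

```php
<?php
// Alternative sketch using PHP's built-in DOMDocument instead of simple_html_dom.
// anchorsContaining() is a hypothetical helper, not a library function.
function anchorsContaining(string $html, string $word): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // textContent is the anchor's text with nested tags removed
        if (strpos($a->textContent, $word) !== false) {
            $result[] = $a->textContent;
        }
    }
    return $result;
}
```

As in the simple_html_dom version, strpos() makes the check case-sensitive, and text outside any <a> element (such as "mylink is string") is never examined.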