Rya*_*yan 8 php regex preg-replace domdocument preg-match
I need to process the links in an HTML string in several different ways.
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$links = extractLinks($str);
foreach ($links as $link) {
    $pattern = "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#ie";
    if (preg_match($pattern,$str)) {
        // Process Remote links
        // For example, replace url with short url,
        // or replace long anchor text with truncated
    } else {
        // Process Local Links, Anchors
    }
}

function extractLinks($str) {
    // First, I tried DomDocument
    $dom = new DomDocument();
    $dom->loadHTML($str);
    return $dom->getElementsByTagName('a');
    // But this just returns:
    // DOMNodeList Object
    // (
    // [length] => 3
    // )

    // Then I tried Regex
    if(preg_match_all("|<a.*(?=href=\"([^\"]*)\")[^>]*>([^<]*)</a>|i", $str, $matches)) {
        print_r($matches);
    }
    // But this didn't work either.
}
Desired result of extractLinks($str):
[0] => Array(
    'str' => '<a href="http://example.com/abc" rel="link">string</a>',
    'href' => 'http://example.com/abc',
    'anchorText' => 'string'
),
[1] => Array(
    'str' => '<a href="/local/path" title="with attributes">number</a>',
    'href' => '/local/path',
    'anchorText' => 'number'
),
[2] => Array(
    'str' => '<a href="#anchor" data-attr="lots">links</a>',
    'href' => '#anchor',
    'anchorText' => 'links'
)
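I need all of these so that I can do things like edit the href (add tracking, shorten it, etc.), or replace the whole tag with something else (<a href="/u/username">username</a> might become username).
Here is a demo of what I am trying to do.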
Jav*_*vad 13
You only need to change it to this:
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$dom = new DomDocument();
$dom->loadHTML($str);
$output = array();
foreach ($dom->getElementsByTagName('a') as $item) {
    $output[] = array (
        'str' => $dom->saveHTML($item),
        'href' => $item->getAttribute('href'),
        'anchorText' => $item->nodeValue
    );
}
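getElementsByTagName returns a DOMNodeList, which has to be iterated. By looping over it and calling getAttribute, nodeValue, and saveHTML(THE_NODE) on each node, you get the output you want.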
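Since the question also asks about editing the href or replacing a whole tag, the same DOM loop can be extended to modify nodes in place. The following is only a sketch under stated assumptions: the function name rewriteLinks, the utm_source=demo tracking parameter, and the rule that /u/ links become plain text are all hypothetical, and the input is assumed to be an ASCII HTML fragment (loadHTML needs an explicit charset hint for other encodings).

```php
<?php
// Hypothetical helper: rewrites links inside an HTML fragment using DOMDocument.
function rewriteLinks(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its contents can be found again later.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->getElementsByTagName('div')->item(0);

    // Copy the live node list into an array, because nodes are replaced below.
    $anchors = iterator_to_array($root->getElementsByTagName('a'));

    foreach ($anchors as $a) {
        $href = $a->getAttribute('href');

        if (preg_match('#^(https?|ftp)://#i', $href)) {
            // Remote link: append a (made-up) tracking parameter.
            $separator = strpos($href, '?') === false ? '?' : '&';
            $a->setAttribute('href', $href . $separator . 'utm_source=demo');
        } elseif (strpos($href, '/u/') === 0) {
            // Assumed rule: user links are replaced by their anchor text.
            $text = $dom->createTextNode($a->textContent);
            $a->parentNode->replaceChild($text, $a);
        }
        // Other local links and anchors are left untouched.
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$demo = '<a href="http://example.com/abc">string</a> and '
      . '<a href="/u/ryan">ryan</a> and <a href="#anchor">links</a>';
$rewritten = rewriteLinks($demo);
echo $rewritten, "\n";
```

Working on the DOM nodes directly avoids having to search for and replace the original tag strings afterwards; whether that suits a given case depends on how much of the original markup formatting has to be preserved.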
Something like this:
<a\s*href="([^"]+)"[^>]+>([^<]+)</a>
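Use preg_match($pattern, $string, $m).
The pattern has two capture groups, so the results will be in $m[0] (the whole tag), $m[1] (the href), and $m[2] (the anchor text).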
$string = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>. ';
$regex='|<a\s*href="([^"]+)"[^>]+>([^<]+)</a>|';
$howmany = preg_match_all($regex,$string,$res,PREG_SET_ORDER);
print_r($res);
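To get from the PREG_SET_ORDER result to the array shape the question asks for, each match can be mapped to named keys. This is a rough sketch, not the answerer's code: the function name extractLinksRegex is hypothetical, and the pattern is slightly relaxed (\s+ instead of \s*, and [^>]* instead of [^>]+ so that links without extra attributes also match). It still assumes that href is the first attribute, is double-quoted, and that the anchor text contains no nested tags, so it is far more brittle than the DOM approach.

```php
<?php
// Hypothetical helper: extracts links with a regex and returns them in the
// array shape described in the question.
function extractLinksRegex(string $html): array
{
    // Group 1 is the href value, group 2 is the anchor text.
    $regex = '|<a\s+href="([^"]+)"[^>]*>([^<]+)</a>|i';

    // PREG_SET_ORDER yields one array per matched link.
    preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'str'        => $m[0], // the whole <a>...</a> tag
            'href'       => $m[1],
            'anchorText' => $m[2],
        ];
    }
    return $links;
}

$string = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>. ';

$links = extractLinksRegex($string);
print_r($links);
```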