Sve*_*len 4 html php regex preg-match
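I know this question has been asked on SO before, but I couldn't find a suitable one, and I still suck at regex :/
I have a string, and that string is valid HTML. Now I want to find all tags that have a specific name and attribute.
I tried this regex (which sort of works): /(<div type="my_special_type" src="(.*?)<\/div>)/.
Example string: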
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
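If I use preg_match, I only get <div type="special_type" src="bla"> match me</div>, which is logical because the other div has its attributes in a different order.
What regex do I need to get the following array when using preg_match on the example string?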
array(0 => '<div type="special_type" src="bla"> match me</div>',
1 => '<div src="blaw" type="special_type" > match me too</div>')
General advice: don't use regex to parse HTML. It gets messy as soon as the HTML changes.
Use DOMDocument instead:
$str = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($str);
$selector = new DOMXPath($doc);
$result = $selector->query('//div[@type="special_type"]');
// loop through all found items
foreach ($result as $node) {
    echo $node->getAttribute('src');
}
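The question asks for an array of the complete matching tags rather than just the src values. A minimal sketch of one way to get that with the same XPath approach is shown below. The helper name find_divs_by_type is hypothetical (it is not part of the answer above), and it assumes the type value contains no double quote, since it is pasted into the XPath expression.

```php
<?php
// Hypothetical helper, not from the original answer: returns the outer HTML
// of every <div> whose type attribute equals $type.
function find_divs_by_type(string $html, string $type): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matched = [];
    // Assumption: $type contains no double quote (it is embedded in the XPath string).
    foreach ($xpath->query('//div[@type="' . $type . '"]') as $node) {
        // saveHTML($node) serializes only this element, including its tags.
        $matched[] = $doc->saveHTML($node);
    }
    return $matched;
}

$html = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

print_r(find_divs_by_type($html, 'special_type'));
```

Because the markup is re-serialized by the parser, the returned strings may differ slightly from the input (for example, whitespace inside the opening tag can be normalized).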
As hek2msql said, you are better off using DOMDocument:
$html = '
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>';

$matches = get_matched($html);

// Collect the src attribute of every <div> whose type is "special_type".
function get_matched($html) {
    $matched = array();
    $dom = new DOMDocument();
    // @ suppresses the warnings libxml may emit while parsing
    @$dom->loadHTML($html);
    // Fetch the node list once instead of on every iteration
    $divs = $dom->getElementsByTagName('div');
    foreach ($divs as $div) {
        if ($div->getAttribute('type') != 'special_type') {
            continue;
        }
        $matched[] = $div->getAttribute('src');
        // or, for the text content: $matched[] = $div->nodeValue;
    }
    return $matched;
}
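If a regex really is required, the attribute-order problem from the question can be worked around with a lookahead. This is only a sketch: it assumes double-quoted attribute values and no nested div elements, which is exactly the kind of fragility the advice above warns about. Note also that preg_match stops at the first match, so preg_match_all is used here to collect all of them.

```php
<?php
$html = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

// (?=[^>]*\stype="special_type") looks ahead inside the opening tag only
// ([^>]* cannot cross the closing >), so the position of the type attribute
// among the other attributes does not matter.
// Assumptions: double-quoted attribute values, no nested <div> elements.
$pattern = '/<div\b(?=[^>]*\stype="special_type")[^>]*>.*?<\/div>/s';

// preg_match_all collects every match; $matches[0] holds the full matches.
preg_match_all($pattern, $html, $matches);
print_r($matches[0]);
```

For the example string, $matches[0] contains the two div elements that carry type="special_type", in document order.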