xlt*_*ttj 4 php xpath simplexml
I'm trying to use SimpleXML together with XPath to find nodes that contain a certain string.
<?php
$xhtml = <<<EOC
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Test</title>
</head>
<body>
<p>Find me!</p>
<p>
<br />
Find me!
<br />
</p>
</body>
</html>
EOC;
$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$nodes = $xml->xpath("//*[contains(text(), 'Find me')]");
echo count($nodes);
Expected output: 2
Actual output: 1
When I change the xhtml of the second paragraph to
<p>
Find me!
<br />
</p>
then it works as expected. What does my XPath expression have to look like to match all nodes containing 'Find me', no matter where the text sits inside them?
Using PHP's DOM-XML is an option, but not desired.
Thanks in advance!
It depends on what you want to do. You could select all the <p/> elements that contain "Find me" in any of their descendants with
//xhtml:p[contains(., 'Find me')]
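Note that contains(., ...) looks at all descendant text, so if you don't specify the kind of node (for example by using * instead of xhtml:p), it will return <body/> and <html/> as well.
Or perhaps you want any node that has a child (not a descendant) text node containing "Find me":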
//*[text()[contains(., 'Find me')]]
This one will not return <html/> or <body/>.
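As a rough check of the suggestions above, the following sketch runs them against a trimmed-down copy of the question's XHTML (the DOCTYPE and <head> are dropped for brevity). The $expressions and $counts arrays and their labels are illustrative helpers only, not part of the original answer.

```php
<?php
// Trimmed-down copy of the XHTML from the question (DOCTYPE and <head> omitted).
$xhtml = <<<'EOC'
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<body>
<p>Find me!</p>
<p>
<br />
Find me!
<br />
</p>
</body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
// The prefix is only needed for expressions that name an element (xhtml:p).
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

// Hypothetical labels mapped to the expressions discussed above.
$expressions = [
    'p, descendant text'   => "//xhtml:p[contains(., 'Find me')]",
    'any, descendant text' => "//*[contains(., 'Find me')]",
    'any, child text node' => "//*[text()[contains(., 'Find me')]]",
];

$counts = [];
foreach ($expressions as $label => $expression) {
    $counts[$label] = count($xml->xpath($expression));
    echo $label, ': ', $counts[$label], PHP_EOL;
}
// p, descendant text: 2    (both paragraphs)
// any, descendant text: 4  (html, body and both paragraphs)
// any, child text node: 2  (both paragraphs only)
```

Which variant fits depends on whether ancestors such as <body/> should count as matches. The last expression is probably closest to what the question asks for.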
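I forgot to mention that . represents the whole text content of a node, while text() retrieves a node-set of text nodes. The problem with your expression contains(text(), 'Find me') is that contains() only works on strings, not node-sets, so it converts text() to the string value of its first node. That is why removing the first <br/> makes it work.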
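The first-node conversion can be made visible by indexing the text nodes explicitly. This is a minimal sketch on the same trimmed-down markup, assuming default parser options so that whitespace-only text nodes are kept. The variable names ($implicitFirst, $explicitFirst, $explicitSecond) are made up for illustration.

```php
<?php
// Same two paragraphs as in the question, without DOCTYPE and <head>.
$xhtml = <<<'EOC'
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>Find me!</p>
<p>
<br />
Find me!
<br />
</p>
</body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

// contains(text(), ...) silently uses only the first text node of each <p>.
$implicitFirst = count($xml->xpath("//xhtml:p[contains(text(), 'Find me')]"));

// Writing text()[1] explicitly selects the same paragraphs.
$explicitFirst = count($xml->xpath("//xhtml:p[contains(text()[1], 'Find me')]"));

// In the second <p> the string sits in the second text node, because the
// first one is just the line break between <p> and the first <br />.
$explicitSecond = count($xml->xpath("//xhtml:p[contains(text()[2], 'Find me')]"));

echo $implicitFirst, ' ', $explicitFirst, ' ', $explicitSecond, PHP_EOL; // 1 1 1
```

Here text()[1] matches only the first paragraph and text()[2] only the second, which is why the predicate form text()[contains(., 'Find me')] is needed to test every text node.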