How to search for content in multi-line text with XPath using Python?

Thi*_*ode 3 python xpath lxml

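When I use contains to search for data in an element's text(), it works for plain data, but not when the element content contains carriage returns, newlines, or tags. How can I make //td[contains(text(), "")] work in this case? Thanks!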

XML:

<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>

Python:

>>> import lxml.html as lh  # assumed: the import was not shown in the original session
>>> tdout=open('tdmultiplelines.htm', 'r')
>>> tdouthtml=lh.parse(tdout)
>>> tdout.close()
>>> tdouthtml
<lxml.etree._ElementTree object at 0x2aaae0024368>
>>> tdouthtml.xpath('//td/text()')
['\n      Hello world ', '\n      Have a wonderful day.\n      Good bye!\n    ', '\n      Hello NJ ', '\n    ']
>>> tdouthtml.xpath('//td[contains(text(),"Good bye")]')
[]  ##-> But *Good bye* is already in the `td` contents, though as a list.
>>> tdouthtml.xpath('//td[text() = "\n      Hello world "]')
[<Element td at 0x2aaae005c410>]

Dim*_*hev 5

Use:

//td[text()[contains(.,'Good bye')]]

Explanation:

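The cause of the problem is not that a text node's string value is a multi-line string. The real cause is that the td element has more than one text-node child.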

In the provided expression:

//td[contains(text(),"Good bye")]

the first argument passed to the contains() function is a node-set containing more than one text node.

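Per the XPath 1.0 specification (in XPath 2.0 this would simply raise a type error), when a function that expects a string argument is passed a node-set instead, only the string value of the first node in the node-set (in document order) is used.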
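The first-node rule can be observed directly from Python. The following is a minimal sketch, assuming lxml is installed; it parses the first row of the question's table from an inline string rather than from a file, and the variable names are illustrative only.

```python
from lxml import etree

# First row of the question's table, inlined for a self-contained example.
XML = """\
<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
</table>"""

root = etree.fromstring(XML)

# The td has two text-node children, separated by the <i> element.
text_nodes = root.xpath('//td/text()')

# string() applied to a node-set uses only the first node in document order.
first_as_string = root.xpath('string(//td/text())')

# contains() converts its node-set argument the same way, so only the
# first text node is searched.
finds_hello = root.xpath('boolean(//td[contains(text(), "Hello world")])')
finds_goodbye = root.xpath('boolean(//td[contains(text(), "Good bye")])')

print(len(text_nodes), repr(first_as_string), finds_hello, finds_goodbye)
```

Here "Hello world" is found because it sits in the first text node, while "Good bye" is not, because it sits in the second.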

In this specific case, the first text node of the passed node-set has the string value:

 "
                 Hello world "

so the comparison fails and the desired td element is not selected.
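For comparison, here is a short sketch that runs both the question's expression and the corrected one against the question's table. It assumes lxml is installed and uses an inline copy of the XML; `original` and `fixed` are illustrative names.

```python
from lxml import etree

# Inline copy of the XML from the question.
XML = """\
<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(XML)

# Question's expression: contains() sees only the first text node of each td.
original = root.xpath('//td[contains(text(), "Good bye")]')

# Corrected expression: the inner predicate tests every text-node child,
# and the td is kept if any of them contains the substring.
fixed = root.xpath('//td[text()[contains(., "Good bye")]]')

print(len(original), len(fixed))
```

Moving contains() into a predicate on text() means the function receives one node at a time (the context node `.`), so the node-set-to-string conversion never discards anything.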

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<table>
      <tr>
        <td>
          Hello world <i> how are you? </i>
          Have a wonderful day.
          Good bye!
        </td>
      </tr>
      <tr>
        <td>
          Hello NJ <i>, how are you?
          Have a wonderful day.</i>
        </td>
      </tr>
</table>

the XPath expression is evaluated and the selected nodes (just one in this case) are copied to the output:

<td>
          Hello world <i> how are you? </i>
          Have a wonderful day.
          Good bye!
        </td>
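The same verification can also be reproduced from Python, since lxml exposes libxslt through `etree.XSLT`. This is a sketch under the assumption that lxml is installed; the stylesheet and document are inline copies of the ones shown in this answer, and the exact whitespace of the serialized output may differ from the listing.

```python
from lxml import etree

# Stylesheet from the answer, inlined.
XSLT_SOURCE = """\
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
 </xsl:template>
</xsl:stylesheet>"""

# XML document from the question, inlined.
XML_SOURCE = """\
<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

# Compile the stylesheet and apply it to the parsed document.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))

# Serialize the result tree according to the stylesheet's xsl:output settings.
output = str(result)
print(output)
```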

  • @ThinkCode: In `<x> string1 <y> string2string3 </y>` you are looking for a text node that contains `string1string2` (2 upvotes)