Getting the nth element with lxml/XPath fails

Question

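This is probably something very simple, but I keep failing at it.

When root contains one or more "<link />" elements, root.xpath('(//link)') returns all of them. But root.xpath('(//link)[0]') returns an empty list. What is going on?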

from unittest import TestCase, TestProgram

class T(TestCase):
    # The "_" prefix is bound to the document's default namespace in _test().
    base_path = r'(//_:link)'

    def test0ok(self):
        # Passes: the document contains two <link /> elements.
        self._test(2, self.base_path)

    def test1ng(self):
        # Fails: the [0] predicate matches nothing, so 0 elements come back.
        self._test(1, self.base_path + r'[0]')

    def _test(self, expected, path):
        from lxml.etree import fromstring as parse_xml_string
        root = parse_xml_string(_xhtml)
        nsmap = dict(_=root.nsmap[None])
        gotten = root.xpath(path, namespaces=nsmap)
        self.assertEqual(expected, len(gotten))

_xhtml = br'''
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<link rev="made" href="./" />
<link rel="contents" href="./" />
<title>te</title>
</head>
<body>
<h1>st</h1>
</body>
</html>
'''[1:]

if __name__ == r'__main__':
    TestProgram()


Answer 1

ale*_*cxe 5

This is because positions in XPath are 1-based, not 0-based, so the predicate [0] never matches anything:

root.xpath('(//link)[1]')
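
To make the 1-based behaviour concrete, here is a minimal sketch against a trimmed-down copy of the question's XHTML. The prefix "x" and the variable names are arbitrary choices for illustration; because the document declares a default namespace, the prefix has to be bound explicitly, as the question's test code does.

```python
from lxml import etree

# Trimmed-down version of the XHTML document from the question.
XHTML = b"""<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rev="made" href="./" />
<link rel="contents" href="./" />
<title>te</title>
</head>
<body><h1>st</h1></body>
</html>"""

root = etree.fromstring(XHTML)

# Bind the default namespace to a prefix; "x" is an arbitrary name.
ns = {"x": root.nsmap[None]}

# position() starts at 1, so [0] can never be true and yields an empty list.
zeroth = root.xpath("(//x:link)[0]", namespaces=ns)
first = root.xpath("(//x:link)[1]", namespaces=ns)
second = root.xpath("(//x:link)[2]", namespaces=ns)
last = root.xpath("(//x:link)[last()]", namespaces=ns)

print(zeroth)                # []
print(first[0].get("rev"))   # made
print(second[0].get("rel"))  # contents
```

Note that even with a positional predicate, xpath() still returns a list, so the single matching element is at index 0 of the result.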


Alternatively, you can select all matches and pick the element by index in Python, where indexing is 0-based:

root.xpath('//link')[0]
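
The two approaches differ in how they behave when the index is out of range, which the following sketch tries to show. It uses a small namespace-free document invented for illustration, so plain //link works without a prefix mapping.

```python
from lxml import etree

# Small namespace-free document, made up for this illustration.
doc = etree.fromstring(b"<head><link rev='made'/><link rel='contents'/></head>")

# xpath() returns an ordinary Python list, so normal 0-based indexing applies.
links = doc.xpath("//link")
first = links[0]
second = links[1]

# Out of range inside XPath: the result is simply an empty list.
xpath_miss = doc.xpath("(//link)[3]")

# Out of range in Python: indexing the list raises IndexError.
try:
    links[2]
    raised = False
except IndexError:
    raised = True

print(first.get("rev"), second.get("rel"))  # made contents
print(xpath_miss, raised)                   # [] True
```

Which one to prefer depends on whether a missing element should be handled as an empty result or as an exception.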

