Setting an attribute with no value using lxml

Jes*_*ose 5 python lxml

I want:

<div data-a>

But the lxml API only seems to give me this:

<div data-a=''>

How do I get an attribute with no value?


Annoyingly, lxml represents both an empty value and a missing value as an empty string.

Setting the value to None does not help; it raises a TypeError.

In [19]: from lxml.html import fromstring, tostring

In [20]: b = fromstring('<body class="meow" data-a="haha" data-b data-x="">text-fef27e87389e466fb99b5421629323f6</body>')

In [21]: b.attrib
Out[21]: {'data-a': 'haha', 'data-x': '', 'data-b': '', 'class': 'meow'}

In [24]: b.attrib['data-y'] = None
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-1f55133e3dc4> in <module>()
----> 1 b.attrib['data-y'] = None

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:58775)()

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:19025)()

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._utf8 (src/lxml/lxml.etree.c:26460)()

TypeError: Argument must be bytes or unicode, got 'NoneType'


tag.attrib['data-a'] = None
TypeError: Argument must be bytes or unicode, got 'NoneType'

har*_*r07 2

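IMHO, lxml is exhibiting the expected behavior. An attribute without a value makes the XML not well-formed, and a good XML parser will not generate XML that is not well-formed: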
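As a minimal sketch of the well-formedness point, the snippet below checks the two spellings from the question against an XML parser. It is an illustration under assumptions, not the answerer's own code: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and `is_well_formed` is a hypothetical helper name introduced here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # The XML parser rejected the input as not well-formed.
        return False
    return True


# An attribute with an explicit (even empty) value is well-formed XML.
with_value = is_well_formed('<div data-a=""></div>')

# A bare attribute name is allowed in HTML but is not well-formed XML.
without_value = is_well_formed('<div data-a></div>')

print(with_value, without_value)
```

The bare `data-a` form is only legal in HTML syntax, which is why an XML-oriented API has no way to express it other than as an empty string.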