How can I make BeautifulSoup 4 respect self-closing tags?

Question

Hoo*_*ked 8 python xml beautifulsoup xml-parsing

This question is specific to BeautifulSoup 4, which is what distinguishes it from earlier questions on the same topic.

Now that BeautifulStoneSoup (the former XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example:

import bs4
S = '''<foo> <bar a="3"/> </foo>'''
# selfClosingTags was a BeautifulSoup 3 option; bs4 ignores it and warns
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])

print(soup.prettify())

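This does not make the bar tag self-closing; instead it emits the warning below. What is this tree builder that bs4 refers to, and how do I make the tag self-closing?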

/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
  "BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>
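
To see which tree builder bs4 actually used, you can inspect `soup.builder`. This is a minimal sketch, assuming bs4 4.x. It names the stdlib-backed "html.parser" builder explicitly so the result does not depend on which optional parsers are installed (the output above, with its added <html> and <body> wrapper, came from whatever default builder bs4 found). It shows that an HTML builder only treats HTML void elements as self-closing, so an unknown tag like <bar/> gets an explicit end tag.

```python
import bs4

S = '<foo> <bar a="3"/> </foo>'

# Pick the stdlib-backed HTML parser explicitly so the output does not
# depend on whether lxml or html5lib happens to be installed.
soup = bs4.BeautifulSoup(S, "html.parser")

# The tree builder bs4 instantiated for this document.
print(type(soup.builder).__name__)  # HTMLParserTreeBuilder
print(soup.is_xml)                  # False

# HTML builders only allow HTML void elements (br, img, ...) to be
# written as <tag/>, so <bar/> is serialized with an end tag.
print(soup.bar.can_be_empty_element)  # False
print(soup.bar)                       # <bar a="3"></bar>
```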

Answer 1

Pav*_*sov 12

To parse XML, pass "xml" as the second argument (the features argument) to the BeautifulSoup constructor.

soup = bs4.BeautifulSoup(S, 'xml')

You need to have lxml installed, because bs4's "xml" parser is provided by lxml.
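
If lxml is missing, requesting the "xml" feature fails rather than silently falling back to an HTML parser. A minimal sketch, assuming bs4 4.x, where the constructor raises `bs4.FeatureNotFound` (a `ValueError` subclass) when no installed tree builder provides the requested feature; the helper name `parse_as_xml` is hypothetical.

```python
import bs4

S = '<foo> <bar a="3"/> </foo>'

def parse_as_xml(markup):
    """Hypothetical helper: parse with bs4's "xml" feature (backed by lxml).

    Returns None instead of raising when no installed tree builder
    provides the "xml" feature, i.e. when lxml is not installed.
    """
    try:
        return bs4.BeautifulSoup(markup, "xml")
    except bs4.FeatureNotFound:
        return None

soup = parse_as_xml(S)
if soup is None:
    print("Install lxml (for example: pip install lxml) to use the 'xml' parser.")
else:
    print(soup.bar)  # <bar a="3"/>
```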

You no longer need to pass selfClosingTags:

In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
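
If you would rather keep parsing as HTML, the warning's hint can be taken literally: teach the tree builder about the tag. This is a minimal sketch, assuming bs4 4.x, where HTML builders list their void elements in the class attribute `empty_element_tags` and consult it through `can_be_empty_element()`, and where the constructor accepts a builder instance via `builder=`. The subclass name `BarAwareHTMLBuilder` is hypothetical, and because this relies on a builder attribute rather than a documented parsing option, it may behave differently in other bs4 releases.

```python
from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

S = '<foo> <bar a="3"/> </foo>'

class BarAwareHTMLBuilder(HTMLParserTreeBuilder):
    """Hypothetical builder: the stdlib HTML builder, plus <bar> as a void tag.

    empty_element_tags is the set that can_be_empty_element() checks when
    deciding whether a tag name may be written as <tag/>.
    """
    empty_element_tags = HTMLParserTreeBuilder.empty_element_tags | {"bar"}

# Pass a builder instance instead of a feature string such as "xml".
soup = BeautifulSoup(S, builder=BarAwareHTMLBuilder())
print(soup.bar)  # <bar a="3"/>
```

The XML route above is simpler when the input really is XML; the subclass only makes sense when HTML parsing behavior is otherwise what you want.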

@detly, if the tag is empty, it will be self-closed (in XML mode). (2 upvotes)
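
The comment's point follows from how bs4's XML builder works: it keeps no list of void elements (its `empty_element_tags` is None), so `can_be_empty_element()` is true for every tag name, and a tag is written as <tag/> exactly when it has no contents. A minimal sketch, assuming bs4 4.x with lxml installed; the markup and the helper name `empty_tags_self_close` are hypothetical, and the helper returns None when no "xml" builder is registered.

```python
from bs4 import BeautifulSoup
from bs4.builder import builder_registry

XML = '<foo><bar a="3"/><baz>text</baz></foo>'

def empty_tags_self_close():
    """Hypothetical helper: show that in XML mode emptiness decides <tag/>.

    Returns (before, after) serializations, or None if no tree builder
    provides the "xml" feature (lxml not installed).
    """
    if builder_registry.lookup("xml") is None:
        return None
    soup = BeautifulSoup(XML, "xml")
    before = str(soup.foo)
    soup.bar.string = "now has content"  # bar is no longer empty
    soup.baz.clear()                      # baz is now empty
    after = str(soup.foo)
    return before, after

result = empty_tags_self_close()
if result is not None:
    before, after = result
    print(before)  # <foo><bar a="3"/><baz>text</baz></foo>
    print(after)   # <foo><bar a="3">now has content</bar><baz/></foo>
```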

Archived: 12 years, 10 months ago
Views: 3672
Last activity: 12 years, 10 months ago