lxml.html.tostring reorders the doctype and XML declaration when printing

Suppose I have a file test.html with the following content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Components of the SDK</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body></body></html>

and I run this at the Python prompt:

>>> import lxml.html
>>> t = lxml.html.parse('test.html')
>>> lxml.html.etree.tostring(t)
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n<?xml version="1.0" encoding="UTF-8" standalone="no"??><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Components of the SDK</title><link rel="stylesheet" href="core.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"/></head><body/></html>'

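Notice that after lxml reads the data and tostring prints it back out, the doctype and the XML declaration have swapped places. How can I fix this so that lxml does not try to modify the document (assuming it is already well-formed)?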
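
For reference, here is a minimal sketch of one direction that might help. It is not a confirmed fix. It assumes the file really is well-formed XHTML, so it can go through lxml's XML parser (`lxml.etree.parse`) instead of `lxml.html.parse`. The HTML parser appears to treat `<?xml ...?>` as an ordinary processing instruction, which would explain both the reordering and the doubled `??>`. The document is held in memory (and shortened slightly) so the snippet is self-contained, and `data` and `out` are illustrative names.

```python
import io
from lxml import etree

# Same kind of document as test.html, kept in memory so the sketch runs on its own.
data = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    b'"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    b'<title>Components of the SDK</title></head><body></body></html>'
)

# etree.parse uses the XML parser, which reads <?xml ...?> as the XML
# declaration rather than as a processing instruction inside the document.
tree = etree.parse(io.BytesIO(data))

# Serialize the whole tree (not just the root element) so the doctype is kept,
# and ask explicitly for an XML declaration with standalone="no".
out = etree.tostring(tree, encoding="UTF-8", xml_declaration=True, standalone=False)
print(out.decode("utf-8"))
```

The result is still a re-serialization, not a byte-for-byte copy: the quoting in the declaration may differ, and `<body></body>` comes back as `<body/>`, as it does in the output above.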

python xhtml lxml xml-parsing

hin*_*nts

2011-12-15

Score: 4, answers: 1, views: 1204