Posted by vin*_*hal

How to make lxml write an output file with UTF-8 encoding

data.xml

<?xml version="1.0" encoding="UTF-8"?>
<ArticleSet>
    <Article>
        <LastName>Bojarski</LastName>
        <ForeName>-</ForeName>
        <Affiliation>-</Affiliation>
    </Article>
    <Article>
        <LastName>Genç</LastName>
        <ForeName>Yasemin</ForeName>
        <Affiliation>fgjfgnfgn</Affiliation>
    </Article>
</ArticleSet>

Sample code

from lxml import etree

dom = etree.parse('data.xml')
root = dom.getroot()

for article in dom.xpath('Article[Affiliation="-"]'):
    root.remove(article)

dom.write('output.xml')

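This code removes every article whose affiliation equals "-", i.e. whose Affiliation tag looks like <Affiliation>-</Affiliation>.
When I write what remains to output.xml, the Unicode character in Genç is turned into a numeric character reference, so Genç becomes Gen&#231;. I want to store it as is.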


Output of the code

<ArticleSet>
    <Article>
        <LastName>Gen&#231;</LastName>
        <ForeName>Yasemin</ForeName>
        <Affiliation>fgjfgnfgn</Affiliation>
    </Article>
</ArticleSet>

Desired output

<ArticleSet>
    <Article>
        <LastName>Genç</LastName>
        <ForeName>Yasemin</ForeName>
        <Affiliation>fgjfgnfgn</Affiliation>
    </Article>
</ArticleSet>
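One likely fix, offered as a sketch rather than a definitive answer: when no encoding is given, lxml serializes as ASCII, so any character outside ASCII is written as a numeric character reference such as &#231;. Passing encoding='UTF-8' to write() should keep the character as raw UTF-8 bytes. The helper name drop_dash_affiliations and the inline SAMPLE document are assumptions made for illustration (SAMPLE is a shortened copy of data.xml), and an in-memory buffer stands in for output.xml so the snippet runs without touching the disk.

```python
import io

from lxml import etree

# Shortened copy of data.xml from the question (hypothetical inline sample).
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<ArticleSet>\n'
    '    <Article>\n'
    '        <LastName>Bojarski</LastName>\n'
    '        <Affiliation>-</Affiliation>\n'
    '    </Article>\n'
    '    <Article>\n'
    '        <LastName>Gen\u00e7</LastName>\n'
    '        <Affiliation>fgjfgnfgn</Affiliation>\n'
    '    </Article>\n'
    '</ArticleSet>\n'
).encode('utf-8')


def drop_dash_affiliations(xml_bytes):
    """Remove articles whose Affiliation is "-" and return UTF-8 encoded XML.

    Hypothetical helper for illustration; not part of lxml.
    """
    dom = etree.parse(io.BytesIO(xml_bytes))
    root = dom.getroot()

    # Same XPath as in the question: direct Article children of the root
    # that have an Affiliation child whose text is "-".
    for article in dom.xpath('Article[Affiliation="-"]'):
        root.remove(article)

    # An explicit encoding makes lxml emit non-ASCII characters as UTF-8
    # bytes instead of numeric character references.
    buffer = io.BytesIO()
    dom.write(buffer, encoding='UTF-8', xml_declaration=True)
    return buffer.getvalue()


result = drop_dash_affiliations(SAMPLE)
print(result.decode('utf-8'))
```

Applied to the code in the question, the same idea would be dom.write('output.xml', encoding='UTF-8', xml_declaration=True). The xml_declaration=True argument is optional here; it only makes the header line explicit so that readers of the file know which encoding was used.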

python lxml
