How do I tell lxml.etree.tostring(element) not to write namespaces in Python?

Question

Auf*_*ind 8 python lxml namespaces tostring elementtree

I have a huge XML file (1 GB). I want to move some elements (entries) to another file that has the same header and specification.

Suppose the original file contains entries with the tag <to_move>:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE some SYSTEM "some.dtd">
<some>
...
<to_move date="somedate">
    <child>some text</child>
    ...
...
</to_move>
...
</some>

I use lxml.etree.iterparse to iterate over the file, and that works well. When I find an element with the tag <to_move> (let's say it is stored in the variable element), I do:

new_file.write(etree.tostring(element))

But this results in:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE some SYSTEM "some.dtd">
<some>
...
<to_move xmlns:="some" date="somedate">  # <---- Here is the problem. I don't want the namespace.
    <child>some text</child>
    ...
...
</to_move>
...
</some>

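So the question is: how do I tell etree.tostring() not to write xmlns:="some"? Is that possible at all? I went through the lxml.etree API documentation but couldn't find a satisfying answer.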

This is what I found for etree.tostring:

tostring(element_or_tree, encoding=None, method="xml",
xml_declaration=None, pretty_print=False, with_tail=True,
standalone=None, doctype=None, exclusive=False, with_comments=True)

Serialize an element to an encoded string representation of its XML tree.

As far as I can tell, none of the parameters of tostring() helps here. Any suggestions or corrections?
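A sketch of two lxml-only workarounds, assuming the <to_move> elements do not need to keep their namespace in the output file. The hypothetical helper without_namespaces works on a detached copy.deepcopy of the element, renames every tag to its local name, and then calls etree.cleanup_namespaces to drop declarations nothing uses any more (namespaced attributes such as xml:lang are left alone). serialize_exclusive_c14n uses the exclusive flag from the signature above, which only takes effect with method="c14n": exclusive canonicalization omits declarations an element does not visibly use, but it keeps a namespace the element really is in (for example a default xmlns on <some>), and C14N output differs in other ways too, e.g. empty elements come out as start/end tag pairs.

```python
import copy

from lxml import etree


def without_namespaces(elem):
    """Return a detached copy of `elem` whose tags carry no namespace."""
    # Detached copy: the parse tree stays intact, and the copy has no
    # ancestors whose xmlns declarations tostring() would carry along.
    clone = copy.deepcopy(elem)
    for node in clone.iter():
        if isinstance(node.tag, str):  # comments/PIs have a non-string tag
            node.tag = etree.QName(node).localname
    # Remove declarations that are no longer used by any tag in the copy
    etree.cleanup_namespaces(clone)
    return clone


def serialize_fragment(elem):
    return etree.tostring(without_namespaces(elem), with_tail=False)


def serialize_exclusive_c14n(elem):
    # Exclusive C14N only emits namespace declarations an element visibly uses
    return etree.tostring(elem, method="c14n", exclusive=True)
```

In the iterparse loop, dest.write(serialize_fragment(elem) + b"\n") would take the place of the plain tostring() call.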

Answer 1

Mic*_*lon 4

I often grab a namespace so I can create an alias for it, like this:

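import lxml.etree

ns = None  # optionally pre-filled namespace map
someXML = lxml.etree.XML(someString)
if ns is None:
    # a namespaced tag looks like "{uri}localname"; take the uri
    ns = {"m": someXML.tag.split("}")[0][1:]}
someid = someXML.xpath('.//m:ImportantThing//m:ID', namespaces=ns)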
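A variant of the snippet above, sketched with etree.QName(...).namespace, which returns None when the root tag has no namespace. The split("}") version produces a bogus alias in that case (for a root tag "r" it yields {"m": ""}). The names namespace_alias and find_ids are hypothetical; the XPath expression is the one from the snippet.

```python
from lxml import etree


def namespace_alias(elem, prefix="m"):
    """Map `prefix` to elem's namespace URI, or return {} if it has none."""
    uri = etree.QName(elem).namespace  # None for tags without a namespace
    return {prefix: uri} if uri else {}


def find_ids(xml_bytes):
    root = etree.fromstring(xml_bytes)
    ns = namespace_alias(root)
    # Only use the prefixed path when there is a namespace to bind "m" to
    path = ".//m:ImportantThing//m:ID" if ns else ".//ImportantThing//ID"
    return [e.text for e in root.xpath(path, namespaces=ns or None)]
```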

You can do something similar to get the namespace, and then build a regular expression that strips it out of the tostring output.
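A sketch of that regex clean-up, assuming you already know the namespace URI (for example via the alias trick above). The hypothetical tostring_without_ns only edits the first start tag and only removes declarations bound to that URI. It is only safe when nothing in the fragment actually uses the namespace; otherwise the result either stops being namespace-well-formed (prefixed names) or silently moves elements out of their namespace (default namespace).

```python
import re

from lxml import etree


def tostring_without_ns(elem, uri):
    """Serialize `elem`, then delete any xmlns declaration bound to `uri`.

    Only the first start tag is touched. libxml2 escapes ">" inside
    attribute values, so the first ">" ends the start tag.
    """
    text = etree.tostring(elem, with_tail=False)
    end = text.index(b">")
    pattern = re.compile(
        rb'\s+xmlns(?::[\w.-]+)?\s*=\s*"' + re.escape(uri.encode()) + rb'"'
    )
    return pattern.sub(b"", text[:end]) + text[end:]
```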

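Or you can clean up the input string: find the first space and check whether it is followed by xmlns. If it is, delete the whole xmlns part up to the next space; if it isn't, move on past that space. Repeat until there are no more spaces or xmlns declarations, but don't go past the first >.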
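The scan described above can be sketched fairly literally as follows. strip_xmlns_from_first_tag is a hypothetical name; the sketch splits on single spaces, so, like the described approach, it assumes no attribute value contains a space, and it expects the string to begin with the element's start tag (as tostring(element) output does) rather than with an XML declaration or DOCTYPE.

```python
def strip_xmlns_from_first_tag(xml: str) -> str:
    """Drop every xmlns / xmlns:prefix attribute from the first start tag.

    Mirrors the space-by-space scan: split the tag on spaces, discard
    pieces that start with "xmlns", leave everything after the first ">".
    """
    end = xml.index(">")  # never look past the first ">"
    head, rest = xml[:end], xml[end:]
    self_closing = head.endswith("/")
    if self_closing:
        head = head[:-1]  # keep "/" out of the last attribute piece
    kept = [piece for piece in head.split(" ") if not piece.startswith("xmlns")]
    return " ".join(kept) + ("/" if self_closing else "") + rest
```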

Archived:	14 years, 6 months ago
Views:	8719
Last activity:	9 years, 1 month ago