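I am trying to write simple Python (3.2) code that reads an XML file, makes some corrections, and stores it back. However, during the storing step, ElementTree adds this namespace prefix notation. For example: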
<ns0:trk>
  <ns0:name>ACTIVE LOG</ns0:name>
  <ns0:trkseg>
    <ns0:trkpt lat="38.5" lon="-120.2">
      <ns0:ele>6.385864</ns0:ele>
      <ns0:time>2011-12-10T17:46:30Z</ns0:time>
    </ns0:trkpt>
    <ns0:trkpt lat="40.7" lon="-120.95">
      <ns0:ele>5.905273</ns0:ele>
      <ns0:time>2011-12-10T17:46:51Z</ns0:time>
    </ns0:trkpt>
    <ns0:trkpt lat="43.252" lon="-126.453">
      <ns0:ele>7.347168</ns0:ele>
      <ns0:time>2011-12-10T17:52:28Z</ns0:time>
    </ns0:trkpt>
  </ns0:trkseg>
</ns0:trk>
The code snippet is as follows:
def parse_gpx_data(gpxdata, tzname=None, npoints=None, filter_window=None,
                   output_file_name=None):
    ET = load_xml_library()
    def find_trksegs_or_route(etree, ns):
        trksegs = etree.findall('.//' + ns + 'trkseg')
        if trksegs:
            return trksegs, "trkpt"
        else:  # try to display route if track is missing
            rte = etree.findall('.//' + ns + 'rte')
            return rte, "rtept"
    # try GPX10 namespace first
    try:
        element = ET.XML(gpxdata)
    except ET.ParseError as v:
        row, column = v.position
        print ("error on …
All I want to do is read a local .xml file, encode it as UTF-8 so that it has the correct header, and then re-save the file. However, when I run the following, it adds the dreaded "ns0:" prefix to every XML element:
import xml.etree.ElementTree as ET
import sys, os
# note that this is the *module*'s `register_namespace()` function
# WTF THIS SHOULD WORK....
ET.register_namespace("", "http://www.w3.org/2000/svg")
tree = ET.ElementTree() # instantiate an object of *class* `ElementTree`
tree.parse('//cbweb1/inetpub/x/sitemap/sitemap_index.xml')
tree.write('//cbweb1/inetpub/x/sitemap/test.xml', encoding = 'utf-8', xml_declaration=True)
What am I doing wrong?
FYI, this is Python 2.7.x (I have also tried 3.4).
Edit:
Input:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/something.xml</loc>
    <lastmod>2014-05-01</lastmod>
  </sitemap>
</sitemapindex>
Output:
<?xml version="1.0" encoding="utf-8"?>
<ns0:sitemapindex xmlns:ns0="http://www.sitemaps.org/schemas/sitemap/0.9">
  <ns0:sitemap>
    <ns0:loc>http://www.example.com/something.xml</ns0:loc>
    <ns0:lastmod>2014-05-01</ns0:lastmod>
  </ns0:sitemap>
</ns0:sitemapindex>
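A hedged note on the sitemap case: the snippet registers the empty prefix for the SVG namespace, while the input document uses the sitemaps.org namespace, so ElementTree has no prefix on record for the URI it actually sees and falls back to a generated one. A minimal sketch of registering the empty prefix for the matching URI follows. It assumes Python 3 and uses an in-memory string and buffer instead of the network paths from the question; the names `SITEMAP_NS`, `source`, `buffer` and `result` are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: this must be the namespace URI that the input file really uses.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Map the empty (default) prefix to that URI. The registration is global
# and only influences how elements are serialized.
ET.register_namespace("", SITEMAP_NS)

# Stand-in for the local sitemap file from the question.
source = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>http://www.example.com/something.xml</loc>"
    "<lastmod>2014-05-01</lastmod></sitemap></sitemapindex>"
)

tree = ET.ElementTree(ET.fromstring(source))

# Write to an in-memory buffer; a file path works the same way.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
result = buffer.getvalue().decode("utf-8")
print(result)
```

The same idea should carry over to the GPX question above, provided the URI passed to `register_namespace()` is the one declared in the GPX file being read.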
I am currently processing an XML document (adding elements, adding attributes, etc.), so I first need to parse the XML before working on it. However, lxml seems to remove the <?xml ...> element. For example:
from lxml import etree
tree = etree.fromstring('<?xml version="1.0" encoding="utf-8"?><dmodule>test</dmodule>', etree.XMLParser())
print etree.tostring(tree)
results in:
<dmodule>test</dmodule>
Does anyone know why the <?xml ...> element is removed? I thought the encoding tag was valid XML. Thanks for your time.
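A hedged note on the lxml case: the `<?xml ...?>` line is the XML declaration, which belongs to the document prolog rather than to the element tree, so it is not an element that parsing could preserve. `etree.tostring()` can be asked to emit a declaration through its `encoding` and `xml_declaration` arguments. A minimal sketch, assuming Python 3 and bytes input; the fallback import of the standard library (whose `tostring()` accepts the same keyword from Python 3.8) is an assumption added only so the snippet runs without lxml installed:

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback, needs Python 3.8+
    import xml.etree.ElementTree as etree

# Bytes input, so the encoding declaration in the source is acceptable.
raw = b'<?xml version="1.0" encoding="utf-8"?><dmodule>test</dmodule>'
root = etree.fromstring(raw)

# With default arguments only the element itself is serialized.
plain = etree.tostring(root)

# Ask for the declaration explicitly when serializing.
serialized = etree.tostring(root, encoding="utf-8", xml_declaration=True)
print(serialized.decode("utf-8"))
```

The declaration in the output is generated at serialization time from the requested encoding; it is not carried over from the input.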