You*_*won 6 python xml xpath parsing beautifulsoup
I want to use Python to process the data between specific tags in a .tcx file (XML format).
The file format is as follows.
<Track>
<Trackpoint>
<Time>2015-08-29T22:04:39.000Z</Time>
<Position>
<LatitudeDegrees>37.198049426078796</LatitudeDegrees>
<LongitudeDegrees>127.07204628735781</LongitudeDegrees>
</Position>
<AltitudeMeters>34.79999923706055</AltitudeMeters>
<DistanceMeters>7.309999942779541</DistanceMeters>
<HeartRateBpm>
<Value>102</Value>
</HeartRateBpm>
<Cadence>76</Cadence>
<Extensions>
<TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
<Watts>112</Watts>
</TPX>
</Extensions>
</Trackpoint>
....Lots of <Trackpoint> ... </Trackpoint>
</Track>
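In the end, I want to build a data table with the columns "Latitude, Altitude, ... Watts".
First, I tried using BeautifulSoup, XPath, and so on to build a list from the tagged data (such as <Watts>...</Watts>), but I am new to these tools. How can I get the data between tags in an XML file using Python?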
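You can use the lxml module together with XPath. lxml is well suited to parsing XML/HTML, traversing the element tree, and returning element text and attributes. With XPath you can select specific elements, sets of elements, or element attributes. Using your example data: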
content = '''
<Track>
<Trackpoint>
<Time>2015-08-29T22:04:39.000Z</Time>
<Position>
<LatitudeDegrees>37.198049426078796</LatitudeDegrees>
<LongitudeDegrees>127.07204628735781</LongitudeDegrees>
</Position>
<AltitudeMeters>34.79999923706055</AltitudeMeters>
<DistanceMeters>7.309999942779541</DistanceMeters>
<HeartRateBpm>
<Value>102</Value>
</HeartRateBpm>
<Cadence>76</Cadence>
<Extensions>
<TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
<Watts>112</Watts>
</TPX>
</Extensions>
</Trackpoint>
....Lots of <Trackpoint> ... </Trackpoint>
</Track>
'''
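from lxml import etree

# Parse the XML string; `tree` is the root <Track> element.
tree = etree.XML(content)
# Select the text of every <Time> child of each <Trackpoint> under the root.
times = tree.xpath('Trackpoint/Time/text()')
print(times)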
Output
['2015-08-29T22:04:39.000Z']
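To get from a single field to the table the question asks for, one possible approach is to loop over each Trackpoint and look up every column by its relative path. The sketch below is only an illustration under some assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs, the names SAMPLE, NS, COLUMNS and trackpoint_rows are hypothetical, and the prefix "ext" is an arbitrary label for the namespace URI that appears on the TPX element in the sample. Because TPX and Watts live in that namespace, their path steps need the prefix; the other elements in the sample have no namespace. Real .tcx files often declare a default namespace on the root element as well, in which case every step would need a prefix in the same way.

```python
import xml.etree.ElementTree as ET

# One complete Trackpoint from the question, used as sample input.
SAMPLE = """<Track>
  <Trackpoint>
    <Time>2015-08-29T22:04:39.000Z</Time>
    <Position>
      <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
      <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
    </Position>
    <AltitudeMeters>34.79999923706055</AltitudeMeters>
    <DistanceMeters>7.309999942779541</DistanceMeters>
    <HeartRateBpm><Value>102</Value></HeartRateBpm>
    <Cadence>76</Cadence>
    <Extensions>
      <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
        <Watts>112</Watts>
      </TPX>
    </Extensions>
  </Trackpoint>
</Track>"""

# "ext" is an arbitrary prefix chosen here; only the URI must match the file.
NS = {"ext": "http://www.garmin.com/xmlschemas/ActivityExtension/v2"}

# Column name -> path relative to a <Trackpoint> element.
COLUMNS = {
    "Time": "Time",
    "Latitude": "Position/LatitudeDegrees",
    "Longitude": "Position/LongitudeDegrees",
    "Altitude": "AltitudeMeters",
    "Distance": "DistanceMeters",
    "HeartRate": "HeartRateBpm/Value",
    "Cadence": "Cadence",
    "Watts": "Extensions/ext:TPX/ext:Watts",
}


def trackpoint_rows(xml_text):
    """Return one dict per <Trackpoint>; a missing field becomes None."""
    root = ET.fromstring(xml_text)
    rows = []
    for trackpoint in root.iter("Trackpoint"):
        row = {
            name: trackpoint.findtext(path, namespaces=NS)
            for name, path in COLUMNS.items()
        }
        rows.append(row)
    return rows


rows = trackpoint_rows(SAMPLE)
for row in rows:
    print(row)
```

All values come back as strings, so convert them with float() or int() where numbers are needed. The resulting list of dicts can be written out with csv.DictWriter or passed to pandas.DataFrame if a table object is preferred.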