How to get a data series from an XML or TCX file

You*_*won 6 python xml xpath parsing beautifulsoup

I want to use Python to process the data between specific tags in a .tcx file (XML format).
The file format is as follows.

 <Track>
      <Trackpoint>
        <Time>2015-08-29T22:04:39.000Z</Time>
        <Position>
          <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
          <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
        </Position>
        <AltitudeMeters>34.79999923706055</AltitudeMeters>
        <DistanceMeters>7.309999942779541</DistanceMeters>
        <HeartRateBpm>
          <Value>102</Value>
        </HeartRateBpm>
        <Cadence>76</Cadence>
        <Extensions>
          <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
            <Watts>112</Watts>
          </TPX>
        </Extensions>
      </Trackpoint>
....Lots of <Trackpoint> ... </Trackpoint>
</Track>

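In the end, I want to create a data table with the columns "Latitude, Altitude, ... Watts".
First, I tried to build a list from the tagged data (such as <Watts> ... </Watts>) using BeautifulSoup, XPath, and so on, but I am a novice with these tools. How can I get the data between tags in an XML file using Python?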

gtl*_*ert 2

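You can use the lxml module together with XPath. lxml is well suited to parsing XML/HTML, traversing element trees, and returning element text and attributes. With XPath you can select specific elements, sets of elements, or element attributes. Using your sample data: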

content = '''
<Track>
      <Trackpoint>
        <Time>2015-08-29T22:04:39.000Z</Time>
        <Position>
          <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
          <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
        </Position>
        <AltitudeMeters>34.79999923706055</AltitudeMeters>
        <DistanceMeters>7.309999942779541</DistanceMeters>
        <HeartRateBpm>
          <Value>102</Value>
        </HeartRateBpm>
        <Cadence>76</Cadence>
        <Extensions>
          <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
            <Watts>112</Watts>
          </TPX>
        </Extensions>
      </Trackpoint>
....Lots of <Trackpoint> ... </Trackpoint>
</Track>
'''

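from lxml import etree

# Parse the XML string; the returned root element is <Track>.
tree = etree.XML(content)
# Select the text of every <Time> that sits directly inside a <Trackpoint>.
times = tree.xpath('Trackpoint/Time/text()')

print(times)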

Output

['2015-08-29T22:04:39.000Z']
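
To get from a single list to the table of columns the question asks for, one possible approach is to loop over the trackpoints and read each field relative to the current trackpoint. The sketch below is an illustration, not part of the original answer: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra installs, the names SAMPLE, NS, COLUMNS and trackpoint_rows are hypothetical, and the second trackpoint's values are made up. Note that <Watts> lives inside the TPX namespace, so its path needs a prefix mapped to that namespace URI. Real .tcx files typically also declare a default namespace on the root element, in which case the other tags would likely need a prefix as well.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the question's data.
# The second trackpoint is invented to show how missing fields behave.
SAMPLE = """
<Track>
  <Trackpoint>
    <Time>2015-08-29T22:04:39.000Z</Time>
    <Position>
      <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
      <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
    </Position>
    <AltitudeMeters>34.79999923706055</AltitudeMeters>
    <Extensions>
      <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
        <Watts>112</Watts>
      </TPX>
    </Extensions>
  </Trackpoint>
  <Trackpoint>
    <Time>2015-08-29T22:04:40.000Z</Time>
    <AltitudeMeters>34.9</AltitudeMeters>
  </Trackpoint>
</Track>
"""

# Prefix "ext" is an arbitrary local name for the TPX namespace URI.
NS = {"ext": "http://www.garmin.com/xmlschemas/ActivityExtension/v2"}

# Column name -> path relative to each <Trackpoint>.
COLUMNS = {
    "Time": "Time",
    "Latitude": "Position/LatitudeDegrees",
    "Longitude": "Position/LongitudeDegrees",
    "Altitude": "AltitudeMeters",
    "Watts": "Extensions/ext:TPX/ext:Watts",
}


def trackpoint_rows(xml_text):
    """Return one dict per <Trackpoint>; missing fields become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for trackpoint in root.iter("Trackpoint"):
        row = {
            name: trackpoint.findtext(path, default=None, namespaces=NS)
            for name, path in COLUMNS.items()
        }
        rows.append(row)
    return rows


rows = trackpoint_rows(SAMPLE)
for row in rows:
    print(row)
```

All values come back as strings, so numeric columns would still need converting (for example with float()) before analysis. A list of dicts like this can be written out with csv.DictWriter or passed to a table library.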