使用Python解析XML

oax*_*att 2 python xml

我有几个大的.xml文件.我想解析文件来做几件事.

我只想退出:

  • XML-/title1并将其保存到列表A(例如)
  • XML-/title2并将其保存到列表B.
  • XML-/title3并将其保存到列表C.
  • 等等

使用Python 2.x哪个库最适合导入/使用.我该如何设置?有什么建议?

例如:

 <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="MEDLINE">
        <PMID Version="1">8981971</PMID>
        <Article PubModel="Print">
            <Journal>
                <ISSN IssnType="Print">0002-9297</ISSN>
                <JournalIssue CitedMedium="Print">
                    <Volume>60</Volume>
                    <Issue>1</Issue>
                    <PubDate>
                        <Year>1997</Year>
                        <Month>Jan</Month>
                    </PubDate>
                </JournalIssue>
                <Title>American journal of human genetics</Title>
                <ISOAbbreviation>Am. J. Hum. Genet.</ISOAbbreviation>
            </Journal>
            <ArticleTitle>mtDNA and Y chromosome-specific polymorphisms in modern Ojibwa: implications about the origin of their gene pool.</ArticleTitle>
            <Pagination>
                <MedlinePgn>241-4</MedlinePgn>
            </Pagination>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Scozzari</LastName>
                    <ForeName>R</ForeName>
                    <Initials>R</Initials>
                </Author>
            </AuthorList>
        <MeshHeadingList>
            <MeshHeading>
                <DescriptorName MajorTopicYN="N">Alleles</DescriptorName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName MajorTopicYN="Y">Y Chromosome</DescriptorName>
            </MeshHeading>
        </MeshHeadingList>
        <OtherID Source="NLM">PMC1712541</OtherID>
    </MedlineCitation>
</PubmedArticle>
Run Code Online (Sandbox Code Playgroud)

awe*_*eis 5

试试看lxml模块.

要找到标题,可以使用带有lxml的Xpath,或者可以使用lxml中的xml对象结构将"索引"到标题元素.