Vla*_*gas (3) · tags: xml, beautifulsoup, python-3.x
I have tweets saved in an XML file:

```xml
<tweet>
 <tweetid>142389495503925248</tweetid>
 <user>ccifuentes</user>
 <content><![CDATA[Salgo de #VeoTV , que día más largoooooo...]]></content>
 <date>2011-12-02T00:47:55</date>
 <lang>es</lang>
 <sentiments>
 <polarity><value>NONE</value><type>AGREEMENT</type></polarity>
 </sentiments>
 <topics>
 <topic>otros</topic>
 </topics>
</tweet>
```

To parse them, I created a BeautifulSoup instance with

```python
soup = BeautifulSoup(xml, "lxml")
```

where `xml` holds the raw contents of the XML file. To access a single tweet, I did:

```python
tweets = soup.find_all('tweet')
for tw in tweets:
    print(tw)
    break
```

This results in:

```
<tweet>
<tweetid>142389495503925248</tweetid>
<user>ccifuentes</user>
<content></content>
<date>2011-12-02T00:47:55</date>
<lang>es</lang>
<sentiments>
<polarity><value>NONE</value><type>AGREEMENT</type></polarity>
</sentiments>
<topics>
<topic>otros</topic>
</topics>
</tweet>
```

Note that when I print the first tweet, the CDATA section is omitted. I need that content; how can I get it?

Change the parser to `xml` (BeautifulSoup's XML parser, which is provided by lxml):

```python
import bs4

soup = bs4.BeautifulSoup(xml, 'xml')
```
Out:

```
<content>Salgo de #VeoTV , que día más largoooooo...</content>
```

Or use `html.parser`:

```python
soup = bs4.BeautifulSoup(xml, 'html.parser')
```

Out:

```
<content><![CDATA[Salgo de #VeoTV , que día más largoooooo...]]></content>
```
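A slightly fuller sketch of the `xml`-parser fix, applied to every tweet rather than just the first. It assumes `bs4` and `lxml` are installed (the `xml` feature is backed by lxml). The `<tweets>` root element, the second tweet, and the helper name `tweet_contents` are hypothetical, added only for illustration; the second tweet shows why the CDATA matters, since its markup-like text comes back verbatim instead of being parsed as tags.

```python
# Minimal sketch: pull the CDATA text out of every <tweet> with BeautifulSoup's "xml" parser.
# Assumes bs4 and lxml are installed; the <tweets> root and second tweet are hypothetical sample data.
from bs4 import BeautifulSoup

xml = """<tweets>
<tweet>
 <tweetid>142389495503925248</tweetid>
 <user>ccifuentes</user>
 <content><![CDATA[Salgo de #VeoTV , que día más largoooooo...]]></content>
</tweet>
<tweet>
 <tweetid>1</tweetid>
 <user>example_user</user>
 <content><![CDATA[Second <b>tweet</b> & more]]></content>
</tweet>
</tweets>"""

soup = BeautifulSoup(xml, "xml")


def tweet_contents(soup):
    """Return (tweetid, content) pairs; the XML parser yields CDATA as plain text."""
    pairs = []
    for tw in soup.find_all("tweet"):
        tweet_id = tw.find("tweetid").get_text()
        content = tw.find("content").get_text()  # no <![CDATA[ ]]> markers in the result
        pairs.append((tweet_id, content))
    return pairs


for tweet_id, content in tweet_contents(soup):
    print(tweet_id, content)
```

Because `get_text()` returns the text without the CDATA markers, the content can be used directly without any extra stripping.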