ani*_*ani 5 python xml python-requests
I am trying to build a desktop notifier, and for that I am scraping news from a website. When I run the program, I get the following error.
news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
How do I fix this? I am completely new to this. I tried searching for solutions, but none of them worked for me.
Here is my code:
import requests
import xml.etree.ElementTree as ET

# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"

def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content

def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems

def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems
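You are trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you are calling encode() on is an xml.etree.ElementTree.Element, not a str.
You probably want to get the text from inside or around that element, and then encode() that text. The documentation suggests using the itertext() method: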
''.join(child.itertext())

This evaluates to a str, which you can then encode().
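To make the behaviour of itertext() concrete, here is a minimal sketch. The sample element is hypothetical and is not taken from the real feed; it only shows that the joined result is a str and that encoding is a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element, used only for illustration.
item = ET.fromstring("<title>Top <b>news</b> today</title>")

# itertext() yields the element's own text, then the text and tail
# of each descendant, in document order.
text = "".join(item.itertext())
print(text)  # Top news today

# Encoding is a separate, optional step that turns the str into bytes.
encoded = text.encode("utf8")
print(type(encoded))
```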
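Note that the text and tail attributes might not contain text (emphasis added):

"Their values are usually strings but may be any application-specific object."

If you want to use those attributes, you will have to handle None and non-string values: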
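head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...

Even that is not quite enough. If text or tail contain bytes objects in some unexpected (or plain wrong) encoding, str() will not decode them correctly: on Python 3 it returns the repr of the bytes (such as "b'...'") rather than the text, and a later encode() or decode() can raise a UnicodeEncodeError or UnicodeDecodeError.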
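One way to cover those cases is a small helper that normalises whatever text or tail holds into a str. This is only a sketch: the name as_text, the default encoding, and the choice of errors="replace" are assumptions made for illustration, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def as_text(value, encoding="utf8"):
    """Hypothetical helper: return value as a str.

    None becomes '', bytes are decoded, anything else goes through str().
    """
    if value is None:
        return ""
    if isinstance(value, bytes):
        # errors="replace" avoids an exception on badly encoded bytes.
        return value.decode(encoding, errors="replace")
    return str(value)

# Hypothetical sample: <br/> has no text of its own, only a tail.
elem = ET.fromstring("<p>head<br/>tail of br</p>")
br = elem[0]
print(repr(as_text(br.text)))
print(repr(as_text(br.tail)))
```

Replacing undecodable bytes silently hides bad input, so stricter code might prefer errors="strict" and handle the exception instead.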
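I recommend keeping the text as a str and not encoding it at all. Encoding text into a bytes object is the last step before writing it to a binary file, a network socket, or some other piece of hardware.
For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (a 36-minute video from PyCon US 2012). He covers both Python 2 and 3.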
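The advice to encode only at the boundary can be sketched as follows. The io.BytesIO buffer is an assumption standing in for a binary file or socket, and the sample title is made up for illustration.

```python
import io

# Inside the program the text stays a str (curly quotes written as escapes).
title = "\u201ca fun ride\u201d"

# BytesIO stands in for a binary file or network socket.
buffer = io.BytesIO()

# Encode only at the moment the text leaves the program.
buffer.write(title.encode("utf8"))

raw = buffer.getvalue()
print(raw)

# Decoding with the same encoding recovers the original text.
print(raw.decode("utf8") == title)
```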
Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list of dicts from topStories():
[
 ...,
 {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                 'been \xe2\x80\x9ca fun ride\xe2\x80\x9d; adds success is a lousy teacher while '
                 'failure is \xe2\x80\x9cyour friend, philosopher and guide\xe2\x80\x9d.',
  'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
  'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
  'title': "I am a hardcore realist, and that's why I feel my journey "
           'has been a joyride: Ayushmann...'},
]
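For reference, the change amounts to one line in the parsing loop. The sketch below is an assumption-laden rewrite, not the exact code that produced the output above: parse_items, MEDIA_CONTENT, and SAMPLE_RSS are hypothetical names, and the inline feed replaces the real network request so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Namespaced tag of <media:content>, as used in the question's code.
MEDIA_CONTENT = "{http://search.yahoo.com/mrss/}content"

# Hypothetical sample feed so the sketch runs without network access.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example headline</title>
      <link>http://example.com/story.html</link>
      <media:content url="http://example.com/image.jpg"/>
    </item>
  </channel>
</rss>"""

def parse_items(rss):
    """Return one dict of str values per <item> in the feed."""
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall("./channel/item"):
        news = {}
        for child in item:
            if child.tag == MEDIA_CONTENT:
                news["media"] = child.attrib["url"]
            else:
                # Join the element's text instead of calling encode()
                # on the Element object itself.
                news[child.tag] = "".join(child.itertext())
        newsitems.append(news)
    return newsitems

print(parse_items(SAMPLE_RSS))
```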