Pol*_*Pol 0 python encoding beautifulsoup utf-8
In this code:
soup=BeautifulSoup(program.Description.encode('utf-8'))
name=soup.find('div',{'class':'head'})
print name.string.decode('utf-8')
An error occurs when I try to print the result or save it to the database.
It doesn't matter what I do:
print name.string.encode('utf-8')
or just
print name.string
Traceback (most recent call last):
File "./manage.py", line 16, in <module>
execute_manager(settings)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/__init__.py", line 362, in execute_manager
utility.execute()
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/__init__.py", line 303, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/base.py", line 195, in run_from_argv
self.execute(*args, **options.__dict__)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/base.py", line 222, in execute
output = self.handle(*args, **options)
File "/usr/local/cluster/dynamic/website/video/remmedia/management/commands/remmedia.py", line 50, in handle
self.FirstTimeLoad()
File "/usr/local/cluster/dynamic/website/video/remmedia/management/commands/remmedia.py", line 115, in FirstTimeLoad
print name.string.decode('utf-8')
File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-5: ordinal not in range(128)
This is repr(name.string):
u'\u0412\u044b\u043f\u0443\u0441\u043a\u043e\u0442 27\u0434\u0435\u043a\u0430\u0431\u0440\u044f'
I don't know what you are trying to do with name.string.decode('utf-8'). As the BeautifulSoup documentation eloquently points out, "BeautifulSoup gives you Unicode, dammit". So name.string is already decoded - it is in unicode. You can encode it back to utf-8 if you want to, but you can't decode it any further.
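To see why a call to decode ends in a UnicodeEncodeError, here is a rough sketch. It is written for Python 3 so it can be run today, which is an assumption on my part, since the question uses Python 2.5. In Python 2, calling .decode('utf-8') on a unicode object first encodes it implicitly with the default ASCII codec, and that hidden step is what fails. The helper py2_style_decode is hypothetical and only emulates that step; the sample text is a shortened stand-in for name.string.

```python
# Python 3 sketch of what Python 2 does implicitly in unicode.decode('utf-8').
# Shortened stand-in for name.string: text that is already decoded.
text = u'\u0412\u044b\u043f\u0443\u0441\u043a'


def py2_style_decode(value):
    """Hypothetical helper: emulate Python 2's unicode.decode('utf-8').

    Python 2 first encodes the unicode object with the default ASCII
    codec, then decodes the resulting bytes as UTF-8.
    """
    return value.encode('ascii').decode('utf-8')


try:
    py2_style_decode(text)
    error_name = None
except UnicodeEncodeError as exc:
    # Same error class as in the traceback: the hidden ASCII encode fails.
    error_name = type(exc).__name__

# Encoding is the valid direction for text that is already Unicode.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```

The point is the direction of the conversion: text that is already Unicode can only be encoded to bytes, and only those bytes can be decoded back.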