BeautifulSoup doesn't give me Unicode

I'm using Beautiful Soup to scrape data. The BS documentation states that BS should always return Unicode, but I can't seem to get Unicode. Here is a code snippet:

import urllib2
from libs.BeautifulSoup import BeautifulSoup

# Fetch and parse the data
url = 'http://wiki.gnhlug.org/twiki2/bin/view/Www/PastEvents2007?skin=print.pattern'

# read() returns the raw response body as a byte string
data = urllib2.urlopen(url).read()
print 'Encoding of fetched HTML : %s' % type(data)

soup = BeautifulSoup(data)
print 'Encoding of souped up HTML : %s' % soup.originalEncoding

# Inspect the type of the rendered table contents
table = soup.table
print type(table.renderContents())

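The raw data returned from the page is a string. BS reports the original encoding as ISO-8859-1. I thought BS automatically converted everything to Unicode, so why, when I do this: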

table = soup.table
print type(table.renderContents())

..does it give me a string object rather than Unicode?

How do I get Unicode objects from BS?

I'm really, really lost. Any help? Thanks in advance.
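
One possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: Beautiful Soup may hold the document as Unicode internally, while `renderContents()` encodes its output back into a byte string (UTF-8 by default in some versions, which is an assumption here). The Python 3 snippet below uses only the standard library and an invented sample string to show that round trip. It does not call Beautiful Soup itself.

```python
# Hypothetical sample: bytes as they might arrive from the network,
# encoded as ISO-8859-1 (the encoding reported in the question).
raw = "<td>Caf\u00e9</td>".encode("iso-8859-1")

# Step 1: a parser decodes the input into text (Unicode)
# using the detected source encoding.
text = raw.decode("iso-8859-1")

# Step 2: a render method that targets UTF-8 (assumed default)
# encodes that text back into bytes on output.
rendered = text.encode("utf-8")

# Step 3: decoding the rendered bytes recovers the Unicode text.
recovered = rendered.decode("utf-8")

# Show the type at each stage of the round trip.
for name, value in [("raw", raw), ("text", text),
                    ("rendered", rendered), ("recovered", recovered)]:
    print(name, type(value).__name__)
```

If that is what is happening, decoding the rendered result explicitly, for example `table.renderContents().decode('utf-8')`, should yield a Unicode object.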

python unicode beautifulsoup character-encoding

Mri*_*lla

2010-07-07

4 votes, 1 answer, 4093 views

How do I render the contents of a tag as unicode in BeautifulSoup?

This is the soup from a WordPress post detail page:

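import re

# soup and item are defined elsewhere in the scraper
content = soup.body.find('div', id=re.compile('post'))
title = content.h2.extract()  # detach the <h2> so it is not part of the content
item['title'] = unicode(title.string)
item['content'] = u''.join(map(unicode, content.contents))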

I want to omit the enclosing div tag when assigning item['content']. Is there a way to render all of a tag's children as unicode? Something like:

item['content'] = content.contents.__unicode__()

That would give me a unicode string rather than a list.
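
A minimal sketch of one way this might be done, assuming the newer `bs4` package under Python 3 rather than the Beautiful Soup 3 API used in the question. There, `Tag.decode_contents()` renders a tag's children as a single text string without the enclosing tag. The HTML sample and the literal `id` value are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a WordPress post page.
html = '<div id="post-1"><h2>Title</h2><p>Caf\u00e9 <b>menu</b></p></div>'
soup = BeautifulSoup(html, "html.parser")

content = soup.find("div", id="post-1")
title = content.h2.extract()  # detach the heading from the div

# Approach from the question: stringify each child and join the pieces.
joined = "".join(str(child) for child in content.contents)

# bs4 alternative: render all children at once, omitting the enclosing <div>.
inner = content.decode_contents()

print(title.string)
print(inner)
```

Under these assumptions both approaches should produce the same string, so the join in the question is already a reasonable way to get the inner markup.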

python xml screen-scraping web-applications beautifulsoup

muh*_*huk

lucky-day

2 votes, 1 answer, 3481 views
