Using html2text and cleaning up some text in Python

Question

Red*_*vet 2 python string selenium python-2.7

I am using Html2Text to convert HTML code to text. It works well, but I can't find many examples or much documentation for it online.

I am reading the username like this:

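import html2text

# Extract the alt text of the profile image (returns a list of matches)
text_to_gain = hxs.xpath('//div[contains(@id,"yq-question-detail-profile-img")]/a/img/@alt').extract()
if text_to_gain:
    h = html2text.HTML2Text()
    h.ignore_links = True  # leave link markup out of the converted text
    item['author'] = h.handle(text_to_gain[0])
else:
    # No match found: fall back to a placeholder name
    item['author'] = "anonymous"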

But my output looks like this:

u'Duncan\n\n'

The trailing newlines are useful when I read long texts or messages, but for a single short string I only want to keep the name:

'Duncan'

Answer 1

JRo*_*ite 5

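Use the strip() method. It removes leading and trailing whitespace, including newlines; whitespace inside the string is left untouched.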

>>> a = u'Duncan\n\n'
>>> a
u'Duncan\n\n'
>>> a.strip()
u'Duncan'
>>> str(a.strip())
'Duncan'
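
The same idea can be wrapped in a small helper that also covers the "anonymous" fallback from the question. This is only a sketch: the name `clean_author` and its `default` parameter are hypothetical and not part of html2text or the original code. It assumes the input is the string returned by `h.handle(...)`, or `None` when nothing was extracted.

```python
def clean_author(raw, default="anonymous"):
    """Strip leading/trailing whitespace; return `default` if nothing is left.

    `clean_author` is a hypothetical helper name used for illustration only.
    """
    if raw is None:
        return default
    cleaned = raw.strip()  # removes whitespace at both ends, newlines included
    return cleaned if cleaned else default


if __name__ == "__main__":
    print(clean_author(u"Duncan\n\n"))     # trailing newlines are dropped
    print(clean_author(u"  Mary Ann \n"))  # the inner space is preserved
    print(clean_author(u"\n\n"))           # whitespace only: falls back to the default
```

With a helper like this, the assignment in the question could become `item['author'] = clean_author(h.handle(text_to_gain[0]))`. If only the trailing newlines should go and leading whitespace matters, `rstrip()` would be the narrower choice.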

Archived:	10 years, 6 months ago
Views:	430
Last activity:	10 years, 6 months ago