Split a list's strings on a delimiter, if it occurs

Question

I'm fetching the HTML code of a web page (for a project on codecademy.com), extracting the resulting text, and splitting it into a list.

Problem: some of the results contain Unicode characters, and I want to cut them out of the strings they appear in.

['Normal String', 'Company\xc2\xae', 'againnormal', '\xc2\xb7']

The result should look like this:

['Normal String', 'Company', 'againnormal', '']

Or, ideally, like this:

['Normal String', 'Company', 'againnormal']

Answer 1

sbe*_*rry 5

How about this (Python 2, where these items are UTF-8 byte strings):

>>> stuff = ['Normal String', 'Company\xc2\xae', 'againnormal', '\xc2\xb7']
>>> filter(None, [x.decode('utf8').encode('ascii', 'ignore') for x in stuff])
['Normal String', 'Company', 'againnormal']
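In Python 3, `str` has no `.decode()` method and `filter` returns a lazy iterator, so the snippet above only runs as written on Python 2. Below is a minimal Python 3 sketch of the same decode-then-encode idea. It assumes the scraped items arrive as UTF-8 `bytes`. The sample list and the helper name `strip_non_ascii` are hypothetical.

```python
# Hypothetical sample data: the question's items as raw UTF-8 bytes.
stuff = [b'Normal String', b'Company\xc2\xae', b'againnormal', b'\xc2\xb7']

def strip_non_ascii(items):
    """Decode each UTF-8 item, drop non-ASCII characters, and skip empty results."""
    cleaned = []
    for raw in items:
        # encode('ascii', 'ignore') silently drops every character outside ASCII
        text = raw.decode('utf-8').encode('ascii', 'ignore').decode('ascii')
        cleaned.append(text)
    # filter(None, ...) removes empty strings; list() is needed because
    # filter returns an iterator in Python 3
    return list(filter(None, cleaned))

print(strip_non_ascii(stuff))
# → ['Normal String', 'Company', 'againnormal']
```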

Or with a regular expression:

>>> import re
>>> filter(None, [re.sub(r'[^\x00-\x7F]+', '', x) for x in stuff])
['Normal String', 'Company', 'againnormal']
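The regex route carries over to Python 3 almost unchanged once the text is a `str`. The main difference is that the result of `filter` must be materialized; here a list comprehension does that instead. This is a sketch that assumes the items have already been decoded. The sample list and the names `NON_ASCII` and `strip_non_ascii` are hypothetical.

```python
import re

# Hypothetical sample data: the question's items after decoding to str.
stuff = ['Normal String', 'Company\u00ae', 'againnormal', '\u00b7']

# One or more consecutive characters outside the ASCII range U+0000..U+007F
NON_ASCII = re.compile(r'[^\x00-\x7F]+')

def strip_non_ascii(items):
    """Remove runs of non-ASCII characters and drop strings left empty."""
    return [s for s in (NON_ASCII.sub('', item) for item in items) if s]

print(strip_non_ascii(stuff))
# → ['Normal String', 'Company', 'againnormal']
```

Unlike the encode/decode version, this one does not need to know the input encoding once the text is a `str`. A bytes pattern such as `rb'[^\x00-\x7F]+'` applies the same filter directly to undecoded `bytes`.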

Without a list comprehension:

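# Python 2: each item is a UTF-8 byte string
keep = []
for item in stuff:
    # decode to unicode, then re-encode as ASCII, dropping anything non-ASCII
    item = item.decode('utf8').encode('ascii', 'ignore')
    if item:  # skip strings that ended up empty
        keep.append(item)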
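The explicit loop also works in Python 3. This sketch assumes Python 3.7+ (for `str.isascii()`) and assumes the items are already decoded `str` values. Instead of round-tripping through an encoding, it filters character by character. The sample list is hypothetical.

```python
# Hypothetical sample data: decoded strings, as in the regex variant.
stuff = ['Normal String', 'Company\u00ae', 'againnormal', '\u00b7']

keep = []
for item in stuff:
    # Keep only ASCII characters (str.isascii requires Python 3.7+)
    item = ''.join(ch for ch in item if ch.isascii())
    # Skip items that are empty once the non-ASCII characters are gone
    if item:
        keep.append(item)

print(keep)
# → ['Normal String', 'Company', 'againnormal']
```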

Archived:	9 years, 1 month ago
Views:	26
Last activity:	9 years, 1 month ago