How do I get rid of all smart quotes when parsing a web page?

Question

Here is my code:

name = namestr.decode("utf-8")

name.replace(u"\u2018", "").replace(u"\u2019", "").replace(u"\u201c","").replace(u"\u201d", "")

This doesn't seem to work. I still find &ldquo, &rdquo, etc. in my text. Also, this text has already been parsed with Beautiful Soup.

Answer 1

Replace the last line with the following code:

name = name.replace(u"\u2018", "").replace(u"\u2019", "").replace(u"\u201c","").replace(u"\u201d", "")

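Python strings are immutable, so the replace method returns a modified copy and leaves the string you call it on unchanged. You therefore have to assign the return value back to the variable, as in the line above.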
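To illustrate the point, here is a minimal sketch, not taken from the original post. The sample text and the names `unchanged`, `SMART_QUOTES`, and `cleaned` are made up for the example. It first shows that calling replace without assigning the result leaves the string as it was, then removes all four quote characters in one pass with translate, which could be used instead of the chained replace calls.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; in the question it would come from
# namestr.decode("utf-8").
name = u"\u201cHello\u201d, it\u2019s \u2018fine\u2019"

# replace() returns a new string; the result is discarded here,
# so `name` still contains the smart quotes.
name.replace(u"\u201c", u"")
unchanged = name

# Map each smart-quote code point to None so that translate() deletes it.
# (Assumed alternative to the chained replace() calls in the answer.)
SMART_QUOTES = {0x2018: None, 0x2019: None, 0x201C: None, 0x201D: None}
cleaned = name.translate(SMART_QUOTES)

print(cleaned)
```

As in the answer, the key step is that the result is assigned (here to `cleaned`); the mapping only saves repeating replace once per character.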