Removing non-ASCII characters from a string using Python/Django

Question

Gau*_*rma 16 python regex django unicode replace

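I have an HTML string stored in a database. Unfortunately, it contains characters such as ®, and I want to replace these characters with their HTML equivalents, either in the DB itself or with a find-and-replace in my Python/Django code.

Any suggestions on how I can do this?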

Answer 1

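ASCII characters are the first 128 characters (code points 0 through 127), so you can get each character's number with ord and drop the character if it falls outside that range.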
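
The question asks for HTML equivalents rather than outright deletion, so here is a hedged aside that is not part of the original answer. It is a minimal sketch assuming the string is already a unicode object: Python's built-in `encode` accepts the error handlers `'ignore'` (drop what cannot be encoded) and `'xmlcharrefreplace'` (substitute a numeric character reference). The function names `strip_non_ascii_codec` and `escape_non_ascii` and the sample text are made up for illustration.

```python
# -*- coding: utf-8 -*-

def strip_non_ascii_codec(text):
    """Drop every non-ASCII character via the 'ignore' error handler."""
    return text.encode('ascii', 'ignore').decode('ascii')


def escape_non_ascii(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


# Hypothetical sample string containing ® (U+00AE) and é (U+00E9)
sample = u'Acme® café'
print(strip_non_ascii_codec(sample))  # Acme caf
print(escape_non_ascii(sample))       # Acme&#174; caf&#233;
```

Note that `'xmlcharrefreplace'` produces numeric references such as `&#174;`, not named entities such as `&reg;`; browsers render both the same way. The answer's own function, which filters character by character with ord, is: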

# -*- coding: utf-8 -*-

def strip_non_ascii(string):
    ''' Returns the string without non ASCII characters'''
    # ASCII covers code points 0 through 127 inclusive
    stripped = (c for c in string if ord(c) < 128)
    return ''.join(stripped)


test = u'éáé123456tgreáé@€'
# print() with a single argument works on both Python 2 and Python 3
print(test)
print(strip_non_ascii(test))

Result