Dav*_*ltz 7 python unicode punctuation
Here's the problem: I have a Unicode string as input to a Python SQLite query. The query (a 'like') failed. It turns out the string 'FRANCE' doesn't have six characters, it has seven. And the seventh is . . . Unicode U+FEFF, a zero-width no-break space.
How on earth do I trap a class of such things before the query?
小智 11
You can use the character categories from Python's unicodedata module, which exposes the Unicode character database:
>>> import unicodedata
>>> unicodedata.category(u'a')
'Ll'
>>> unicodedata.category(u'.')
'Po'
>>> unicodedata.category(u',')
'Po'
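As you can see, the categories for punctuation start with 'P'. So you need to filter the string character by character (using a list comprehension).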
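A minimal sketch of that character-by-character filter. The function name `strip_punctuation` is made up for illustration; only `unicodedata.category` comes from the standard library.

```python
import unicodedata


def strip_punctuation(text):
    """Return text without characters whose Unicode category starts with 'P'.

    Hypothetical helper: the name and the choice to drop (rather than
    replace) punctuation are assumptions, not part of the original answer.
    """
    return u"".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


print(strip_punctuation(u"Hello, world."))
```

The same pattern works for any other category prefix; only the `startswith` test changes.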
In your case:
>>> unicodedata.category(u'\ufeff')
'Cf'
So you can do some whitelisting based on the categories of the characters.
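One way such a whitelist might look, as a sketch rather than a definitive implementation. The set `ALLOWED_MAJOR_CLASSES` and the helpers `clean` and `find_rejected` are hypothetical names, and the choice of allowed classes is an assumption you would tune for your data. Note that this particular whitelist rejects every 'C' category, so tabs and newlines (category 'Cc') are dropped along with U+FEFF.

```python
import unicodedata

# Assumption: keep letters, marks, numbers, punctuation, symbols and
# separators; reject every "other" category (Cc, Cf, Cs, Co, Cn).
ALLOWED_MAJOR_CLASSES = {"L", "M", "N", "P", "S", "Z"}


def is_allowed(ch):
    """True if the first letter of the character's category is whitelisted."""
    return unicodedata.category(ch)[0] in ALLOWED_MAJOR_CLASSES


def clean(text):
    """Return text with all non-whitelisted characters removed."""
    return u"".join(ch for ch in text if is_allowed(ch))


def find_rejected(text):
    """List (index, code point, category) for each rejected character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.category(ch))
        for i, ch in enumerate(text)
        if not is_allowed(ch)
    ]


raw = u"FRANCE\ufeff"
print(len(raw), len(clean(raw)))
print(find_rejected(raw))
```

Running `clean` on the value before binding it to the query parameter removes the invisible character, while `find_rejected` is useful if you would rather log or reject the input than silently repair it.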