Is there a list of characters that look similar to English letters?

Pau*_*ite 30 python unicode glyph profanity

I'm building a profanity filter for a web forum written in Python.

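As part of that, I'm trying to write a function that takes a word and returns all possible mock spellings of that word, with visually similar characters substituted for specific letters (e.g. s†å©køv€rƒ|øw).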
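A minimal sketch of such a function, assuming a small hand-written substitution table. The name `mock_spellings` and the `LOOKALIKES` entries are hypothetical; the entries are taken only from the example spelling above.

```python
from itertools import product

# Hypothetical starter table: each letter maps to its look-alike replacements.
# Illustrative entries only; a real filter would need a much larger table.
LOOKALIKES = {
    "a": ["å"],
    "c": ["©"],
    "e": ["€"],
    "f": ["ƒ"],
    "l": ["|"],
    "o": ["ø"],
    "t": ["†"],
}


def mock_spellings(word):
    """Return every spelling of `word` reachable by swapping letters for look-alikes."""
    # At each position the choices are the original character plus its look-alikes.
    choices = [[ch] + LOOKALIKES.get(ch.lower(), []) for ch in word]
    # The Cartesian product of the per-position choices enumerates all variants.
    return {"".join(combo) for combo in product(*choices)}
```

The number of variants is the product of the choice counts at each position, so the set grows quickly with word length and table size.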

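I expect I'll have to expand this list over time to keep up with people's creativity, but is there a list floating around on the internet somewhere that I could use as a starting point?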

Rob*_*ton 37

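This is probably far more in-depth than you need, and yet still won't cover all of your use cases, but the Unicode Consortium has had to deal with attacks against internationalized domain names and came up with this list of homographs (characters with the same or similar rendering):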

http://www.unicode.org/Public/security/latest/confusables.txt

If nothing else, it could serve as a starting point.
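As a rough sketch of how that file could be loaded: its data lines have the form `source ; target ; type # comment`, with code points written in hexadecimal. The function name `parse_confusables` is hypothetical, and the sample lines are hand-written in the file's format rather than copied from the real file.

```python
from collections import defaultdict


def parse_confusables(lines):
    """Map each target string to the set of source characters that resemble it."""
    lookalikes = defaultdict(set)
    for line in lines:
        # Drop a possible byte-order mark, the trailing comment, and whitespace.
        data = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2:
            continue
        # Each field holds one or more space-separated hexadecimal code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        lookalikes[target].add(source)
    return lookalikes


# Hand-written sample in the file's format (an assumption, not real file content).
SAMPLE = [
    "# comment lines are skipped",
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
    "03BF ;\t006F ;\tMA\t# GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O",
    "043E ;\t006F ;\tMA\t# CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O",
]
table = parse_confusables(SAMPLE)
```

Grouping by target means `table["o"]` collects every character the sample lists as resembling "o", which is the direction a filter needs.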


spn*_*nzr 13

http://en.wikipedia.org/wiki/Letterlike_Symbols

It's far less comprehensive, but more comprehensible.

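  • Comprehensive: containing a great deal of information and detail from many sources. Comprehensible: able to be understood. (6 upvotes)
  • Aren't comprehensive and comprehensible independent qualities? (2 upvotes)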

Sta*_*mes 5

I created a Python class to do this, based on the Unicode "confusables" link from Robin's answer:

https://github.com/wanderingstan/Confusables

For example, "Hello" would be expanded into the following set of regexp character classes:

[H…] [e…] [l\|1\I…] [l\|1\I…] [o\ø\œ…]

This regexp will match "?1?"
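The character-class idea can be sketched as follows. This is not the code from the linked repository; `confusable_pattern` and the tiny `LOOKALIKES` table are hypothetical stand-ins for a table built from the full confusables data.

```python
import re

# Hypothetical, hand-picked look-alikes; a real table would be far larger.
LOOKALIKES = {
    "H": "\u041d\u210d",   # Cyrillic En, double-struck capital H
    "e": "\u0435\u212e",   # Cyrillic ie, estimated symbol
    "l": "1|I",            # digit one, vertical bar, capital I
    "o": "0\u043e\u00f8",  # digit zero, Cyrillic o, o with stroke
}


def confusable_pattern(word):
    """Build a compiled regex with one character class per character of `word`."""
    classes = []
    for ch in word:
        members = ch + LOOKALIKES.get(ch, "")
        # re.escape keeps characters such as | or ] literal inside the class.
        classes.append("[" + "".join(re.escape(m) for m in members) + "]")
    return re.compile("".join(classes))


pattern = confusable_pattern("Hello")
```

Here `pattern.fullmatch("Hello")` succeeds, as does a disguised spelling such as `"\u210d\u212e1|0"`, while an unrelated word like `"Help!"` does not match. Matching with one class per letter avoids enumerating every variant spelling up front.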