Non-ASCII Python identifiers and reflection

iag*_*ito 8 python reflection unicode identifier variable-names

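I learned from PEP 3131 that non-ASCII identifiers are supported in Python, though using them is not considered best practice.

However, I am seeing this strange behaviour, where my 𝜏 identifier (U+1D70F) seems to be automatically converted to τ (U+03C4).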

class Base(object):
    def __init__(self):
        self.𝜏 = 5 # defined with U+1D70F

a = Base()
print(a.𝜏)     # 5             # 𝜏 (U+1D70F)
print(a.τ)     # 5 as well     # τ (U+03C4) ? another way to access it?
d = a.__dict__ # {'τ':  5}     # τ (U+03C4) ? seems converted
print(d['τ'])  # 5             # τ (U+03C4) ? consistent with the conversion
print(d['𝜏'])  # KeyError: '𝜏' # 𝜏 (U+1D70F) ?! unexpected!

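Is this expected behaviour? Why does this silent conversion happen? Does it have anything to do with NFKC normalization? I thought that was only for canonically ordering Unicode character sequences...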

jon*_*rpe 11

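Per the documentation on identifiers:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can confirm with unicodedata that U+03C4 is the NFKC result for U+1D70F: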

>>> import unicodedata
>>> unicodedata.normalize('NFKC', '𝜏')
'τ'
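On the question's point about canonical ordering: the canonical forms (NFC/NFD) only deal with canonical equivalence, whereas the compatibility forms (NFKC/NFKD) additionally fold compatibility characters such as the mathematical italic letters. A minimal sketch of the difference, using only the standard unicodedata module (the variable names are illustrative, not from the original answer):

```python
import unicodedata

math_tau = "\U0001D70F"   # MATHEMATICAL ITALIC SMALL TAU
greek_tau = "\u03C4"      # GREEK SMALL LETTER TAU

print(unicodedata.name(math_tau))
print(unicodedata.name(greek_tau))

# Canonical normalization (NFC) leaves the mathematical letter untouched...
nfc_result = unicodedata.normalize("NFC", math_tau)
# ...while compatibility normalization (NFKC) folds it to the plain Greek letter.
nfkc_result = unicodedata.normalize("NFKC", math_tau)

print(nfc_result == math_tau)    # True
print(nfkc_result == greek_tau)  # True

# The compatibility mapping recorded in the Unicode character database.
print(unicodedata.decomposition(math_tau))
```

So the conversion seen in the question comes from the "K" (compatibility) part of NFKC, not from canonical reordering.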

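However, this conversion does not apply to string literals, such as the one you are using as a dictionary key, so the lookup searches for the unconverted character in a dictionary that contains only the converted character.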

self.𝜏 = 5  # implicitly converted to "self.τ = 5"
a.𝜏  # implicitly converted to "a.τ"
d['𝜏']  # not converted
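The asymmetry between identifiers and string keys can be reproduced without typing the characters directly. This is a rough sketch that compiles a one-line assignment with exec and then inspects the resulting namespace; the names MATH_TAU, namespace and stored_names are illustrative assumptions, not part of the original answer:

```python
import unicodedata

MATH_TAU = "\U0001D70F"  # the character used as an identifier in the question

namespace = {}
# The parser normalizes the identifier while compiling this source string.
exec(f"{MATH_TAU} = 5", namespace)

# Ignore the __builtins__ entry that exec adds to the namespace.
stored_names = [name for name in namespace if not name.startswith("__")]
print([hex(ord(ch)) for ch in stored_names[0]])

# A string key is an ordinary str, so no normalization happens on lookup.
raw_key_found = MATH_TAU in namespace
normalized_key_found = unicodedata.normalize("NFKC", MATH_TAU) in namespace
print(raw_key_found, normalized_key_found)
```

The name ends up stored under U+03C4, so only the explicitly normalized key is found.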

You can see a similar problem with string literals passed to getattr:

>>> getattr(a, '𝜏')
Traceback (most recent call last):
  File "python", line 1, in <module>
AttributeError: 'Base' object has no attribute '𝜏'
>>> getattr(a, unicodedata.normalize('NFKC', '𝜏'))
5
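If attribute names arrive as strings (for example from user input or a config file), one option is to normalize them yourself before the lookup. The helper below, getattr_normalized, is a hypothetical name for illustration and not a standard library function. To avoid depending on how the source file is encoded, the sketch stores the attribute with setattr under U+03C4, which is the name the parser would use for an identifier written with U+1D70F:

```python
import unicodedata

def getattr_normalized(obj, name, *default):
    """Hypothetical helper: NFKC-normalize the name, then defer to getattr."""
    return getattr(obj, unicodedata.normalize("NFKC", name), *default)

class Base:
    def __init__(self):
        # Stored under U+03C4, matching what the parser does for self.<U+1D70F>.
        setattr(self, "\u03C4", 5)

a = Base()
value = getattr_normalized(a, "\U0001D70F")            # found via normalization
missing = getattr_normalized(a, "no_such_attribute", None)  # falls back to default
print(value, missing)
```

The same normalization step applies to setattr, hasattr, and direct `__dict__` lookups whenever the name comes from a string rather than from source code.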