In Python, why do separate dictionary string values pass an "is" identity check? (String interning)

syn*_*zer 5 python string dictionary string-interning python-3.x

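I am building a Python utility that will map integers to strings, where many integers may map to the same string. As I understand it, Python interns short strings and most hard-coded strings by default, saving memory by keeping a single "canonical" copy of each string in a table. I thought I could benefit from this by interning the string values, even though string interning is intended more as an optimization for key hashing. I wrote a quick test that checks whether equal long strings are the same object (using "is"), first with the strings stored in plain lists and then with the strings stored as dictionary values. The behavior surprised me: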

import sys

top = 10000

non1 = []
non2 = []
for i in range(top):
    s1 = '{:010d}'.format(i)
    s2 = '{:010d}'.format(i)
    non1.append(s1)
    non2.append(s2)

same = True
for i in range(top):
    same = same and (non1[i] is non2[i])
print("non: ", same) # prints False
del non1[:]
del non2[:]


with1 = []
with2 = []
for i in range(top):
    s1 = sys.intern('{:010d}'.format(i))
    s2 = sys.intern('{:010d}'.format(i))
    with1.append(s1)
    with2.append(s2)

same = True
for i in range(top):
    same = same and (with1[i] is with2[i])
print("with: ", same) # prints True

###############################

non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"

with_dict = {}
with_dict[1] = sys.intern("this is a long string")
with_dict[2] = sys.intern("this is another long string")
with_dict[3] = sys.intern("this is a long string")
with_dict[4] = sys.intern("this is another long string")

print("non: ",  non_dict[1] is non_dict[3] and non_dict[2] is non_dict[4]) # prints True ???
print("with: ", with_dict[1] is with_dict[3] and with_dict[2] is with_dict[4]) # prints True

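I expected the non-interned dict check to print "False", but I was clearly mistaken. Does anyone know what is going on, and whether string interning would provide any benefit in my case? I could have many, many more keys than unique values if I consolidate text data from multiple inputs, so I am looking for a way to save memory. (Perhaps I will have to use a database, but that is outside the scope of this question.) Thanks in advance!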

use*_*ica 4

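One of the optimizations the bytecode compiler performs, similar to interning but distinct from it, is that it uses the same object for equal constants within the same code block. The string literals here: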

non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"

are in the same code block, so equal strings end up represented by the same string object.
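As a rough illustration of the difference, here is a minimal sketch that assumes CPython (constant sharing is an implementation detail, not a language guarantee). The function name literals and the variable names are made up for this example. It shows equal literals in one code block sharing an object, strings built at run time not sharing one, and sys.intern making them share one:

```python
import sys


def literals():
    # Both literals are compiled as part of one code block (this function body).
    a = "this is a long string"
    b = "this is a long string"
    return a, b


a, b = literals()
# On CPython the compiler stores a single constant for both literals.
same_literal = a is b

# The shared constant appears only once in the function's constant table.
const_count = literals.__code__.co_consts.count("this is a long string")

# Strings built at run time are not compiler constants, so on CPython
# two equal results are separate objects...
parts = ["this is", "a long string"]
r1 = " ".join(parts)
r2 = " ".join(parts)
runtime_same = r1 is r2

# ...unless they are explicitly interned, which returns one canonical object.
i1 = sys.intern(r1)
i2 = sys.intern(r2)
interned_same = i1 is i2

print("literals share an object:", same_literal)
print("copies in co_consts:", const_count)
print("runtime strings share an object:", runtime_same)
print("interned strings share an object:", interned_same)
```

So the dict test in the question says nothing about interning: it only compares literals from the same code block. For values read from input at run time, as in the list test, sys.intern (or an ordinary dict used as a cache of canonical strings) would be what makes equal values share storage.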