Python doesn't delete a variable when it goes out of scope

Question

Consider the following code:

import random                                                                   

class Trie:                                                                     
    def __init__(self, children, end):                                          
        self.children = children                                                
        self.end = end                                                          

def trie_empty():                                                               
    return Trie(dict(), False)                                                  

def trie_insert(x, t):                                                          
    if not x:                                                                   
        t.end = True                                                            
        return                                                                  
    try:                                                                        
        t2 = t.children[x[0]]                                                   
    except KeyError:                                                            
        t2 = trie_empty()                                                       
        t.children[x[0]] = t2                                                     
    trie_insert(x[1:], t2)                                                      

def fill_dict(root):                                                            
    memo = dict()                                                               
    def fill(pfx='', depth=0):                                                  
        try:                                                                    
            memo[pfx]                                                           
        except KeyError:                                                        
            pass                                                                
        else:                                                                   
            return                                                              
        if depth > 6:                                                           
            return                                                              
        for ci in range(ord('a'), ord('d') + 1):                                
            fill(pfx + chr(ci), depth + 1)                                      
        bw = None                                                               
        memo[pfx] = None, bw                                                    
    fill()                                                                      
    # del memo                                                                  

def random_word():                                                              
    l = int(random.random() * 10)                                               
    w = ''.join([chr(int(random.random() * 26) + ord('a')) for _ in range(l)])  
    return w                                                                    

def main():                                                                     
    t = trie_empty()                                                            
    for _ in range(10000):                                                      
        trie_insert(random_word(), t)                                           

    while True:                                                                 
        fill_dict(t)                                                            

if __name__ == '__main__':                                                      
    main()

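When I run it, it keeps using more memory until I kill it. If I uncomment del memo, it runs in a constant amount of memory. From this I conclude that the local variable memo is not being cleaned up when fill_dict returns.

This behavior is really mysterious to me, especially since basically all of the code above is needed to see it. Even the completely unused argument to fill_dict cannot be omitted if the program is to use unbounded memory.

This is really frustrating. Surely a modern, garbage-collected language can clean up its own variables, and I shouldn't have to delete function-local variables by hand. Even C can clean up the stack when a function returns. Why can't Python (in this situation)?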

Answer 1

tor*_*rek 6

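I think this question deserves an answer, now that between me and 程序员 (and match, who mentioned the same starting point in a comment) we have figured it out.

The module-level function fill_dict has an inner function fill: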

def fill_dict(root):                                                            
    memo = dict()                                                               
    def fill(pfx='', depth=0):

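The inner name fill is bound to the entity created by compiling its body. That entity refers back to the name memo, which is bound to a new, empty dictionary on entry to fill_dict, so the entity is itself a closure.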
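
To make "closure" concrete, here is a minimal sketch that inspects the inner function. It assumes CPython, and it changes the code from the question in two labeled ways: the body of fill is cut down, and fill_dict returns fill so it can be examined. The names inner, free_names and cells are mine, not from the question.

```python
def fill_dict(root):
    memo = dict()
    def fill(pfx='', depth=0):
        if depth > 1:
            return
        memo[pfx] = None
        fill(pfx + 'a', depth + 1)  # recursive call, resolved through the enclosing scope
    fill()
    return fill  # assumption for illustration only: the original returns nothing

inner = fill_dict(None)

# Names that fill takes from the enclosing scope rather than defining itself.
free_names = sorted(inner.__code__.co_freevars)
# Each free name is backed by a cell object stored in __closure__, in the same order.
cells = dict(zip(inner.__code__.co_freevars, inner.__closure__))

print(free_names)                          # ['fill', 'memo']
print(cells['memo'].cell_contents)         # {'': None, 'a': None}
print(cells['fill'].cell_contents is inner)  # True
```

The last line is worth noticing: because fill calls itself, one of its own closure cells points back at the function object.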

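Now, closures can be garbage-collected, and Python does have a garbage collector. But CPython in particular has a two-tier collector: a primary, always-on collector based on reference counting, and then a true mark-and-sweep-style GC that runs much less often. (See "When does CPython garbage collect?" and "Why does Python use both reference counting and mark-and-sweep for GC?")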
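
A small sketch of the first tier at work, assuming a standard CPython build. The class Payload and the names probe, freed_without_gc and thresholds are made up for the illustration. With the second-tier collector switched off, an object that is not part of a cycle still disappears the moment its last reference goes away.

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in object; instances can be weakly referenced."""

gc.disable()  # turn off the cyclic collector; reference counting stays on
try:
    obj = Payload()
    probe = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                   # the last strong reference is gone
    freed_without_gc = probe() is None
finally:
    gc.enable()

# The second tier is driven by allocation thresholds, which the gc module exposes.
thresholds = gc.get_threshold()

print(freed_without_gc)  # True on CPython
print(thresholds)
```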

Sidebar: what's wrong with a reference-counting collector?

A reference-counting collector is defeated by cycles:

>>> x = []
>>> x.append(x)
>>> x
[[...]]

Here x is bound to a list whose first element is the very list bound to x. That is, x[0] is x, and x[0][0] is x, and so on:

>>> x[0] is x
True
>>> x[0][0] is x
True

With this kind of cycle, deleting x doesn't help, because the list refers to itself. We can, however, make a fancier cycle:

>>> a = dict()
>>> b = dict()
>>> a['link-to-b'] = b
>>> b['link-to-a'] = a
>>> a
{'link-to-b': {'link-to-a': {...}}}
>>> b
{'link-to-a': {'link-to-b': {...}}}

Now, if we break one of the links, the cycle goes away:

>>> a['link-to-b'] = None
>>> a
{'link-to-b': None}
>>> b
{'link-to-a': {'link-to-b': None}}

and all will be well.
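
The sidebar can be checked with weak references. This is a sketch assuming a standard CPython build. Since plain dicts and lists cannot be weakly referenced, it uses a hypothetical Node class in place of the dicts a and b; make_cycle and the result names are also mine.

```python
import gc
import weakref

class Node:
    """Hypothetical stand-in for the dicts a and b above."""
    def __init__(self):
        self.link = None

def make_cycle():
    a, b = Node(), Node()
    a.link, b.link = b, a  # a refers to b and b refers to a
    return a, b

gc.disable()  # leave only reference counting active
try:
    # Case 1: the cycle is left intact.
    a, b = make_cycle()
    probe = weakref.ref(a)
    del a, b
    survives_del = probe() is not None   # each node still holds the other
    gc.collect()                         # run the cycle collector by hand
    gone_after_collect = probe() is None

    # Case 2: one link is broken before the names are deleted.
    a, b = make_cycle()
    probe = weakref.ref(a)
    a.link = None
    del a, b
    gone_after_break = probe() is None   # reference counting alone was enough
finally:
    gc.enable()

print(survives_del, gone_after_collect, gone_after_break)  # True True True
```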

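Back to the problem at hand

In this particular case, fill holds a reference to the memo instance in its enclosing fill_dict, and one of the entries in memo is: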

        memo[pfx] = None, bw

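The variable bw itself is defined inside the closure, so memo[pfx] refers to the closure (or more precisely, to an entity within the closure), while the closure refers to memo, and that is our circular reference.

So the reference count on the closure never drops to zero, even when fill_dict returns.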
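
Here is a cut-down sketch of the effect, assuming a standard CPython build. One cycle that can be verified directly is that the recursive fill refers to itself through its closure, and that same closure also holds memo. Everything here other than fill_dict, fill and memo is hypothetical: the Memo subclass (needed only so the dictionary can be weakly referenced), the delete_memo flag, and the result names.

```python
import gc
import weakref

class Memo(dict):
    """dict subclass, used only so that the memo can be weakly referenced."""

def fill_dict(delete_memo):
    memo = Memo()
    probe = weakref.ref(memo)
    def fill(depth=0):
        if depth > 2:
            return
        memo[depth] = None
        fill(depth + 1)  # fill reaches itself through a closure cell
    fill()
    if delete_memo:
        del memo         # empties the cell, dropping the only strong reference
    return probe

gc.disable()  # keep the cyclic collector from running on its own
try:
    leaked = fill_dict(delete_memo=False)
    alive_after_return = leaked() is not None   # the closure still holds memo
    gc.collect()
    alive_after_collect = leaked() is not None  # the cycle collector reclaimed it

    cleared = fill_dict(delete_memo=True)
    alive_with_del = cleared() is not None      # freed by reference counting alone
finally:
    gc.enable()

print(alive_after_return, alive_after_collect, alive_with_del)  # True False False
```

This matches what the question observed: without del memo the dictionary has to wait for the second-tier collector, and with del memo it is released as soon as fill_dict reaches that statement.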
