Using a dictionary in Cython, especially inside nogil

Sej*_*air 11 python cython gil

I have a dictionary,

my_dict = {'a': [1, 2, 3], 'b': [4, 5], 'c': [7, 1, 2]}

I want to use this dictionary inside a Cython nogil function. So I tried to declare it as

cdef dict cy_dict = my_dict 

So far, so good.

Now I need to iterate over the keys of my_dict and, if a value is a list or tuple, iterate over that value too. In Python this is easy:

for key in my_dict:
    if isinstance(my_dict[key], (list, tuple)):
        # Iterate over the items of the list or tuple
        for value in my_dict[key]:
            pass  # do some operation on value

However, in Cython I want to implement the same thing inside nogil. Since Python objects are not allowed inside nogil, I am stuck here.

with nogil:
    # the same logic as the Python loop, implemented in Cython

Can anyone help me out?

Dav*_*idW 25

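Unfortunately, the only really sensible option is to accept that you need the GIL. There is also a less sensible option involving C++ maps, but it may be hard to apply to your specific case.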

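You can reacquire the GIL using with gil:. There is an obvious overhead to this (the parts that hold the GIL cannot execute in parallel, and there may be a delay while waiting for the GIL). However, if the dictionary manipulation is a small part of a larger piece of Cython code, this may not be too bad: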

with nogil:
  # some large chunk of computationally intensive code goes here
  with gil:
    # your dictionary code
  # more computationally intensive stuff here

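The other, less sensible option is to use C++ maps (together with other C++ standard library data types). Cython can wrap these and convert them automatically. A trivial example based on your example data: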

from libcpp.map cimport map
from libcpp.string cimport string
from libcpp.vector cimport vector
from cython.operator cimport dereference, preincrement

def f():
    # keys are bytes because, by default under Python 3, Cython converts
    # bytes (not str) to std::string
    my_dict = {b'a': [1, 2, 3], b'b': [4, 5], b'c': [7, 1, 2]}
    # the following conversion has a computational cost
    # and must be done with the GIL. Depending on your design
    # you might be able to ensure it's only done once so that the
    # cost doesn't matter much
    cdef map[string,vector[int]] m = my_dict

    # cdef statements can't go inside nogil, but much of the work can
    cdef map[string,vector[int]].iterator end = m.end()
    cdef map[string,vector[int]].iterator it = m.begin()

    cdef int total_length = 0

    with nogil:  # all this stuff can now go inside nogil
        while it != end:
            total_length += dereference(it).second.size()
            preincrement(it)

    print(total_length)

(You need to compile this with language='c++'.)

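The obvious disadvantage of this is that the data types inside the dict must be known in advance (they can't be arbitrary Python objects). However, since you can't manipulate arbitrary Python objects inside a nogil block, you are pretty restricted anyway.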
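Since the types have to be fixed up front, it can help to validate and normalise the dict in plain Python (with the GIL) before handing it to the C++ map. The following is only a minimal sketch: normalize_for_cpp_map is a hypothetical helper name, and it assumes the target type is map[string, vector[int]], that str keys should be encoded as UTF-8, and that a C int is 32 bits wide.

```python
from typing import Any, Dict, List

# Assumption: the C++ side uses a 32-bit int for vector[int].
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def normalize_for_cpp_map(data: Dict[Any, Any]) -> Dict[bytes, List[int]]:
    """Hypothetical helper: coerce a dict into {bytes: [int, ...]} or raise."""
    result: Dict[bytes, List[int]] = {}
    for key, value in data.items():
        # std::string conversion expects bytes, so encode str keys.
        if isinstance(key, str):
            key = key.encode("utf-8")
        elif not isinstance(key, bytes):
            raise TypeError(f"key {key!r} is not str or bytes")

        # Only sequences of ints can become vector[int].
        if not isinstance(value, (list, tuple)):
            raise TypeError(f"value for key {key!r} is not a list or tuple")

        items: List[int] = []
        for item in value:
            # bool is a subclass of int; reject it to avoid surprises.
            if isinstance(item, bool) or not isinstance(item, int):
                raise TypeError(f"item {item!r} under key {key!r} is not an int")
            if not INT_MIN <= item <= INT_MAX:
                raise OverflowError(f"item {item!r} does not fit in a C int")
            items.append(item)
        result[key] = items
    return result


if __name__ == "__main__":
    my_dict = {'a': [1, 2, 3], 'b': [4, 5], 'c': (7, 1, 2)}
    print(normalize_for_cpp_map(my_dict))
```

The result can then be assigned to the cdef map variable once, outside the nogil block, so that any type error surfaces as an ordinary Python exception before the GIL is released.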