Cythonizing a Python function to make it faster

Cur*_*arn 12 python performance cython

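A few weeks ago, I asked a question about increasing the speed of a function written in Python. At the time, TryPyPy pointed out that I could use Cython for this. He also gave an example of how I could Cythonize that code snippet. I want to do the same with the code below, to see how much faster I can make it by declaring variable types. I have a few questions about this. I have read the tutorial on cython.org, but some things are still unclear to me. The questions are closely related: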

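  1. I don't know C. Which parts of it do I need to learn in order to use Cython to declare variable types?
  2. What C types correspond to Python lists and tuples? For example, I can use double in Cython for a Python float. What do I do for lists? More generally, where can I find the C type that corresponds to a given Python type?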

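Any example of how I could Cythonize the code below would be very helpful. I have added comments in the code that give information about the variable types.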

import math
import random

class Some_class(object):
    # ** Other attributes and functions **
    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
           timePd (type: int)
           Output: None'''

        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects.

        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                             actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of person objects. I have included the code for the Person class
            # below.
            for p1, p2 in chosen_possibilities:

                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        # `price` was undefined here; this_var is assumed,
                        # mirroring the p1 branch above
                        p2.awareness_status = (this_var, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
                else:
                    pass  # equal time periods: nothing to update

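class Person(object):
    def __init__(self, id, value):
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.
        # Named awareness_status (not awarenessStatus) so that it matches
        # the attribute update_awareness_status reads and writes.
        self.awareness_status = (self.max_val, -1)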

lot*_*rio 7

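In general, you can see exactly what C code Cython generates for each line of source by running the cython command with the -a ("annotate") option, e.g. cython -a yourfile.pyx, which writes an HTML report next to the source file. See the Cython documentation for an example. This is extremely useful when trying to find bottlenecks inside a function body.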

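Also, when writing Cython code there is the concept of "early binding for speed". Python objects (such as instances of your Person class) use generic Python code for attribute access, which is slow inside an inner loop. I suspect that if you change the Person class into a cdef class, you will see some speedup. You also need to type the p1 and p2 objects in the inner loop.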
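Plain Python has no exact equivalent of a cdef class, but __slots__ gives a rough feel for the difference: a regular instance keeps its attributes in a per-instance `__dict__`, while a slotted class declares a fixed set of attributes up front. This is only an analogy (attribute access on a slotted class still goes through Python's generic lookup, unlike typed access to a cdef class's C struct), and the class names below are hypothetical. One side effect shared with cdef classes: assigning a misspelled attribute, such as `awarenessStatus` instead of `awareness_status`, raises AttributeError instead of silently creating a new attribute.

```python
class PlainPerson:
    """Regular class: attributes live in a per-instance __dict__."""

    def __init__(self, id, value):
        self.value = value
        self.id = id
        self.max_val = 50000
        self.awareness_status = (self.max_val, -1)


class SlottedPerson:
    """Fixed attribute set declared up front; no per-instance __dict__."""

    __slots__ = ("value", "id", "max_val", "awareness_status")

    def __init__(self, id, value):
        self.value = value
        self.id = id
        self.max_val = 50000
        self.awareness_status = (self.max_val, -1)


if __name__ == "__main__":
    plain = PlainPerson(1, 10)
    slotted = SlottedPerson(1, 10)
    print(plain.__dict__)  # the dictionary holding the instance attributes
    try:
        slotted.awarenessStatus = (0, 0)  # typo for awareness_status
    except AttributeError as exc:
        print("AttributeError:", exc)
```

In Cython the corresponding step is to write cdef class and declare each attribute with a C type, as the code further down does with CyPerson.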

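Since your code makes a lot of Python calls (random.sample, for example), you probably won't get a huge speedup unless you find a way to move those lines into C, which would take considerable effort.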
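Before deciding whether moving those calls into C is worth the effort, it can help to measure how much of the running time they actually take. Below is a minimal sketch using the standard-library cProfile and pstats modules. profile_call and demo are hypothetical helper names, not part of the original code. Run it on the pure-Python version, where each Python-level call is visible to the profiler.

```python
import cProfile
import io
import pstats
import random


def profile_call(func, *args, sort="cumulative", limit=10):
    """Run func(*args) under cProfile; return (result, formatted stats text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(sort).print_stats(limit)  # top `limit` entries only
    return result, out.getvalue()


if __name__ == "__main__":
    def demo():
        # Stand-in workload that spends most of its time in random.sample
        population = list(range(10_000))
        for _ in range(200):
            random.sample(population, 3000)

    _, report = profile_call(demo)
    print(report)
```

For the question's code, one would call something like profile_call(obj.update_awareness_status, 1.0, 5), where obj is a hypothetical Some_class instance. If most of the cumulative time shows up under random.sample and related calls, typing the loop variables alone will not change much.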

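You can type things as tuple or list, but that usually won't buy you a speedup. It is better to use C arrays wherever possible; that is something you will have to read up on.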
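To make the "use C arrays" advice more concrete, here is a rough pure-Python sketch of the data layout it implies. Instead of a list of Person objects holding tuples, each field lives in its own typed array.array (C doubles for values, C longs for time periods), and a pair of persons becomes a pair of indices. The function name update_pairs and this struct-of-arrays layout are illustrative assumptions, not taken from the answer. In plain Python this is not faster by itself. The benefit is that in Cython such buffers can be accessed through typed memoryviews (e.g. double[:]), so with typed index variables the loop body can compile down to C-level indexing.

```python
from array import array


def update_pairs(value, aw_val, aw_time, pairs, this_var, time_pd):
    """Apply the awareness update to index pairs (i, j).

    value, aw_val: array('d'); aw_time: array('l'); pairs: iterable of (i, j).
    aw_val[k] and aw_time[k] together play the role of awareness_status.
    """
    for i, j in pairs:
        if aw_time[i] < aw_time[j]:
            if value[i] > aw_val[j]:
                aw_val[i] = this_var
                aw_time[i] = time_pd
            else:
                # copy j's awareness status to i
                aw_val[i] = aw_val[j]
                aw_time[i] = aw_time[j]
        elif aw_time[i] > aw_time[j]:
            if value[j] > aw_val[i]:
                aw_val[j] = this_var
                aw_time[j] = time_pd
            else:
                # copy i's awareness status to j
                aw_val[j] = aw_val[i]
                aw_time[j] = aw_time[i]
        # equal time periods: nothing to update


if __name__ == "__main__":
    value = array("d", [5.0, 1.0])
    aw_val = array("d", [50000.0, 2.0])
    aw_time = array("l", [-1, 3])
    update_pairs(value, aw_val, aw_time, [(0, 1)], 7.5, 10)
    print(list(aw_val), list(aw_time))
```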

With the following trivial modifications, I got a speedup factor of 1.6. Note that I had to change a few things here and there to make it compile.

ctypedef int ITYPE_t

cdef class CyPerson:
    # These attributes are placed in the extension type's C-struct, so C-level
    # access is _much_ faster.
    cdef ITYPE_t value, id, max_val
    cdef tuple awareness_status

    def __init__(self, ITYPE_t id, ITYPE_t value):
        # The __init__ function is much the same as before.
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.          
        self.awareness_status = (self.max_val, -1)

NPERSONS = 10000

import math
import random

class Some_class(object):

    def __init__(self):
        ri = lambda: random.randint(0, 10)
        self.possibilities = [(CyPerson(ri(), ri()), CyPerson(ri(), ri())) for i in range(NPERSONS)]

    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
           timePd (type: int)
           Output: None'''

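        # Declaring p1 and p2 as CyPerson lets Cython read their cdef
        # attributes straight from the C struct instead of through generic
        # Python attribute lookup.
        cdef CyPerson p1, p2
        price = 10  # placeholder value so that `price` below is defined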

        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects. 

        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                         actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of CyPerson objects (see the cdef class above).
            for persons in chosen_possibilities:
                p1, p2 = persons
                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (price, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
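A speedup figure such as the 1.6 above comes from timing both versions on the same workload. Below is a minimal sketch of such a measurement with the standard-library timeit module. speedup is a hypothetical helper. The baseline and candidate callables would wrap, for example, the pure-Python and the compiled update_awareness_status calls. Because that method draws random numbers, seeding random with the same value before each measurement keeps the two workloads comparable.

```python
import timeit


def speedup(baseline, candidate, number=20, repeat=5):
    """Return how many times faster `candidate` runs than `baseline`.

    Each callable is run `number` times per trial, for `repeat` trials; the
    fastest trial is used, which filters out some noise from other processes.
    """
    t_base = min(timeit.repeat(baseline, number=number, repeat=repeat))
    t_cand = min(timeit.repeat(candidate, number=number, repeat=repeat))
    return t_base / t_cand


if __name__ == "__main__":
    data = list(range(10_000))

    def slow():
        # Explicit Python-level loop
        total = 0
        for x in data:
            total += x
        return total

    def fast():
        # Same result via the C-implemented builtin
        return sum(data)

    print(f"speedup: {speedup(slow, fast):.1f}x")
```

For the code above this would look like speedup(lambda: py_obj.update_awareness_status(1.0, 5), lambda: cy_obj.update_awareness_status(1.0, 5)), where py_obj and cy_obj are hypothetical instances built from the pure-Python and Cython modules.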