Faster double iteration over a single array in Python

Question

Rob*_*isi 7 python performance numpy python-3.x pandas

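I am looking for a faster way to compute pairwise accuracy: take every pair of elements from the same array (here, a pandas DataFrame column), compute the difference between them, and then compare the two resulting differences. I have a DataFrame df with 3 columns (id, the document identifier; Judgement, the human evaluation, an int; and PR_Score, the document's PageRank, a float), and I want to check whether the two scores agree on which document of each pair is ranked better or worse.

For example:

id: id1, id2, id3

Judgement: 1, 0, 0

PR_Score: 0.18, 0.5, 0.12

In this case, the two scores agree that id1 ranks above id3, they disagree on id1 versus id2, and id2 and id3 are tied in the human judgement (so that pair is not counted). My pairwise accuracy is therefore:

agreement = 1

disagreement = 1

pairwise accuracy = agreement / (agreement + disagreement) = 1/2 = 0.5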
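The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the definition applied to the three example documents; the `docs` list and the variable names are made up for illustration.

```python
from itertools import combinations

# The three example documents: (id, human judgement, PageRank score).
docs = [("id1", 1, 0.18), ("id2", 0, 0.5), ("id3", 0, 0.12)]

agree = 0
disagree = 0
for (_, h1, p1), (_, h2, p2) in combinations(docs, 2):
    human = h1 - h2  # difference in human judgement
    pr = p1 - p2     # difference in PageRank score
    if human == 0 or pr == 0:
        continue  # a tie in either score gives no ordering to compare
    if (human > 0) == (pr > 0):
        agree += 1     # both scores prefer the same document
    else:
        disagree += 1  # the scores prefer different documents

pairwise_accuracy = agree / (agree + disagree)
print(agree, disagree, pairwise_accuracy)
```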

Here is the code of my first solution, in which I use the df columns as arrays (this helped reduce the computation time):

import numpy as np

def pairwise(agree, disagree):
    # Fraction of comparable pairs on which the two scores agree
    return agree / (agree + disagree)

def pairwise_computing_array(df):

    # Convert the columns to NumPy arrays once, outside the loops
    humanScores = np.array(df['Judgement'])
    pagerankScores = np.array(df['PR_Score'])

    total = 0
    agree = 0
    disagree = 0

    for i in range(len(df) - 1):
        for j in range(i + 1, len(df)):
            total += 1
            human = humanScores[i] - humanScores[j]  # difference in human judgement
            if human != 0:
                pr = pagerankScores[i] - pagerankScores[j]  # difference in PageRank score
                if pr != 0:
                    if np.sign(human) == np.sign(pr):
                        agree += 1  # they agree on which of the two is better
                    else:
                        disagree += 1  # they do not agree on which of the two is better
            # pairs tied in either score are skipped

    pairwise_accuracy = pairwise(agree, disagree)

    return (agree, disagree, total, pairwise_accuracy)

I tried a list comprehension to speed up the computation, but it is actually slower than the first solution:

def pairwise_computing_list_comprehension(df):

    humanScores = np.array(df['Judgement'])
    pagerankScores = np.array(df['PR_Score'])  # read from df, the function's argument

    # One boolean per pair that is not tied in either score: True if the signs agree
    sign = [np.sign(pagerankScores[i] - pagerankScores[j]) == np.sign(humanScores[i] - humanScores[j])
            for i in range(len(df)) for j in range(i + 1, len(df))
                if (np.sign(pagerankScores[i] - pagerankScores[j]) != 0
                    and np.sign(humanScores[i] - humanScores[j]) != 0)]

    agreement = sum(sign)
    disagreement = len(sign) - agreement
    pairwise_accuracy = pairwise(agreement, disagreement)

    return (agreement, disagreement, pairwise_accuracy)
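One plausible reason the comprehension is slower: it calls np.sign on scalars up to four times per pair, and each NumPy call on a scalar carries noticeable overhead. A common alternative, not taken from the original post, is to keep the outer loop in Python and let NumPy handle the whole inner loop at once. The sketch below assumes the two columns are passed in as arrays; the function name `pairwise_computing_vectorized` is hypothetical.

```python
import numpy as np

def pairwise_computing_vectorized(human_scores, pagerank_scores):
    """Count agreeing and disagreeing pairs, vectorizing the inner loop."""
    human_scores = np.asarray(human_scores)
    pagerank_scores = np.asarray(pagerank_scores)

    agree = 0
    disagree = 0
    for i in range(len(human_scores) - 1):
        # Signs of the differences between element i and every later element.
        human = np.sign(human_scores[i] - human_scores[i + 1:])
        pr = np.sign(pagerank_scores[i] - pagerank_scores[i + 1:])
        product = human * pr  # +1: agree, -1: disagree, 0: tie in either score
        agree += int(np.count_nonzero(product > 0))
        disagree += int(np.count_nonzero(product < 0))

    # Assumes at least one comparable pair; otherwise this divides by zero.
    return agree, disagree, agree / (agree + disagree)

# The three-document example from the question.
result = pairwise_computing_vectorized([1, 0, 0], [0.18, 0.5, 0.12])
print(result)
```

This still does work proportional to the number of pairs, but only needs memory proportional to the array length, so it avoids building a full n-by-n difference matrix.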

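I cannot run either version on my whole dataset because it takes too long; I am hoping for something that computes in under 1 minute.

On my computer, a small subset of 1000 rows gives the following timings:

Code 1: 1.57 s ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Code 2: 3.51 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)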

Answer 1

Rob*_*isi 1

Here is code that runs in a reasonable time, thanks to the suggestion from @juanpa.arrivilillaga:

import numpy as np
from numba import jit

@jit(nopython=True)
def pairwise_computing(humanScores, pagerankScores):
    # Both arguments are NumPy arrays of the same length

    total = 0
    agree = 0
    disagree = 0

    for i in range(len(humanScores) - 1):
        for j in range(i + 1, len(humanScores)):
            total += 1
            human = humanScores[i] - humanScores[j]  # difference in human judgement
            if human != 0:
                pr = pagerankScores[i] - pagerankScores[j]  # difference in PageRank score
                if pr != 0:
                    if np.sign(human) == np.sign(pr):
                        agree += 1  # they agree on which of the two is better
                    else:
                        disagree += 1  # they do not agree on which of the two is better
            # pairs tied in either score are skipped

    pairwise_accuracy = agree / (agree + disagree)
    return (agree, disagree, total, pairwise_accuracy)

This is the timing on my whole dataset (58k rows):

7.98 s ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
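A compiled function like this expects plain NumPy arrays rather than a DataFrame, so with pandas the columns would typically be passed as `df['Judgement'].to_numpy()` and `df['PR_Score'].to_numpy()`. The sketch below is a compact variant written for illustration, not the code from the answer: `count_pairs` is a hypothetical name, the comparison `(human > 0) == (pr > 0)` stands in for the `np.sign` check, and the fallback decorator is an assumption so the example still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback (assumption for this sketch): run as plain Python without Numba.
        return func

@njit
def count_pairs(human_scores, pagerank_scores):
    agree = 0
    disagree = 0
    n = len(human_scores)
    for i in range(n - 1):
        for j in range(i + 1, n):
            human = human_scores[i] - human_scores[j]
            pr = pagerank_scores[i] - pagerank_scores[j]
            if human != 0 and pr != 0:  # skip pairs tied in either score
                if (human > 0) == (pr > 0):
                    agree += 1
                else:
                    disagree += 1
    return agree, disagree

# The three-document example from the question, as typed NumPy arrays.
judgement = np.array([1, 0, 0], dtype=np.int64)
pr_score = np.array([0.18, 0.5, 0.12], dtype=np.float64)

agree, disagree = count_pairs(judgement, pr_score)
# Guard the division in case no pair is comparable.
accuracy = agree / (agree + disagree) if (agree + disagree) else float("nan")
print(agree, disagree, accuracy)
```

With Numba, the first call includes compilation time, so it is worth calling the function once on a small input before timing it on the full dataset.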

Asked:	6 years, 7 months ago
Viewed:	199 times
Active:	6 years, 7 months ago