For loop to compute concordance takes a very long time on large data (14+ hours for 0.15 mln * 36k rows)

poP*_*lor 0 python performance for-loop python-3.x

I am running this code in Python 3.5 to compute concordance (for a logistic regression).

# Counters must start at zero before the loops run
conc = disc = ties = pairs_tested = 0

for i in ones2.index:
    for j in zeros2.index:
        pairs_tested = pairs_tested + 1
        if ones2.iloc[i, 1] > zeros2.iloc[j, 1]:
            conc = conc + 1
        elif ones2.iloc[i, 1] == zeros2.iloc[j, 1]:
            ties = ties + 1
        else:
            disc = disc + 1

# Calculate concordance, discordance and ties
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested

print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)

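There are 0.15 mln rows in zeros2 (a pandas DataFrame) and 36k rows in ones2 (a pandas DataFrame). Both tables have two variables:

[i] Responder (Responder0 = 0 in zeros2, Responders1 = 1 in ones2).

[ii] Probability (prob0 in zeros2 and prob1 in ones2).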

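My problem: the for loop has been running for 12 hours and was still running when I posted this question. How can I do this faster? I am running it on a 64-bit Windows machine with 8 GB of RAM.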

Sre*_*ary 6

Because of the two nested for loops (0.15 mln * 36k), your code performs 5.4 billion comparisons.
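
To put that in numbers, here is a rough back-of-the-envelope sketch. The row counts are the approximate figures from the question. The second estimate assumes an approach that sorts one table once and then binary-searches it; it ignores constant factors and is only meant to show the order of magnitude.

```python
import math

n_zeros = 150000   # approximate rows in zeros2 (from the question)
n_ones = 36000     # approximate rows in ones2 (from the question)

# Nested loops compare every one with every zero.
nested_comparisons = n_zeros * n_ones

# Rough estimate: sort the zeros once (about n * log2(n) steps), then run
# two binary searches (about log2(n) steps each) for every one.
log_n = math.ceil(math.log2(n_zeros))
sort_and_search = n_zeros * log_n + 2 * n_ones * log_n

print(nested_comparisons)  # 5400000000
print(sort_and_search)     # 3996000
```

Even allowing for generous constant factors, the second figure is about three orders of magnitude smaller.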

I would do something like this (thanks to @Leon for helping me improve this answer):

from bisect import bisect_left, bisect_right

# Sort the zeros' probabilities (column 1) once so they can be binary-searched
zeros2_list = sorted(zeros2.iloc[:, 1])
zeros2_length = len(zeros2_list)

conc = disc = ties = 0

for prob in ones2.iloc[:, 1]:
    # Zeros strictly below this probability: concordant pairs (one > zero)
    cur_conc = bisect_left(zeros2_list, prob)
    # Zeros equal to this probability: ties
    cur_ties = bisect_right(zeros2_list, prob) - cur_conc
    conc += cur_conc
    ties += cur_ties
    # The remaining zeros lie above this probability: discordant pairs
    disc += zeros2_length - cur_ties - cur_conc

pairs_tested = zeros2_length * len(ones2.index)

concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested

print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)
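
One way to gain confidence in the binary-search approach is to compare it with the brute-force double loop on a small sample. The sketch below works on plain Python lists rather than DataFrames and uses the question's definition (a pair is concordant when the one's probability is higher than the zero's). The function names and the random test data are hypothetical, chosen only for illustration.

```python
import random
from bisect import bisect_left, bisect_right


def count_pairs(one_probs, zero_probs):
    """Return (conc, disc, ties) over all (one, zero) pairs via binary search."""
    zeros_sorted = sorted(zero_probs)
    n_zeros = len(zeros_sorted)
    conc = disc = ties = 0
    for p in one_probs:
        below = bisect_left(zeros_sorted, p)           # zeros < p
        equal = bisect_right(zeros_sorted, p) - below  # zeros == p
        conc += below
        ties += equal
        disc += n_zeros - below - equal                # zeros > p
    return conc, disc, ties


def count_pairs_brute(one_probs, zero_probs):
    """Reference implementation: compare every pair directly."""
    conc = disc = ties = 0
    for p in one_probs:
        for q in zero_probs:
            if p > q:
                conc += 1
            elif p == q:
                ties += 1
            else:
                disc += 1
    return conc, disc, ties


# Hypothetical sample data, rounded so that ties actually occur.
rng = random.Random(0)
ones_sample = [round(rng.random(), 2) for _ in range(200)]
zeros_sample = [round(rng.random(), 2) for _ in range(300)]

fast = count_pairs(ones_sample, zeros_sample)
slow = count_pairs_brute(ones_sample, zeros_sample)
print(fast == slow)  # True
```

If the two results ever differ on your own data, the most likely culprit is a mix-up between which side of the insertion point counts as concordant.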

Or, the other way around, like this:

from bisect import bisect_left, bisect_right

zeros2_list = sorted(zeros2.iloc[:, 1])
ones2_list = sorted(ones2.iloc[:, 1])
zeros2_length = len(zeros2_list)
ones2_length = len(ones2_list)

conc = disc = ties = 0

# Iterating over zeros2.iloc[:, 1] directly gives the same totals,
# because the order in which the zeros are visited does not matter.
for prob in zeros2_list:
    # Ones strictly below this probability: discordant pairs (one < zero)
    cur_disc = bisect_left(ones2_list, prob)
    # Ones equal to this probability: ties
    cur_ties = bisect_right(ones2_list, prob) - cur_disc
    disc += cur_disc
    ties += cur_ties
    # The remaining ones lie above this probability: concordant pairs
    conc += ones2_length - cur_ties - cur_disc

pairs_tested = zeros2_length * ones2_length

concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested

print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)
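
When the loop runs over the zeros instead of the ones, the meaning of the insertion point flips: ones that sort below a zero form discordant pairs. A tiny hand-checkable sketch of that direction follows. It uses plain lists, and the function name and demo values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_by_zeros(one_probs, zero_probs):
    """Return (conc, disc, ties), searching the sorted ones for each zero."""
    ones_sorted = sorted(one_probs)
    n_ones = len(ones_sorted)
    conc = disc = ties = 0
    for q in zero_probs:
        below = bisect_left(ones_sorted, q)            # ones < q: discordant
        equal = bisect_right(ones_sorted, q) - below   # ones == q: ties
        disc += below
        ties += equal
        conc += n_ones - below - equal                 # ones > q: concordant
    return conc, disc, ties


ones_demo = [0.9, 0.6, 0.6, 0.2]
zeros_demo = [0.6, 0.3, 0.1]

# By hand: 0.9 beats all three zeros; each 0.6 beats two and ties one;
# 0.2 beats only 0.1. That is 8 concordant, 2 discordant, 2 tied pairs.
print(count_by_zeros(ones_demo, zeros_demo))  # (8, 2, 2)
```

Both directions must produce the same totals, so which table you sort is mainly a question of which loop you prefer to keep short.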