Mcl_000 · 1 · Tags: python, optimization, performance, numpy
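I'm trying to find a way to get rid of this while loop because it is very time-consuming. What I'm doing here is indexing a list (data), finding the highest value in data[x:x+9], storing it in another array (result), and then adding 1 to x so the window moves across the whole list. Is this a silly way to do it? Is there a faster, smarter way? Any help is greatly appreciated. I hope I've explained this well.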
```python
import numpy as np

def calc(data):
    result = np.zeros(len(data))  # allocating space
    x = 0
    while x < len(data):
        highest_value = max(data[x:x+9])
        print(f'{data[x:x+9]} highest value = {highest_value}')
        result[x] = highest_value
        print(result)
        x += 1
    return result

data = [7,6,5,4,3,4,2,3,4,5,6,7,8,9,3,5,4,2,3,1]
result = calc(data)
```
Output:
[7, 6, 5, 4, 3, 4, 2, 3, 4] highest value = 7
[7. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[6, 5, 4, 3, 4, 2, 3, 4, 5] highest value = 6
[7. 6. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[5, 4, 3, 4, 2, 3, 4, 5, 6] highest value = 6
[7. 6. 6. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[4, 3, 4, 2, 3, 4, 5, 6, 7] highest value = 7
[7. 6. 6. 7. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[3, 4, 2, 3, 4, 5, 6, 7, 8] highest value = 8
[7. 6. 6. 7. 8. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[4, 2, 3, 4, 5, 6, 7, 8, 9] highest value = 9
[7. 6. 6. 7. 8. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[2, 3, 4, 5, 6, 7, 8, 9, 3] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[3, 4, 5, 6, 7, 8, 9, 3, 5] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[4, 5, 6, 7, 8, 9, 3, 5, 4] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[5, 6, 7, 8, 9, 3, 5, 4, 2] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[6, 7, 8, 9, 3, 5, 4, 2, 3] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7, 8, 9, 3, 5, 4, 2, 3, 1] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0.]
[8, 9, 3, 5, 4, 2, 3, 1] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0.]
[9, 3, 5, 4, 2, 3, 1] highest value = 9
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0.]
[3, 5, 4, 2, 3, 1] highest value = 5
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 0. 0. 0. 0. 0.]
[5, 4, 2, 3, 1] highest value = 5
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 0. 0. 0. 0.]
[4, 2, 3, 1] highest value = 4
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 0. 0. 0.]
[2, 3, 1] highest value = 3
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 3. 0. 0.]
[3, 1] highest value = 3
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 3. 3. 0.]
[1] highest value = 1
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 3. 3. 1.]
______________________________________________________________
result:
[7. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 0. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 0. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 0. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 0. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 3. 0. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 3. 3. 0.]
[7. 6. 6. 7. 8. 9. 9. 9. 9. 9. 9. 9. 9. 9. 5. 5. 4. 3. 3. 1.]
Wall time: 7.01 ms
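You can let NumPy compute all the window maximums at once (vectorized) by giving it the list of indexes for every sub-range, one row per position.
For example: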
```python
import numpy as np

data = np.array([7,6,5,4,3,4,2,3,4,5,6,7,8,9,3,5,4,2,3,1])
idx = np.arange(9) + np.arange(len(data))[:, None]  # indexes of subranges (one row per position)
idx = np.minimum(len(data) - 1, idx)                 # don't overflow indexes
rollingMax = np.max(data[idx], axis=1)               # apply maximums on every subrange
print(rollingMax)
```

[7 6 6 7 8 9 9 9 9 9 9 9 9 9 5 5 4 3 3 1]
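A small sketch of what the index matrix looks like, using a hypothetical five-element input and a window of 3 instead of 9 so the matrix stays readable. Each row i holds the positions i to i+2, and `np.minimum` clamps positions past the end to the last valid index, so the tail rows just repeat the last element, which cannot change a maximum.

```python
import numpy as np

data = np.array([4, 1, 3, 2, 5])  # hypothetical small input
window = 3                         # smaller than 9 so the matrix is easy to read

# Row i contains the indexes i, i+1, ..., i+window-1 (a row broadcast against a column)
idx = np.arange(window) + np.arange(len(data))[:, None]
# Clamp indexes that would run past the end to the last valid position
idx = np.minimum(len(data) - 1, idx)
print(idx)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 4]
#  [4 4 4]]

windows = data[idx]                # fancy indexing gathers one sub-range per row
print(windows.max(axis=1))         # [4 3 5 5 5]
```

Note that this builds an n × window index array, so memory grows with both the data length and the window size.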
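[EDIT] An even faster way is to loop over the window offsets instead of over the positions. Each pass replaces every value with the maximum of itself and its right neighbour, so after the k-th pass each position holds the maximum of k+1 consecutive values. This still involves a loop, but it runs only window-1 times regardless of the data size, so it is much faster and keeps its speed advantage on larger datasets.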
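```python
import numpy as np
def rollingMax2(data, window=9):
    result = data.copy()  # expects a NumPy array, like data above
    for _ in range(1, window):
        # one pass: result[i] = max(result[i], result[i+1]) for every i except the last
        result[:-1] = np.maximum(result[:-1], result[1:])
    return result
```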
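One way to convince yourself that the offset loop matches the question's loop is to compare it against a direct translation of that loop. In this sketch, `rolling_max_reference` is a hypothetical helper written only for the comparison, and the random test sizes and window lengths are arbitrary.

```python
import numpy as np

def rollingMax2(data, window=9):
    # Copied from the answer: window-1 passes of neighbour-wise maximums
    result = data.copy()
    for _ in range(1, window):
        result[:-1] = np.maximum(result[:-1], result[1:])
    return result

def rolling_max_reference(data, window=9):
    # Hypothetical helper: the question's loop, without the prints
    return np.array([max(data[i:i + window]) for i in range(len(data))])

data = np.array([7, 6, 5, 4, 3, 4, 2, 3, 4, 5, 6, 7, 8, 9, 3, 5, 4, 2, 3, 1])
print(rollingMax2(data))  # [7 6 6 7 8 9 9 9 9 9 9 9 9 9 5 5 4 3 3 1]

# Randomized cross-check on a few sizes and window lengths
rng = np.random.default_rng(0)
for n in (1, 5, 50):
    for w in (1, 2, 9):
        x = rng.integers(0, 100, size=n)
        assert np.array_equal(rollingMax2(x, w), rolling_max_reference(x, w))
```

The in-place update is safe because `np.maximum` builds its full result before it is written back into `result[:-1]`.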
Speed improvement:

| Number of values | rollingMax | rollingMax2 |
|---:|---:|---:|
| 20 | 3x | 3x |
| 200 | 15x | 31x |
| 2,000 | 26x | 99x |
| 20,000 | 35x | 167x |
| 200,000 | 21x | 220x |
| 2,000,000 | 11x | 66x |
| 20,000,000 | 11x | 35x |
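If NumPy 1.20 or newer is available, `numpy.lib.stride_tricks.sliding_window_view` can produce the same one-row-per-position matrix as a strided view instead of materializing an index array. This is a swapped-in technique, not part of the answer above: `rolling_max_view` is a hypothetical name, padding with `mode="edge"` is chosen to reproduce the clamping done by `np.minimum(len(data)-1, idx)`, and it has not been benchmarked against the table above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def rolling_max_view(data, window=9):
    data = np.asarray(data)
    # Pad the end with copies of the last element so the final positions see
    # shortened windows, matching the np.minimum clamp in the first answer
    padded = np.pad(data, (0, window - 1), mode="edge")
    # Shape (len(data), window); a strided view, so no index array is built
    windows = sliding_window_view(padded, window)
    return windows.max(axis=1)

data = [7, 6, 5, 4, 3, 4, 2, 3, 4, 5, 6, 7, 8, 9, 3, 5, 4, 2, 3, 1]
print(rolling_max_view(data))  # [7 6 6 7 8 9 9 9 9 9 9 9 9 9 5 5 4 3 3 1]
```

The only extra allocations here are the padded copy of the input and the result; whether that gives a speedup over `rollingMax2` would need to be measured on your own data.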