How to speed up transition matrix creation in Numpy?

Question

The following is the most basic way I know of to count the transitions in a Markov chain and use them to populate a transition matrix:

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    # Count each (previous state -> next state) transition, one Python-level step at a time
    for i in range(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

I've tried 3 different ways to speed it up:

1) A sparse-matrix one-liner, based on this Matlab code:

transition_matrix = full(sparse(markov_chain(1:end-1), markov_chain(2:end), 1))

In Numpy/SciPy, it looks like this:

from scipy.sparse import coo_matrix

def get_sparse_counts_matrix(markov_chain, number_of_states):
    return coo_matrix(([1] * (len(markov_chain) - 1), (markov_chain[:-1], markov_chain[1:])), shape=(number_of_states, number_of_states))
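
The COO one-liner works as a counter because duplicate (row, col) entries are summed when a COO matrix is converted to a dense array, which mirrors Matlab's `full(sparse(...))`. A minimal sketch checking this against the plain loop, assuming `markov_chain` is a 1-D NumPy integer array; `loop_counts` and `coo_counts` are hypothetical helper names:

```python
import numpy as np
from scipy.sparse import coo_matrix

def loop_counts(chain, n):
    # Reference implementation: one Python-level increment per transition
    m = np.zeros((n, n))
    for i in range(1, len(chain)):
        m[chain[i - 1], chain[i]] += 1
    return m

def coo_counts(chain, n):
    # One entry of 1.0 per transition; toarray() sums repeated (from, to) pairs
    data = np.ones(len(chain) - 1)
    return coo_matrix((data, (chain[:-1], chain[1:])), shape=(n, n)).toarray()

chain = np.array([0, 1, 1, 2, 0, 1])
assert np.array_equal(loop_counts(chain, 3), coo_counts(chain, 3))
```

Building the sparse object and then densifying it carries overhead of its own, which may explain why this route was not faster for small matrices.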

I also tried a couple of pure-Python tweaks, such as using zip():

for old_state, new_state in zip(markov_chain[0:-1], markov_chain[1:]):
    transition_counts_matrix[old_state, new_state] += 1

and a Queue:

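from queue import Queue  # the module is named Queue in Python 2

# A two-slot queue holds the previous and current state as we walk the chain
old_and_new_states_holder = Queue(maxsize=2)
old_and_new_states_holder.put(markov_chain[0])
for new_state in markov_chain[1:]:
    old_and_new_states_holder.put(new_state)
    old_state = old_and_new_states_holder.get()
    transition_counts_matrix[old_state, new_state] += 1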

But none of these three approaches gave a speedup. In fact, everything except the zip() solution was at least 10x slower than my original solution.

Are there any other solutions worth looking into?

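Improved solution for building a transition matrix from many chains
The best answer to the question above is DSM's. However, for anyone who wants to populate a transition matrix from a list of millions of Markov chains, the fastest approach I found is: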

import itertools
import numpy

def fast_increment_transition_counts_from_chain(markov_chain, transition_counts_matrix):
    # Map each (from, to) transition to its row-major flat index, tally the indices
    # with bincount, and add the tallies to the existing counts
    flat_coords = numpy.ravel_multi_index((markov_chain[:-1], markov_chain[1:]), transition_counts_matrix.shape)
    transition_counts_matrix += numpy.bincount(flat_coords, minlength=transition_counts_matrix.size).reshape(transition_counts_matrix.shape)

def get_fake_transitions(markov_chains):
    # Concatenating the chains creates one spurious transition at every boundary:
    # (last state of chain i-1) -> (first state of chain i). Record them so they can be removed.
    fake_transitions = []
    for i in range(1, len(markov_chains)):
        old_chain = markov_chains[i - 1]
        new_chain = markov_chains[i]
        end_of_old = old_chain[-1]
        beginning_of_new = new_chain[0]
        fake_transitions.append((end_of_old, beginning_of_new))
    return fake_transitions

def decrement_fake_transitions(fake_transitions, counts_matrix):
    # Undo the boundary transitions that the single bincount pass counted
    for old_state, new_state in fake_transitions:
        counts_matrix[old_state, new_state] -= 1

def fast_get_transition_counts_matrix(markov_chains, number_of_states):
    """50% faster than original, but must store 2 additional slice copies of all markov chains in memory at once.
    You might need to break up the chains into manageable chunks that don't exceed your memory.
    """
    transition_counts_matrix = numpy.zeros([number_of_states, number_of_states])
    fake_transitions = get_fake_transitions(markov_chains)
    # Flatten all chains into one long sequence so a single bincount pass covers them
    markov_chains = list(itertools.chain.from_iterable(markov_chains))
    fast_increment_transition_counts_from_chain(markov_chains, transition_counts_matrix)
    decrement_fake_transitions(fake_transitions, transition_counts_matrix)
    return transition_counts_matrix
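
An alternative sketch (not benchmarked here) that avoids the fake-transition bookkeeping: slice each chain separately and concatenate the slices, so no (from, to) pair ever straddles two chains. Like the version above, it still holds two concatenated copies of all the chains in memory at once. The function name `counts_from_many_chains` is hypothetical:

```python
import numpy as np

def counts_from_many_chains(markov_chains, number_of_states):
    # Chains with fewer than 2 states contribute no transitions
    chains = [np.asarray(c, dtype=np.intp) for c in markov_chains if len(c) > 1]
    n = number_of_states
    if not chains:
        return np.zeros((n, n), dtype=np.int64)
    # Per-chain slices, so pairs never cross a chain boundary
    from_states = np.concatenate([c[:-1] for c in chains])
    to_states = np.concatenate([c[1:] for c in chains])
    flat = np.ravel_multi_index((from_states, to_states), (n, n))
    return np.bincount(flat, minlength=n * n).reshape(n, n)

counts = counts_from_many_chains([[0, 1, 1], [2, 0], [1]], 3)
```

Here the boundary pairs (1, 2) and (0, 1) between consecutive chains are never formed, so nothing needs to be subtracted afterwards.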

Answer 1

Jos*_*del 8

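Just for kicks, and because I've been meaning to try it, I applied Numba to your problem. In code, this only involves adding a decorator (although I've called it directly so that I can test the jit variants numba provides here):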

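import numpy as np
import numba

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in range(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

# numba.autojit() and the argtypes= keyword of early Numba releases are no longer available.
# Lazy compilation: argument types are inferred from the first call
autojit_func = numba.jit(nopython=True)(increment_counts_in_matrix_from_chain)
# Eager compilation with an explicit signature: 1-D int64 chain, 2-D C-contiguous float64 matrix
jit_func = numba.jit("void(int64[::1], float64[:, ::1])", nopython=True)(increment_counts_in_matrix_from_chain)

t = np.random.randint(0, 50, 500).astype(np.int64)
m1 = np.zeros((50, 50))
m2 = np.zeros((50, 50))
m3 = np.zeros((50, 50))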

Then the timings:

In [10]: %timeit increment_counts_in_matrix_from_chain(t,m1)
100 loops, best of 3: 2.38 ms per loop

In [11]: %timeit autojit_func(t,m2)
10000 loops, best of 3: 67.5 us per loop

In [12]: %timeit jit_func(t,m3)
100000 loops, best of 3: 4.93 us per loop

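The autojit approach makes some guesses based on the runtime inputs, while the jit function has its types specified. You have to be a little careful: at these early stages, numba's jit does not report an error if you pass it input of the wrong type. It just silently returns a wrong answer.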

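Still, getting roughly 35x and 480x speedups without changing any code, just by adding a call to numba (which can also be applied as a decorator), is pretty impressive in my book. You could probably get similar results with Cython, but it would take more boilerplate plus writing a setup.py file.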

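I also like this solution because the code stays readable, and you can write it the way you originally thought about implementing the algorithm.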
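
Because a wrongly typed argument to an eagerly compiled function could silently produce a wrong answer (at least in the Numba of that era, as noted above), one hedge is to validate inputs in plain NumPy before calling the compiled function. A minimal sketch, assuming a compiled function that expects a C-contiguous 1-D int64 chain and a C-contiguous square float64 matrix; `checked_counts_args` is a hypothetical helper:

```python
import numpy as np

def checked_counts_args(markov_chain, transition_counts_matrix):
    # Hypothetical guard: reject inputs a fixed-signature compiled function might mis-handle
    chain = np.asarray(markov_chain)
    if chain.ndim != 1 or not np.issubdtype(chain.dtype, np.integer):
        raise TypeError("markov_chain must be a 1-D integer array")
    # Safe cast: any integer dtype to contiguous int64
    chain = np.ascontiguousarray(chain, dtype=np.int64)
    m = transition_counts_matrix
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError("transition_counts_matrix must be square")
    if m.dtype != np.float64 or not m.flags.c_contiguous:
        raise TypeError("transition_counts_matrix must be C-contiguous float64")
    # Out-of-range states would otherwise index outside the matrix
    if chain.size and (chain.min() < 0 or chain.max() >= m.shape[0]):
        raise ValueError("state index out of range")
    return chain, m
```

Usage would be along the lines of `jit_func(*checked_counts_args(t, m3))`; the checks cost a few vectorized passes over the chain, which is small next to a Python-level loop.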

Answer 2

DSM*_*DSM 6

How about something like this, taking advantage of np.bincount? Not super robust, but functional. [Thanks to @Warren Weckesser for the setup.]

import numpy as np
from collections import Counter

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in range(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

def using_counter(chain, counts_matrix):
    # Count each distinct (from, to) pair, then scatter the counts into the matrix
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    counts_matrix[from_, to] = list(counts.values())

def using_bincount(chain, counts_matrix):
    # Overwrites (does not accumulate into) counts_matrix
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    counts_matrix.flat = np.bincount(flat_coords, minlength=counts_matrix.size)

def using_bincount_reshape(chain, counts_matrix):
    # Returns a new integer matrix; counts_matrix only supplies the shape
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    return np.bincount(flat_coords, minlength=counts_matrix.size).reshape(counts_matrix.shape)

This gives:

In [373]: t = np.random.randint(0,50, 500)
In [374]: m1 = np.zeros((50,50))
In [375]: m2 = m1.copy()
In [376]: m3 = m1.copy()

In [377]: timeit increment_counts_in_matrix_from_chain(t, m1)
100 loops, best of 3: 2.79 ms per loop

In [378]: timeit using_counter(t, m2)
1000 loops, best of 3: 924 us per loop

In [379]: timeit using_bincount(t, m3)
10000 loops, best of 3: 57.1 us per loop
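
To see why the bincount version handles repeated transitions without a Python loop, here is the flat-index mapping on a tiny chain with 3 states (the chain values are illustrative only): `np.ravel_multi_index` turns each (from, to) pair into the row-major index `from * 3 + to`, and `np.bincount` counts how often each index occurs.

```python
import numpy as np

chain = np.array([0, 1, 1, 2, 0, 1])
shape = (3, 3)
# Pairs (0,1) (1,1) (1,2) (2,0) (0,1) map to from * 3 + to
flat = np.ravel_multi_index((chain[:-1], chain[1:]), shape)
print(flat)  # [1 4 5 6 1]
# bincount tallies each flat index; minlength pads the result to all 9 cells
counts = np.bincount(flat, minlength=shape[0] * shape[1]).reshape(shape)
```

The repeated pair (0, 1) appears twice in `flat`, so `counts[0, 1]` is 2 and `counts` equals `[[0, 2, 0], [0, 1, 1], [1, 0, 0]]`.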

[Edit]

Avoiding flat (at the cost of no longer working in place) can save some time for small matrices:

In [80]: timeit using_bincount_reshape(t, m3)
10000 loops, best of 3: 22.3 us per loop
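
If you do need to accumulate into an existing matrix (both bincount variants above overwrite or replace it), NumPy's unbuffered `np.add.at` is another option. This is a swapped-in technique that was not timed in this thread; historically it has often been slower than `bincount`, so it is worth timing on your own data. `using_add_at` is a hypothetical name:

```python
import numpy as np

def using_add_at(chain, counts_matrix):
    # np.add.at is unbuffered, so every repeated (from, to) pair is counted;
    # a plain counts_matrix[from_, to] += 1 would count each distinct pair only once
    chain = np.asarray(chain)
    np.add.at(counts_matrix, (chain[:-1], chain[1:]), 1)

m = np.zeros((3, 3))
using_add_at(np.array([0, 1, 1, 2, 0, 1]), m)
using_add_at(np.array([2, 0]), m)  # accumulates on top of the previous counts
```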

Archived: 13 years, 2 months ago
Views: 2440
Last activity: 13 years, 2 months ago