使用取决于先前值的操作来向量化numpy代码

Bre*_*lla 5 python numpy vectorization

下面的代码模拟了一个系统,它可以随时采样3种不同的状态,这些状态之间的恒定转移概率由矩阵给出prob_nor.因此,每个点都trace取决于之前的状态.

n_states, n_frames = 3, 1000
state_val = np.linspace(0, 1, n_states)

prob = np.random.randint(1, 10, size=(n_states,)*2)
prob[np.diag_indices(n_states)] += 50

prob_nor = prob/prob.sum(1)[:,None] # transition probability matrix, 
                                    # row sum normalized to 1.0

state_idx = range(n_states) # states is a list of integers 0, 1, 2...
current_state = np.random.choice(state_idx)

trace = []      
sigma = 0.1     
for _ in range(n_frames):
    trace.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])
Run Code Online (Sandbox Code Playgroud)

上面代码中的循环使得它运行得非常慢,特别是当我必须建模数百万个数据点时.有没有办法矢量化/加速它?

PMe*_*nde 3

尽快卸载概率计算:

\n\n
possible_paths = np.vstack(\n    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)\n    for curr_state in range(n_states)\n)\n
Run Code Online (Sandbox Code Playgroud)\n\n

然后你可以简单地进行查找以遵循你的路径:

\n\n
path_trace = [None]*n_frames\nfor step in range(n_frames):\n    path_trace[step] = possible_paths[current_state, step]\n    current_state = possible_paths[current_state, step]\n
Run Code Online (Sandbox Code Playgroud)\n\n

一旦你有了你的路径,你就可以计算你的轨迹:

\n\n
sigma = 0.1\ntrace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)\n
Run Code Online (Sandbox Code Playgroud)\n\n

比较时间:

\n\n

纯Pythonfor循环

\n\n
%%timeit\ntrace_list = []\ncurrent_state = np.random.choice(state_idx)\nfor _ in range(n_frames):\n    trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))\n    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])\n
Run Code Online (Sandbox Code Playgroud)\n\n

结果:

\n\n
30.1 ms \xc2\xb1 436 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n\n

矢量化查找

\n\n
%%timeit\ncurrent_state = np.random.choice(state_idx)\npath_trace = [None]*n_frames\npossible_paths = np.vstack(\n    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)\n    for curr_state in range(n_states)\n)\nfor step in range(n_frames):\n    path_trace[step] = possible_paths[current_state, step]\n    current_state = possible_paths[current_state, step]\ntrace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)\n
Run Code Online (Sandbox Code Playgroud)\n\n

结果:

\n\n
641 \xc2\xb5s \xc2\xb1 6.03 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n\n

加速大约 50 倍。

\n

  • 这个方法有效。对于每个状态“j”和每个“时间”“k”,“possible_paths[j, k]”保存随机生成的下一个状态。该值是使用“prob_nor”中的适当行预先计算的。它预先计算的内容超出了生成路径严格所需的内容,但它使用 numpy 的矢量化代码来完成,因此它比原始代码的重复调用要快得多。在评论中,布伦拉给出了问题参数的预期范围,这应该是足够的信息来决定这个答案是否是一个可行的解决方案。 (3认同)