Bre*_*lla 5 python numpy vectorization
下面的代码模拟了一个系统,它可以随时采样3种不同的状态,这些状态之间的恒定转移概率由矩阵给出prob_nor.因此,每个点都trace取决于之前的状态.
n_states, n_frames = 3, 1000
state_val = np.linspace(0, 1, n_states)
prob = np.random.randint(1, 10, size=(n_states,)*2)
prob[np.diag_indices(n_states)] += 50
prob_nor = prob/prob.sum(1)[:,None] # transition probability matrix,
# row sum normalized to 1.0
state_idx = range(n_states) # states is a list of integers 0, 1, 2...
current_state = np.random.choice(state_idx)
trace = []
sigma = 0.1
for _ in range(n_frames):
trace.append(np.random.normal(loc=state_val[current_state], scale=sigma))
current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])
Run Code Online (Sandbox Code Playgroud)
上面代码中的循环使得它运行得非常慢,特别是当我必须建模数百万个数据点时.有没有办法矢量化/加速它?
尽快卸载概率计算:
\n\npossible_paths = np.vstack(\n np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)\n for curr_state in range(n_states)\n)\nRun Code Online (Sandbox Code Playgroud)\n\n然后你可以简单地进行查找以遵循你的路径:
\n\npath_trace = [None]*n_frames\nfor step in range(n_frames):\n path_trace[step] = possible_paths[current_state, step]\n current_state = possible_paths[current_state, step]\nRun Code Online (Sandbox Code Playgroud)\n\n一旦你有了你的路径,你就可以计算你的轨迹:
\n\nsigma = 0.1\ntrace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)\nRun Code Online (Sandbox Code Playgroud)\n\n比较时间:
\n\n纯Pythonfor循环
%%timeit\ntrace_list = []\ncurrent_state = np.random.choice(state_idx)\nfor _ in range(n_frames):\n trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))\n current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])\nRun Code Online (Sandbox Code Playgroud)\n\n结果:
\n\n30.1 ms \xc2\xb1 436 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10 loops each)\nRun Code Online (Sandbox Code Playgroud)\n\n矢量化查找:
\n\n%%timeit\ncurrent_state = np.random.choice(state_idx)\npath_trace = [None]*n_frames\npossible_paths = np.vstack(\n np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)\n for curr_state in range(n_states)\n)\nfor step in range(n_frames):\n path_trace[step] = possible_paths[current_state, step]\n current_state = possible_paths[current_state, step]\ntrace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)\nRun Code Online (Sandbox Code Playgroud)\n\n结果:
\n\n641 \xc2\xb5s \xc2\xb1 6.03 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\nRun Code Online (Sandbox Code Playgroud)\n\n加速大约 50 倍。
\n