How to replace a list of values in a numpy array?

itz*_*bat 6 python arrays performance numpy

I have an unsorted array of numbers.

I need to replace certain numbers (given in a list) with specific alternatives (given in a corresponding list).

I wrote the following code (which seems to work):

import numpy as np

numbers = np.arange(0,40)
np.random.shuffle(numbers)
problem_numbers = [33, 23, 15]  # table, night_stand, plant
alternative_numbers = [12, 14, 26]  # desk, dresser, flower_pot

for i in range(len(problem_numbers)):
    idx = numbers == problem_numbers[i]
    numbers[idx] = alternative_numbers[i]

However, this seems very inefficient (for much larger arrays, this needs to be done millions of times).

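I found this question, which answers a similar problem, but in my case the numbers are not sorted and they need to keep their original positions.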

Note: numbers may contain several occurrences of the elements of problem_numbers, or none at all.
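One detail worth pinning down before optimizing: the loop in the question applies the pairs one after another, so if an alternative number also appears later in problem_numbers, a value can be replaced twice. With the lists from the question ([33, 23, 15] and [12, 14, 26]) there is no overlap, so it makes no difference there. The sketch below contrasts the loop with a variant that computes every mask on the original values. Both function names are hypothetical and only for illustration.

```python
import numpy as np

def replace_sequential(numbers, problem_numbers, alternative_numbers):
    # Same logic as the loop in the question, but working on a copy
    out = np.array(numbers)  # copy, so the caller's array is untouched
    for p, a in zip(problem_numbers, alternative_numbers):
        out[out == p] = a  # masks see the values replaced so far
    return out

def replace_simultaneous(numbers, problem_numbers, alternative_numbers):
    # Every mask is computed on the *original* values, so a value that was
    # just written can never be replaced again by a later pair
    numbers = np.asarray(numbers)
    out = numbers.copy()
    for p, a in zip(problem_numbers, alternative_numbers):
        out[numbers == p] = a
    return out

numbers = np.array([1, 2, 3, 1])
print(replace_sequential(numbers, [1, 2], [2, 3]).tolist())    # [3, 3, 3, 3]
print(replace_simultaneous(numbers, [1, 2], [2, 3]).tolist())  # [2, 3, 3, 2]
```

A lookup-table approach such as the one in the answer maps each original value exactly once, so it behaves like the simultaneous variant.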

jde*_*esa 4

Edit: I implemented a TensorFlow version of this in this answer (almost exactly the same, except that the replacements are given as a dictionary).


Here is a simple way to do it:

import numpy as np

numbers = np.arange(0,40)
np.random.shuffle(numbers)
problem_numbers = [33, 23, 15]  # table, night_stand, plant
alternative_numbers = [12, 14, 26]  # desk, dresser, flower_pot

# Replace values
problem_numbers = np.asarray(problem_numbers)
alternative_numbers = np.asarray(alternative_numbers)
n_min, n_max = numbers.min(), numbers.max()
replacer = np.arange(n_min, n_max + 1)
# Mask replacements out of range
mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
numbers = replacer[numbers - n_min]
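To see what the lookup table holds, here is the same procedure on a tiny made-up input (hypothetical values chosen only so the result can be checked by hand). replacer starts as the identity map over min..max. The in-range problem numbers are then overwritten, and indexing with numbers - n_min reads off every element's replacement in one vectorized step. The value 100 lies outside [3, 7] and is simply masked out.

```python
import numpy as np

# Tiny hand-checkable input (hypothetical values)
numbers = np.array([5, 3, 7, 3])
problem_numbers = np.array([3, 7, 100])        # 100 is outside [min, max]
alternative_numbers = np.array([30, 70, 1000])

n_min, n_max = numbers.min(), numbers.max()   # 3 and 7
replacer = np.arange(n_min, n_max + 1)        # identity table for 3..7
mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)  # drops 100
replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
print(replacer.tolist())                   # [30, 4, 5, 6, 70]
print(replacer[numbers - n_min].tolist())  # [5, 30, 70, 30]
```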

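This approach works well and should be efficient as long as numbers holds integers and the range of its values (the difference between the minimum and the maximum) is not huge (e.g. nothing like 1, 7 and 10000000000), because the lookup table replacer has one entry for every integer in that range.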
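For the huge-range caveat above, one possible workaround is to build the lookup table over the distinct values of numbers instead of over the whole range min..max, using np.unique(..., return_inverse=True). This is a swapped-in technique and not part of this answer. The helper name replace_via_unique is hypothetical. The sketch assumes sortable values. Because np.unique sorts, the cost is O(n log n), so it is likely slower than the plain table when the range is small, and it is not part of the timings below.

```python
import numpy as np

def replace_via_unique(numbers, problem_numbers, alternative_numbers):
    """Hypothetical helper: lookup table over the distinct values only.

    The table has one entry per distinct value of `numbers` rather than
    max - min + 1 entries, at the cost of the sort inside np.unique.
    """
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    if numbers.size == 0:
        return numbers.copy()
    # uniq is sorted, and uniq[inverse] rebuilds the input values
    uniq, inverse = np.unique(numbers, return_inverse=True)
    # Position of each problem number among the sorted distinct values
    pos = np.searchsorted(uniq, problem_numbers)
    pos = np.minimum(pos, len(uniq) - 1)  # keep indices in bounds
    # Only problem numbers that actually occur in `numbers` are used
    found = uniq[pos] == problem_numbers
    # Identity table over the distinct values, then patch in replacements
    table = uniq.copy()
    table[pos[found]] = alternative_numbers[found]
    # `inverse` is 1-D or input-shaped depending on the NumPy version
    return table[inverse].reshape(numbers.shape)

numbers = np.array([1, 7, 10_000_000_000, 7])
print(replace_via_unique(numbers, [7, 10_000_000_000, 42], [70, 0, 4200]).tolist())
# [1, 70, 0, 70]
```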


Benchmarking


I compared the code in the OP with the three solutions proposed so far, using this code:

import numpy as np

def method_itzik(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)  # no copy: the caller's array is modified in place
    for i in range(len(problem_numbers)):
        idx = numbers == problem_numbers[i]
        numbers[idx] = alternative_numbers[i]
    return numbers

def method_mseifert(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    replacer = dict(zip(problem_numbers, alternative_numbers))
    numbers_list = numbers.tolist()
    # replacer.get(x, x): the replacement if x is a problem number, else x itself
    numbers = np.array(list(map(replacer.get, numbers_list, numbers_list)))
    return numbers

def method_divakar(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)  # no copy: the caller's array is modified in place
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    # Pre-process problem_numbers and correspondingly alternative_numbers
    # such that repeats and no matches are taken care of
    sidx_pn = problem_numbers.argsort()
    pn = problem_numbers[sidx_pn]
    mask = np.concatenate(([True], pn[1:] != pn[:-1]))
    an = alternative_numbers[sidx_pn]

    minN, maxN = numbers.min(), numbers.max()
    mask &= (pn >= minN) & (pn <= maxN)

    pn = pn[mask]
    an = an[mask]

    # Pre-processing done. Now, we need to use pn and an in place of
    # problem_numbers and alternative_numbers respectively. Map, index and assign.
    # Note: searchsorted yields one index per value in pn, so only one
    # occurrence of each problem number is replaced (assumes unique numbers).
    sidx = numbers.argsort()
    idx = sidx[np.searchsorted(numbers, pn, sorter=sidx)]
    valid_mask = numbers[idx] == pn
    numbers[idx[valid_mask]] = an[valid_mask]
    return numbers

def method_jdehesa(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    n_min, n_max = numbers.min(), numbers.max()
    replacer = np.arange(n_min, n_max + 1)
    # Mask replacements out of range
    mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
    replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
    numbers = replacer[numbers - n_min]
    return numbers
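A note on the benchmark code above: np.asarray returns its argument unchanged when it is already an ndarray, so method_itzik (and likewise method_divakar) writes the replacements into the caller's array. Under %timeit the same numbers array is passed on every run, so later runs, and the methods timed afterwards, may see data that has already been replaced. A minimal demonstration on a made-up array follows. Calling np.array(numbers) or numbers.copy() instead of np.asarray would give each call its own copy.

```python
import numpy as np

def method_itzik(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)  # returns the same object for an ndarray
    for i in range(len(problem_numbers)):
        idx = numbers == problem_numbers[i]
        numbers[idx] = alternative_numbers[i]
    return numbers

original = np.array([33, 1, 23, 15])  # hypothetical input
result = method_itzik(original, [33, 23, 15], [12, 14, 26])
print(result is original)    # True: the very same array was returned
print(original.tolist())     # [12, 1, 14, 26]: the caller's data changed
```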

Results:

import numpy as np

np.random.seed(100)

MAX_NUM = 100000
numbers = np.random.randint(0, MAX_NUM, size=100000)
problem_numbers = np.unique(np.random.randint(0, MAX_NUM, size=500))
alternative_numbers = np.random.randint(0, MAX_NUM, size=len(problem_numbers))

%timeit method_itzik(numbers, problem_numbers, alternative_numbers)
10 loops, best of 3: 63.3 ms per loop

# This method expects lists
problem_numbers_l = list(problem_numbers)
alternative_numbers_l = list(alternative_numbers)
%timeit method_mseifert(numbers, problem_numbers_l, alternative_numbers_l)
10 loops, best of 3: 20.5 ms per loop

%timeit method_divakar(numbers, problem_numbers, alternative_numbers)
100 loops, best of 3: 9.45 ms per loop

%timeit method_jdehesa(numbers, problem_numbers, alternative_numbers)
1000 loops, best of 3: 822 µs per loop
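%timeit is an IPython/Jupyter magic, so the results above cannot be reproduced by running a plain Python script. Below is a rough standalone sketch using the standard-library timeit module, limited to two of the four methods. It differs from the original setup in three ways, all assumptions. The loop version copies its input, so every repetition sees the same data. The data comes from np.random.default_rng rather than np.random.seed, so it is not the same data as above. The alternative numbers are drawn from [MAX_NUM, 2 * MAX_NUM), so they can never coincide with a problem number, which makes the loop and the lookup table agree exactly. The printed timings will depend on the machine.

```python
import timeit
import numpy as np

def method_itzik(numbers, problem_numbers, alternative_numbers):
    numbers = np.array(numbers)  # explicit copy: the input is never modified
    for p, a in zip(problem_numbers, alternative_numbers):
        numbers[numbers == p] = a
    return numbers

def method_jdehesa(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    n_min, n_max = numbers.min(), numbers.max()
    replacer = np.arange(n_min, n_max + 1)
    # Mask replacements out of range
    mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
    replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
    return replacer[numbers - n_min]

rng = np.random.default_rng(100)
MAX_NUM = 100000
numbers = rng.integers(0, MAX_NUM, size=100000)
problem_numbers = np.unique(rng.integers(0, MAX_NUM, size=500))
# Drawn from [MAX_NUM, 2 * MAX_NUM): no alternative equals a problem number
alternative_numbers = rng.integers(MAX_NUM, 2 * MAX_NUM, size=len(problem_numbers))

# Sanity check: both methods must agree before their speed is compared
assert np.array_equal(method_itzik(numbers, problem_numbers, alternative_numbers),
                      method_jdehesa(numbers, problem_numbers, alternative_numbers))

runs = 5
for name, fn in [("itzik", method_itzik), ("jdehesa", method_jdehesa)]:
    # Best of 3 repeats, each calling the function `runs` times
    best = min(timeit.repeat(lambda: fn(numbers, problem_numbers, alternative_numbers),
                             number=runs, repeat=3))
    print(f"{name}: {best / runs * 1e3:.3f} ms per call")
```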