Find and replace multiple values in Python

bla*_*laz 9 python numpy pandas

I want to find and replace multiple values in a 1D array/list with new ones.

For example, given the list

a=[2, 3, 2, 5, 4, 4, 1, 2]

I want to replace

val_old=[1, 2, 3, 4, 5]

with

val_new=[2, 3, 4, 5, 1]

so that the new array is:

a_new=[3, 4, 3, 1, 5, 5, 2, 3]

What is the fastest way to do this (for very large lists, i.e. with 50000 values to find and replace)?
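To pin down the expected behaviour before comparing speed, here is a minimal pure-Python baseline using the data from the question. The name `mapping` is only illustrative. The point it shows is that the replacement has to be simultaneous: each element is looked up once, so a 1 that becomes 2 is not replaced again by 3.

```python
a = [2, 3, 2, 5, 4, 4, 1, 2]
val_old = [1, 2, 3, 4, 5]
val_new = [2, 3, 4, 5, 1]

# Build the old -> new mapping once; each lookup is then O(1) on average.
mapping = dict(zip(val_old, val_new))

# Look every element up exactly once so replacements never cascade.
# Values that are not in val_old are kept unchanged.
a_new = [mapping.get(x, x) for x in a]
print(a_new)  # [3, 4, 3, 1, 5, 5, 2, 3]
```

Calling `list.replace`-style loops once per old value would not work here, because the output of one replacement would be picked up by the next.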

Comments on the answers

Thank you all for the quick replies! I checked the proposed solutions with the following setup:

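import numpy as np
import pandas as pd

N = 10**4
N_val = N // 2  # integer, so it can be used as a bound and as a size
a = np.random.randint(0, N_val, size=N)
val_old = np.arange(N_val, dtype=int)  # np.int was removed in NumPy 1.24
val_new = np.arange(N_val, dtype=int)
np.random.shuffle(val_new)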

a1 = list(a)
val_old1 = list(val_old)
val_new1 = list(val_new)

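def Ashwini_Chaudhary(a, val_old, val_new):
    # Lookup table indexed by old value. It is sized to cover val_old too,
    # since a.max() alone can be smaller than val_old.max() for random a.
    arr = np.empty(max(a.max(), val_old.max()) + 1, dtype=val_new.dtype)
    arr[val_old] = val_new
    return arr[a]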

def EdChum(a, val_old, val_new):
    df = pd.Series(a, dtype=val_new.dtype)
    d = dict(zip(val_old, val_new))
    return df.map(d).values   

def xxyzzy(a, val_old, val_new):
    return [val_new[val_old.index(x)] for x in a]

def Shashank_and_Hackaholic(a, val_old, val_new):
    d = dict(zip(val_old, val_new))
    return [d.get(e, e) for e in a]

def itzmeontv(a, val_old, val_new):
    return [val_new[val_old.index(i)] if i in val_old else i for i in a]

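def swenzel(a, val_old, val_new):
    # Binary search; requires val_old to be sorted and to contain every value in a.
    return val_new[np.searchsorted(val_old,a)]

def Divakar(a, val_old, val_new):
    # Broadcasts to a len(a) x len(val_old) boolean matrix, which is memory hungry.
    C,R = np.where(a[:,np.newaxis] == val_old[np.newaxis,:])
    a[C] = val_new[R]  # note: modifies a in place
    return a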

Results:

%timeit -n100 Ashwini_Chaudhary(a, val_old, val_new)
100 loops, best of 3: 77.6 µs per loop

%timeit -n100 swenzel(a, val_old, val_new)
100 loops, best of 3: 703 µs per loop

%timeit -n100 Shashank_and_Hackaholic(a1, val_old1, val_new1)
100 loops, best of 3: 1.7 ms per loop

%timeit -n100 EdChum(a, val_old, val_new)
100 loops, best of 3: 17.6 ms per loop

%timeit -n10 Divakar(a, val_old, val_new)
10 loops, best of 3: 209 ms per loop

%timeit -n10 xxyzzy(a1, val_old1, val_new1)
10 loops, best of 3: 429 ms per loop

%timeit -n10 itzmeontv(a1, val_old1, val_new1)
10 loops, best of 3: 847 ms per loop

The relative difference in performance grows with larger N: for N=10**7, Ashwini_Chaudhary takes 207 ms while swenzel takes 6.89 s.
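For anyone who wants to repeat the comparison outside IPython, the following is a rough sketch using the standard `timeit` module on three of the approaches. The function names `lookup_table`, `search_sorted` and `dict_lookup` are placeholders of mine, the data is regenerated with a fixed seed, and the absolute numbers will differ from the ones above depending on machine and library versions.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
N = 10**4
N_val = N // 2
a = rng.integers(0, N_val, size=N)
val_old = np.arange(N_val)
val_new = rng.permutation(N_val)

def lookup_table(a, val_old, val_new):
    # Table indexed by old value; sized to cover both a and val_old.
    arr = np.empty(max(a.max(), val_old.max()) + 1, dtype=val_new.dtype)
    arr[val_old] = val_new
    return arr[a]

def search_sorted(a, val_old, val_new):
    # Assumes val_old is sorted and contains every value of a.
    return val_new[np.searchsorted(val_old, a)]

def dict_lookup(a, val_old, val_new):
    d = dict(zip(val_old, val_new))
    return [d.get(e, e) for e in a]

# The dict version is timed on plain lists, as in the benchmark above.
a1, val_old1, val_new1 = a.tolist(), val_old.tolist(), val_new.tolist()

# Sanity check: all three approaches must agree before timing them.
expected = lookup_table(a, val_old, val_new)
assert np.array_equal(expected, search_sorted(a, val_old, val_new))
assert expected.tolist() == dict_lookup(a1, val_old1, val_new1)

cases = [
    ("lookup_table", lookup_table, (a, val_old, val_new)),
    ("search_sorted", search_sorted, (a, val_old, val_new)),
    ("dict_lookup", dict_lookup, (a1, val_old1, val_new1)),
]
for name, fn, args in cases:
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(*args), number=100, repeat=3)) / 100
    print(f"{name}: {best * 1e6:.1f} us per loop")
```

Increasing `N` in this sketch is the simplest way to see how the gap between the approaches changes with input size.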

Ash*_*ary 5

>>> arr = np.empty(a.max() + 1, dtype=val_new.dtype)
>>> arr[val_old] = val_new
>>> arr[a]
array([3, 4, 3, 1, 5, 5, 2, 3])
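The snippet above builds a lookup table indexed by the old values and then indexes it with `a`. A self-contained sketch of the same idea follows, with two changes that are my own assumptions rather than part of the answer: the table is sized to cover `val_old` as well as `a`, and it starts out as the identity mapping so that values of `a` missing from `val_old` are left unchanged. The function name `replace_with_lookup` is hypothetical, and the approach assumes a non-empty input of non-negative integers.

```python
import numpy as np

def replace_with_lookup(a, val_old, val_new):
    """Replace each val_old[i] found in a by val_new[i] via a lookup table.

    Assumes non-empty arrays of non-negative integers. Values of a that
    do not occur in val_old are returned unchanged.
    """
    a = np.asarray(a)
    val_old = np.asarray(val_old)
    val_new = np.asarray(val_new)

    # The table needs one slot for every possible index used below.
    size = max(a.max(), val_old.max()) + 1

    # Start from the identity mapping (value -> itself) ...
    lookup = np.arange(size, dtype=np.result_type(a.dtype, val_new.dtype))
    # ... then overwrite the slots of the old values with the new values.
    lookup[val_old] = val_new

    # Fancy indexing applies the mapping to every element in one step.
    return lookup[a]

a = np.array([2, 3, 2, 5, 4, 4, 1, 2, 0])  # the trailing 0 is not in val_old
val_old = np.array([1, 2, 3, 4, 5])
val_new = np.array([2, 3, 4, 5, 1])
print(replace_with_lookup(a, val_old, val_new))  # [3 4 3 1 5 5 2 3 0]
```

The table has as many entries as the largest value plus one, so this trades memory for speed; for sparse or very large values a dict or `np.searchsorted` based approach is likely the better fit.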