How do I swap 0 and 1 values in a pandas DataFrame?

jha*_*ins 5 python numpy dataframe pandas

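I'm working with a pandas DataFrame that has a column of all 0s and 1s, and I'm trying to flip every value (i.e. all 0s become 1s and all 1s become 0s). Is there a shortcut for this?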

jez*_*ael 7

Use replace:

df = df.replace({0:1, 1:0})

Or the faster numpy.logical_xor:

df = np.logical_xor(df,1).astype(int)

Or faster still, by working on the underlying NumPy array:

df = pd.DataFrame(np.logical_xor(df.values,1).astype(int),columns=df.columns, index=df.index)
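
As a quick check that the three one-liners above agree, here is a minimal sketch on a small frame. The column names and values are made up for illustration, and it assumes the frame holds only integer 0s and 1s.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 frame, for illustration only.
df = pd.DataFrame({"a": [0, 1, 1], "b": [1, 0, 1]})

# 1) Dictionary-based replace.
via_replace = df.replace({0: 1, 1: 0})

# 2) logical_xor applied to the DataFrame itself.
via_xor = np.logical_xor(df, 1).astype(int)

# 3) logical_xor on the raw array, re-wrapped with the original labels.
via_values = pd.DataFrame(
    np.logical_xor(df.values, 1).astype(int),
    columns=df.columns,
    index=df.index,
)

print(via_replace.values.tolist())  # [[1, 0], [0, 1], [0, 0]]
print(np.array_equal(via_xor.values, via_replace.values))     # True
print(np.array_equal(via_values.values, via_replace.values))  # True
```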

Sample:

import numpy as np
import pandas as pd

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10,3]))
print (df)
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1
5  1  0  1
6  0  0  0
7  1  0  0
8  1  0  1
9  1  0  0

df = df.replace({0:1, 1:0})
print (df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1

Another solution:

df = (~df.astype(bool)).astype(int)
print (df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1
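
The astype(bool) step matters here: on integers, ~ is a bitwise NOT, so ~0 is -1 and ~1 is -2 rather than 1 and 0. A small sketch, using a made-up single-column frame, shows the difference.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 1, 0]})  # hypothetical 0/1 data

bitwise_not = ~df                             # bitwise NOT on integers
logical_not = (~df.astype(bool)).astype(int)  # logical NOT, then back to int

print(bitwise_not["a"].tolist())  # [-1, -2, -2, -1]
print(logical_not["a"].tolist())  # [1, 0, 0, 1]
```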

Timings:

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
print (df)

In [69]: %timeit (np.logical_xor(df,1).astype(int))
1 loop, best of 3: 1.42 s per loop

In [70]: %timeit (df ^ 1)
1 loop, best of 3: 2.53 s per loop

In [71]: %timeit ((~df.astype(bool)).astype(int))
1 loop, best of 3: 1.81 s per loop

In [72]: %timeit (df.replace({0:1, 1:0}))
1 loop, best of 3: 5.08 s per loop

In [73]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
1 loop, best of 3: 350 ms per loop
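
The absolute numbers above depend on the machine and on the pandas and NumPy versions, so it can be worth re-running the comparison locally. The following is a rough sketch using the standard-library timeit module on a smaller frame. The 500x500 size, the repeat counts and the `candidates` dictionary are arbitrary choices for illustration, not part of the original answer.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0, 1], size=[500, 500]))

# The same expressions that are timed above, wrapped as callables.
candidates = {
    "logical_xor(df)": lambda: np.logical_xor(df, 1).astype(int),
    "df ^ 1": lambda: df ^ 1,
    "~astype(bool)": lambda: (~df.astype(bool)).astype(int),
    "replace": lambda: df.replace({0: 1, 1: 0}),
    "logical_xor(values)": lambda: pd.DataFrame(
        np.logical_xor(df.values, 1).astype(int),
        columns=df.columns,
        index=df.index,
    ),
}

expected = 1 - df.values
timings = {}
for name, func in candidates.items():
    # Every candidate must produce the same flipped values.
    assert np.array_equal(func().values, expected), name
    # Best of 3 repeats, 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    timings[name] = best * 1e3
    print(f"{name:22s} {timings[name]:8.3f} ms")
```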

EDIT: This should be faster:

import numexpr as ne
arr = df.values
df = pd.DataFrame(ne.evaluate('1 - arr'),columns=df.columns, index=df.index)
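
numexpr is a separate package that may not be installed. As a hedged sketch, the helper below (the name `flip_with_numexpr` is hypothetical) tries numexpr and falls back to plain NumPy arithmetic when the import fails. It assumes the frame holds only 0s and 1s of a single numeric dtype.

```python
import numpy as np
import pandas as pd

try:
    import numexpr as ne
except ImportError:  # numexpr is optional
    ne = None


def flip_with_numexpr(df):
    """Return a new frame with 0 and 1 swapped (hypothetical helper)."""
    arr = df.values
    if ne is not None:
        # Pass the array explicitly instead of relying on caller-frame lookup.
        flipped = ne.evaluate("1 - arr", local_dict={"arr": arr})
    else:
        # Fallback when numexpr is unavailable.
        flipped = 1 - arr
    return pd.DataFrame(flipped, columns=df.columns, index=df.index)


df = pd.DataFrame({"a": [0, 1], "b": [1, 1]})
print(flip_with_numexpr(df).values.tolist())  # [[1, 0], [0, 0]]
```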


Div*_*kar 7

A simple way would be -

df[:] = 1-df.values

For performance, we might want to work on the underlying array data and modify it in place, like so -

a = df.values
a[:] = 1-a
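
Writing through df.values only changes the frame when that array is a writable view of the frame's data. That typically holds for a frame with a single dtype, but not for mixed dtypes, where .values builds a new array, and, as far as I understand it, not when pandas' Copy-on-Write mode is active, where the returned array is read-only. A defensive sketch (the helper name `swap_0_1_safe` is hypothetical) falls back to df[:] = ... in those cases.

```python
import numpy as np
import pandas as pd


def swap_0_1_safe(df):
    """Swap 0 and 1 in place, assuming df holds only 0s and 1s."""
    a = df.values
    if a.flags.writeable and np.shares_memory(a, df.values):
        a[:] = 1 - a   # fast path: write through the view
    else:
        df[:] = 1 - a  # fallback: assign through pandas


df = pd.DataFrame({"a": [0, 1, 1], "b": [1, 0, 1]})
swap_0_1_safe(df)
print(df.values.tolist())  # [[1, 0], [0, 1], [0, 0]]
```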

Sample run -

In [43]: df
Out[43]: 
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0

In [44]: df[:] = 1-df.values

In [45]: df
Out[45]: 
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1

Using @jezrael's timings setup to compare the best solution from that setup against the one proposed in this post -

In [46]: np.random.seed(12)
    ...: df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
    ...: 

# Proposed in this post
In [47]: def swap_0_1(df):
    ...:     a = df.values
    ...:     a[:] = 1-a
    ...:     

In [48]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
10 loops, best of 3: 218 ms per loop

In [49]: %timeit swap_0_1(df)
10 loops, best of 3: 198 ms per loop

Or even better, use the negation of the boolean version of the input array data -

In [60]: def swap_0_1_bool(df):
    ...:     a = df.values
    ...:     a[:] = ~a.astype(bool)
    ...:     

In [63]: %timeit swap_0_1_bool(df)
10 loops, best of 3: 179 ms per loop
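
All of the approaches in this thread assume the data contains only 0 and 1. They diverge once other values appear, which may matter when choosing one. A small sketch on made-up data containing a stray 2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 2]})  # hypothetical data with a stray 2
a = df.values.ravel()

print(df.replace({0: 1, 1: 0})["x"].tolist())     # [1, 0, 2]  other values untouched
print(np.logical_xor(a, 1).astype(int).tolist())  # [1, 0, 0]  any nonzero becomes 0
print((~a.astype(bool)).astype(int).tolist())     # [1, 0, 0]  any nonzero becomes 0
print((1 - a).tolist())                           # [1, 0, -1]
print((a ^ 1).tolist())                           # [1, 0, 3]  only the lowest bit flips
```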