Use replace:
df = df.replace({0:1, 1:0})
Or use np.logical_xor:
df = np.logical_xor(df,1).astype(int)
Or, faster:
df = pd.DataFrame(np.logical_xor(df.values,1).astype(int),columns=df.columns, index=df.index)
Sample:
import numpy as np
import pandas as pd
np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10,3]))
print(df)
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1
5  1  0  1
6  0  0  0
7  1  0  0
8  1  0  1
9  1  0  0
df = df.replace({0:1, 1:0})
print(df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1
Another solution:
df = (~df.astype(bool)).astype(int)
print(df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1
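As a quick sanity check, the variants shown so far can be compared on the same small frame. This is only a minimal sketch: the `variants` dict and its labels are illustrative names that do not appear in the original answer, and it assumes the frame holds nothing but 0 and 1.

```python
import numpy as np
import pandas as pd

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0, 1], size=[10, 3]))

# Each entry flips 0 <-> 1 and returns a new DataFrame.
variants = {
    "replace": df.replace({0: 1, 1: 0}),
    "logical_xor": np.logical_xor(df, 1).astype(int),
    "logical_xor_values": pd.DataFrame(
        np.logical_xor(df.values, 1).astype(int),
        columns=df.columns, index=df.index),
    "bool_negation": (~df.astype(bool)).astype(int),
    "xor_operator": df ^ 1,
}

# Reference result computed directly on the NumPy data.
expected = 1 - df.to_numpy()
all_match = all(np.array_equal(v.to_numpy(), expected)
                for v in variants.values())
print(all_match)
```

If any variant disagreed with `1 - df`, `all_match` would be False, so this is a cheap guard to run before comparing speed.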
Timings:
np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
print (df)
In [69]: %timeit (np.logical_xor(df,1).astype(int))
1 loop, best of 3: 1.42 s per loop
In [70]: %timeit (df ^ 1)
1 loop, best of 3: 2.53 s per loop
In [71]: %timeit ((~df.astype(bool)).astype(int))
1 loop, best of 3: 1.81 s per loop
In [72]: %timeit (df.replace({0:1, 1:0}))
1 loop, best of 3: 5.08 s per loop
In [73]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
1 loop, best of 3: 350 ms per loop
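Outside IPython, roughly the same comparison can be reproduced with the standard `timeit` module. The sketch below is an assumption-laden stand-in for the `%timeit` session above: `time_variants` is a hypothetical helper, and the frame is shrunk to 1000 x 1000 so it finishes quickly, so absolute numbers will differ from those quoted.

```python
import timeit

import numpy as np
import pandas as pd


def time_variants(df, repeat=3):
    """Best wall time in seconds over `repeat` runs for each variant."""
    variants = {
        "logical_xor(df)": lambda: np.logical_xor(df, 1).astype(int),
        "df ^ 1": lambda: df ^ 1,
        "~astype(bool)": lambda: (~df.astype(bool)).astype(int),
        "replace": lambda: df.replace({0: 1, 1: 0}),
        "logical_xor(values)": lambda: pd.DataFrame(
            np.logical_xor(df.values, 1).astype(int),
            columns=df.columns, index=df.index),
    }
    # One call per measurement; keep the fastest of `repeat` measurements.
    return {name: min(timeit.repeat(fn, number=1, repeat=repeat))
            for name, fn in variants.items()}


np.random.seed(12)
# Smaller than the 10000 x 10000 frame above so the sketch runs quickly.
small = pd.DataFrame(np.random.choice([0, 1], size=[1000, 1000]))
for name, seconds in time_variants(small).items():
    print(f"{name}: {seconds * 1e3:.1f} ms")
```

Relative rankings can shift with pandas and NumPy versions, so it is worth re-measuring on the target environment.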
EDIT: This should be faster:
import numexpr as ne
arr = df.values
df = pd.DataFrame(ne.evaluate('1 - arr'),columns=df.columns, index=df.index)
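Since numexpr is an optional third-party package, one way to use the snippet above without making it a hard dependency is to fall back to plain NumPy. This is a sketch under that assumption; `flip_numexpr` is a hypothetical helper name, and the variable is passed through `local_dict` explicitly instead of relying on numexpr finding `arr` in the calling frame.

```python
import numpy as np
import pandas as pd

try:
    import numexpr as ne  # optional third-party dependency
except ImportError:
    ne = None


def flip_numexpr(df):
    """Return a new DataFrame with 0 and 1 swapped."""
    arr = df.values
    if ne is not None:
        flipped = ne.evaluate("1 - arr", local_dict={"arr": arr})
    else:
        # Plain NumPy fallback when numexpr is not installed.
        flipped = 1 - arr
    return pd.DataFrame(flipped, columns=df.columns, index=df.index)


df = pd.DataFrame([[0, 1, 1], [1, 0, 0]], columns=list("abc"))
print(flip_numexpr(df))
```

Whether numexpr actually wins depends on array size and the number of cores available, so it is best treated as something to measure, not assume.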
A simple way would be -
df[:] = 1-df.values
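For performance, we might want to work with the underlying array data, for an in-place version like the one below. Note that this relies on df.values returning a writable view of the DataFrame's data (all columns sharing one dtype, Copy-on-Write not enabled); otherwise the write does not reach df or raises an error -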
a = df.values
a[:] = 1-a
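A more defensive variant of the in-place idea is sketched below. It is not from the original post: `flip_in_place` is a hypothetical helper that writes through the array only when it is a writable view of the frame's data, and otherwise falls back to positional assignment with `iloc`.

```python
import numpy as np
import pandas as pd


def flip_in_place(df):
    """Swap 0 and 1 in df, writing through the underlying array if possible."""
    a = df.values
    # Writing through `a` only changes df if `a` is a writable view of its data.
    if a.flags.writeable and np.shares_memory(a, df.values):
        a[:] = 1 - a
    else:
        # Fallback: assign the flipped values back by position.
        df.iloc[:, :] = 1 - a
    return df


df = pd.DataFrame([[0, 0, 1], [1, 1, 0]])
flip_in_place(df)
print(df)
```

The fast path keeps the speed advantage where it is available, while the fallback trades some speed for predictable behaviour.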
Sample run -
In [43]: df
Out[43]:
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0

In [44]: df[:] = 1-df.values

In [45]: df
Out[45]:
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1
Using @jezrael's timings setup, comparing the best solution from that setup against the one proposed in this post -
In [46]: np.random.seed(12)
    ...: df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
    ...:

# Proposed in this post
In [47]: def swap_0_1(df):
    ...:     a = df.values
    ...:     a[:] = 1-a
    ...:
In [48]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
10 loops, best of 3: 218 ms per loop
In [49]: %timeit swap_0_1(df)
10 loops, best of 3: 198 ms per loop
Or even better, use the negation of the boolean version of the input array data -
In [60]: def swap_0_1_bool(df):
    ...:     a = df.values
    ...:     a[:] = ~a.astype(bool)
    ...:
In [63]: %timeit swap_0_1_bool(df)
10 loops, best of 3: 179 ms per loop
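The cast to bool in the function above matters because `~` behaves differently on integers and booleans. The following small sketch, using an illustrative 2 x 2 array that is not part of the original post, shows the difference:

```python
import numpy as np

a = np.array([[0, 1], [1, 0]])

# On integers, ~ is the bitwise complement: ~0 == -1 and ~1 == -2.
int_negated = ~a

# On booleans, ~ is logical NOT; assigning the result back into an
# integer array casts False/True to 0/1.
flipped = a.copy()
flipped[:] = ~a.astype(bool)

print(int_negated.tolist())  # [[-1, -2], [-2, -1]]
print(flipped.tolist())      # [[1, 0], [0, 1]]
```

So applying `~` directly to the integer data would not swap 0 and 1; the boolean round trip is what makes it work.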