Drop rows where two columns have the same value with pandas

Jac*_*ack 8 pandas

Input:

    S   T   W      U
0   A   A   1   Undirected
1   A   B   0   Undirected
2   A   C   1   Undirected
3   B   A   0   Undirected
4   B   B   1   Undirected
5   B   C   1   Undirected
6   C   A   1   Undirected
7   C   B   1   Undirected
8   C   C   1   Undirected

Output:

    S   T   W      U
1   A   B   0   Undirected
2   A   C   1   Undirected
3   B   A   0   Undirected
5   B   C   1   Undirected
6   C   A   1   Undirected
7   C   B   1   Undirected

In rows 0, 4 and 8, columns S and T have the same value. I want to drop these rows.

What I tried:

I used df.drop_duplicates(['S','T']), but it didn't work. How can I get this result?
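A minimal sketch of why drop_duplicates does not help here, rebuilding the sample frame from the question (the construction below is an assumption that reproduces the printed data). drop_duplicates(['S','T']) removes a row when its (S, T) pair repeats a pair seen in an earlier row; it never compares S with T inside the same row. Every (S, T) pair in the sample is unique, so nothing gets dropped.

```python
import pandas as pd

# Rebuild the sample frame shown in the question
df = pd.DataFrame({
    'S': list('AAABBBCCC'),
    'T': list('ABCABCABC'),
    'W': [1, 0, 1, 0, 1, 1, 1, 1, 1],
    'U': ['Undirected'] * 9,
})

# drop_duplicates compares rows with each other, not columns within a row
deduped = df.drop_duplicates(['S', 'T'])
print(len(deduped))  # 9: each (S, T) pair occurs only once, so no row is removed
```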

jez*_*ael 16

You need boolean indexing:

print (df['S'] != df['T'])
0    False
1     True
2     True
3     True
4    False
5     True
6     True
7     True
8    False
dtype: bool

df = df[df['S'] != df['T']]
print (df)
   S  T  W           U
1  A  B  0  Undirected
2  A  C  1  Undirected
3  B  A  0  Undirected
5  B  C  1  Undirected
6  C  A  1  Undirected
7  C  B  1  Undirected
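The same boolean-indexing approach as a self-contained script, assuming the frame is built as below to match the question's data. The element-wise comparison yields a boolean Series aligned on the index, and indexing with it keeps only the True rows while preserving the original index labels (call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted).

```python
import pandas as pd

df = pd.DataFrame({
    'S': list('AAABBBCCC'),
    'T': list('ABCABCABC'),
    'W': [1, 0, 1, 0, 1, 1, 1, 1, 1],
    'U': ['Undirected'] * 9,
})

# Row-wise comparison of the two columns -> boolean Series
mask = df['S'] != df['T']

# Keep only rows where S and T differ; original index labels are kept
out = df[mask]
print(out.index.tolist())  # [1, 2, 3, 5, 6, 7]
```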

Or use query:

df = df.query("S != T")
print (df)
   S  T  W           U
1  A  B  0  Undirected
2  A  C  1  Undirected
3  B  A  0  Undirected
5  B  C  1  Undirected
6  C  A  1  Undirected
7  C  B  1  Undirected
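query evaluates the expression string with the column names S and T in scope, so it should select exactly the same rows as the boolean mask. A quick self-contained check on the question's sample data (the DataFrame construction is an assumption matching the printed input):

```python
import pandas as pd

df = pd.DataFrame({
    'S': list('AAABBBCCC'),
    'T': list('ABCABCABC'),
    'W': [1, 0, 1, 0, 1, 1, 1, 1, 1],
    'U': ['Undirected'] * 9,
})

# Both forms filter out rows where S equals T
via_query = df.query("S != T")
via_mask = df[df['S'] != df['T']]

# Same index, same values, same dtypes
print(via_query.equals(via_mask))  # True
```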

  • @JoeRivera - Then use `L = ['S','T']; df = df[df[L].ne(df[L[0]], axis=0).any(axis=1)]`. This compares every column in the list with the first one and uses [`DataFrame.any`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html) to test whether at least one value differs. (2 upvotes)
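A sketch of the comment's multi-column version. The frame and the third column V are hypothetical, added only to show more than two columns. ne(..., axis=0) aligns the first column's Series on the row index, so each column is compared with the first one row by row; any(axis=1) keeps a row if at least one column differs, which drops exactly the rows where all listed columns are equal.

```python
import pandas as pd

# Hypothetical data with a third column V to illustrate the general case
df = pd.DataFrame({
    'S': ['A', 'A', 'B', 'C'],
    'T': ['A', 'B', 'B', 'C'],
    'V': ['A', 'A', 'B', 'D'],
})

cols = ['S', 'T', 'V']
# Compare every listed column with the first one, row by row
differs = df[cols].ne(df[cols[0]], axis=0)
# Keep a row if any column differs from the first (drop rows where all are equal)
out = df[differs.any(axis=1)]
print(out.index.tolist())  # [1, 3]
```

With cols = ['S', 'T'] this reduces to the two-column filter from the answer.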