Get the rows of a pandas DataFrame whose columns all have the same value

ken*_*ait 19 python dataframe pandas

In pandas, given a DataFrame D:

+-----+--------+--------+--------+   
|     |    1   |    2   |    3   |
+-----+--------+--------+--------+
|  0  | apple  | banana | banana |
|  1  | orange | orange | orange |
|  2  | banana | apple  | orange |
|  3  | NaN    | NaN    | NaN    |
|  4  | apple  | apple  | apple  |
+-----+--------+--------+--------+
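To reproduce the examples, D can be built as shown below. This is a minimal sketch. It assumes that the missing values are np.nan and that the column labels are the integers 1, 2 and 3 shown in the table.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan for the missing row is an assumption.
D = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    columns=[1, 2, 3],  # integer column labels, as in the table
)
print(D)
```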

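When there are three or more columns, how can I return only the rows that have the same content in all of their columns, so that the result is: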

+-----+--------+--------+--------+   
|     |    1   |    2   |    3   |
+-----+--------+--------+--------+
|  1  | orange | orange | orange |
|  4  | apple  | apple  | apple  |
+-----+--------+--------+--------+

Note that rows whose values are all NaN are skipped.

With only two columns I would usually write D[D[1] == D[2]], but I don't know how to generalize this to DataFrames with more than two columns.
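One direct way to extend the two-column comparison is to compare every column with the first one and AND the resulting boolean Series together. The sketch below uses functools.reduce to do this. It assumes at least two columns and rebuilds the sample D with np.nan for the missing values. Because pandas' == treats NaN as unequal to NaN, the all-NaN row drops out without an extra step.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

D = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    columns=[1, 2, 3],
)

first = D.columns[0]
# n-column version of D[1] == D[2]: compare each remaining column with the
# first one and combine the boolean Series with &. Needs at least two columns.
mask = reduce(operator.and_, (D[first] == D[col] for col in D.columns[1:]))
result = D[mask]
print(result)
```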

DSM*_*DSM 13

My entry:

>>> df
        0       1       2
0   apple  banana  banana
1  orange  orange  orange
2  banana   apple  orange
3     NaN     NaN     NaN
4   apple   apple   apple

[5 rows x 3 columns]
>>> df[df.apply(pd.Series.nunique, axis=1) == 1]
        0       1       2
1  orange  orange  orange
4   apple   apple   apple

[2 rows x 3 columns]

This works because applying pd.Series.nunique to each row gives:

>>> df.apply(pd.Series.nunique, axis=1)
0    2
1    1
2    3
3    0
4    1
dtype: int64
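Newer pandas versions also provide DataFrame.nunique, which accepts an axis argument. With it, the same per-row counts can be obtained without spelling out apply. Here is a minimal sketch using DSM's default column labels 0, 1 and 2. Building the data with np.nan is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ]
)  # default column labels 0, 1, 2

# Count distinct non-NaN values in each row (NaN is ignored by default).
counts = df.nunique(axis=1)
print(counts)

# Keep rows with exactly one distinct value; the all-NaN row has a count of 0.
result = df[counts == 1]
print(result)
```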

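Note, however, that this keeps rows that look like [nan, nan, apple] or [nan, apple, apple], because nunique ignores NaN by default. That is usually what I want, but it may be the wrong answer for your use case.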
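If partly-NaN rows should also be rejected, one option is to additionally require that the row contains no NaN at all. In this sketch, the two extra rows (indices 5 and 6) are hypothetical. They are added only to show how the two filters differ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
        [np.nan, np.nan, "apple"],   # hypothetical partly-NaN row
        [np.nan, "apple", "apple"],  # hypothetical partly-NaN row
    ]
)

one_value = df.nunique(axis=1) == 1  # NaN ignored, so rows 5 and 6 pass
no_nan = df.notna().all(axis=1)      # True only for rows without any NaN

loose = df[one_value]
strict = df[one_value & no_nan]
print(strict)
```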


low*_*ech 12

Similar to Andy Hayden's answer, check whether each row's min equals its max (if it does, all elements in the row are the same):

df[df.apply(lambda x: min(x) == max(x), axis=1)]
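One caveat with the min/max version is that it relies on Python's built-in min and max to compare the values in each row. That works for the sample data. On Python 3, however, a row that mixes strings with a float NaN (for example [nan, apple, apple]) makes those comparisons raise TypeError; Python 2 allowed such mixed comparisons. A hedged NaN-safe variant skips any row containing NaN before comparing. min_equals_max_safe is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Built-in min() over a str / float-NaN mix fails on Python 3.
try:
    min([np.nan, "apple", "apple"])
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
        [np.nan, "apple", "apple"],  # mixed row that would break plain min/max
    ]
)


def min_equals_max_safe(row):
    """Hypothetical helper: True if the row has no NaN and its min equals its max."""
    if row.isna().any():
        return False
    return min(row) == max(row)


result = df[df.apply(min_equals_max_safe, axis=1)]
print(result)
```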


And*_*den 6

I would check whether every element of each row equals that row's first element:

In [11]: df.eq(df[1], axis='index')  # Note: df == df[1] would align df[1] with the columns, not the rows
Out[11]: 
      1      2      3
0  True  False  False
1  True   True   True
2  True  False  False
3  True   True   True
4  True   True   True

[5 rows x 3 columns]

If every value in a row is True, all elements in that row are the same:

In [12]: df.eq(df[1], axis='index').all(1)
Out[12]: 
0    False
1     True
2    False
3     True
4     True
dtype: bool
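You can build the same mask without hard-coding the label of the first column by selecting that column by position with iloc. The sample data in this sketch is rebuilt with np.nan, which is an assumption. Current pandas documents eq as treating NaN values as different (NaN != NaN). As a result, the all-NaN row 3 evaluates to False here, which may differ from the Out[12] shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    columns=[1, 2, 3],
)

first_col = df.iloc[:, 0]  # first column, whatever its label
# Compare every column with the first one row by row, then require all True.
same = df.eq(first_col, axis="index").all(axis=1)
print(same)
```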

Restrict the frame to those rows, optionally followed by dropna:

In [13]: df[df.eq(df[1], axis='index').all(1)]
Out[13]: 
        1       2       3
1  orange  orange  orange
3     NaN     NaN     NaN
4   apple   apple   apple

[3 rows x 3 columns]

In [14]: df[df.eq(df[1], axis='index').all(1)].dropna()
Out[14]: 
        1       2       3
1  orange  orange  orange
4   apple   apple   apple

[2 rows x 3 columns]
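To package this as a reusable filter, one sketch combines the eq-based mask with an explicit non-NaN check. That way the result does not depend on how a particular pandas version compares NaN. rows_all_equal is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd


def rows_all_equal(frame):
    """Hypothetical helper: keep rows whose values are all equal and none is NaN."""
    first_col = frame.iloc[:, 0]
    same = frame.eq(first_col, axis="index").all(axis=1)
    has_no_nan = frame.notna().all(axis=1)
    return frame[same & has_no_nan]


df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    columns=[1, 2, 3],
)
print(rows_all_equal(df))
```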