Python pandas: get the location of a value in a DataFrame

Question

Suppose I have the following DataFrame:

Is there a way to get the index/column labels at which a particular value occurs? For example, something like the following:

values = df.search(1)

which would give values = [(1, 'a'), (2, 'b'), (3, 'b')].

Answer 1

Ale*_*lex 11

df[df == 1].stack().index.tolist()

yields

[(1, 'a'), (2, 'b'), (3, 'b')]

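`df.stack()` creates a MultiIndex. So instead of separate rows and columns, you get rows keyed by a two-level index, and calling `tolist()` on that index gives you 2-tuples. (3 upvotes)
Very simple and elegant answer. Thank you very much. (2 upvotes)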
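
To see what each step of that one-liner produces, here is a minimal sketch. The DataFrame is an assumption, chosen only so that the value 1 sits at (1, 'a'), (2, 'b') and (3, 'b'); it is not necessarily the asker's data. The explicit `dropna()` is also an addition: whether `stack()` discards NaN entries by itself appears to depend on the pandas version, so dropping them explicitly seems the safer choice.

```python
import pandas as pd

# Hypothetical frame (an assumption), chosen to match the expected output in the question.
df = pd.DataFrame({"a": [0, 1, 0, 0], "b": [0, 0, 1, 1]})

masked = df[df == 1]      # cells that are not equal to 1 become NaN
stacked = masked.stack()  # Series indexed by a (row, column) MultiIndex

# Drop NaN explicitly rather than relying on stack() to do it.
locations = stacked.dropna().index.tolist()
print(locations)
```

Each element of `locations` is a (row label, column label) tuple taken from the MultiIndex.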

Answer 2

ely*_*ely 5

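If you don't mind working with a NumPy array, in which the first column is the row position and the second column is the position of the column name within df.columns, then it is very short: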

In [11]: np.argwhere(df)
Out[11]: 
array([[1, 0],
       [2, 1],
       [3, 1]])

If you want to format that as a list of tuples with the actual column names, you can go one step further:

In [12]: [(x, df.columns[y]) for x,y in np.argwhere(df)]
Out[12]: [(1, 'a'), (2, 'b'), (3, 'b')]
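
One caveat worth illustrating: `np.argwhere` reports positions, so the row part of each tuple above equals the index label only when the frame has the default integer index. The sketch below assumes a hypothetical frame with string row labels and maps both positions back to labels. It also passes an explicit boolean array (`df.to_numpy() == 1`) instead of the DataFrame itself, which is a choice made here and not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index (an assumption for illustration).
df = pd.DataFrame(
    {"a": [0, 1, 0, 0], "b": [0, 0, 1, 1]},
    index=["w", "x", "y", "z"],
)

# Each row of `positions` is [row_position, column_position].
positions = np.argwhere(df.to_numpy() == 1)

# Translate positions into row and column labels.
labels = [(df.index[r], df.columns[c]) for r, c in positions]
print(labels)
```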

You can use the same approach with a logical expression inside np.argwhere. For example, suppose you have a DataFrame of random data:

In [13]: dfrm
Out[13]: 
          A         B         C
0  0.382531  0.287066  0.345749
1  0.725201  0.450656  0.336720
2  0.146883  0.266518  0.011339
3  0.111154  0.190367  0.275750
4  0.757144  0.283361  0.736129
5  0.039405  0.643290  0.383777
6  0.632230  0.434664  0.094089
7  0.658512  0.368150  0.433340
8  0.062180  0.523572  0.505400
9  0.287539  0.899436  0.194938

[10 rows x 3 columns]

Then you can do, for example:

In [14]: [(x, dfrm.columns[y]) for x,y in np.argwhere(dfrm > 0.8)]
Out[14]: [(9, 'B')]

As a search function, it could be defined like this:

def search(df, df_condition):
    return [(x, df.columns[y]) for x,y in np.argwhere(df_condition(df))]

For example:

In [17]: search(dfrm, lambda x: x > 0.8)
Out[17]: [(9, 'B')]

In [18]: search(df, lambda x: x == 1)
Out[18]: [(1, 'a'), (2, 'b'), (3, 'b')]
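
For anyone who wants to run this end to end, a self-contained sketch of the same idea follows. Several things in it are assumptions and not part of the answer: the random frame is generated with an arbitrary seed (so its values differ from the ones shown above), the small integer frame is hypothetical, and this variant returns row labels via `frame.index` instead of raw positions.

```python
import numpy as np
import pandas as pd


def search(frame, condition):
    """Return (row label, column label) pairs where condition(frame) is True."""
    # Convert the boolean DataFrame to a plain NumPy array before argwhere.
    mask = np.asarray(condition(frame), dtype=bool)
    return [(frame.index[r], frame.columns[c]) for r, c in np.argwhere(mask)]


# Hypothetical data: the seed and both frames are assumptions for illustration.
rng = np.random.default_rng(0)
dfrm = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df = pd.DataFrame({"a": [0, 1, 0, 0], "b": [0, 0, 1, 1]})

hits = search(dfrm, lambda x: x > 0.8)
ones = search(df, lambda x: x == 1)

# Sanity check: every reported location satisfies the condition.
assert all(dfrm.loc[r, c] > 0.8 for r, c in hits)
print(hits)
print(ones)
```

Taking the condition as a callable keeps the function general: any expression that turns the frame into a boolean frame of the same shape can be searched the same way.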

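@hlin117 Note that in my example the NumPy function operates directly on the Pandas DataFrame object. It is a fallacy to pretend you are using only Pandas or only NumPy. If you use Pandas, you are also using NumPy, and these days the converse is almost always true as well. In fact, for a DataFrame `df`, `df.values` returns a `numpy.ndarray`, which highlights that NumPy is a dependency of Pandas. I would urge you not to think of it as "pandas vs. numpy", because using a NumPy function is often better than using a Pandas one. (3 upvotes)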
