How do I get the index of the column that satisfies a condition in pandas?

Question

I have the following:

x = pd.DataFrame({'a':[1,5,5], 'b':[7,0,7]})

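For each row, I want to get the index of the first column whose value is greater than some threshold (say, greater than 4).

In this example, the answer is 1 (the index of the value 7 in the first row), 0 (the index of the value 5 in the second row), and 0 (the index of the value 5 in the third row). That is, the answer is [1,0,0].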

I tried it with the apply method:

def get_values_from_row(row, th=0.9):
    """Return the label of the first column whose value exceeds a threshold.

    Args:
        row (pd.Series): a single row of the DataFrame.
        th (float): the threshold.

    Returns:
        The label of the first column in the row whose value is greater than th.
    """
    return row[row > th].index.tolist()[0]

It works, but I have a large dataset and it is slow. What would be a better option?

Answer 1

jez*_*ael 5

I think you need first_valid_index with get_loc:

print (x[x > 4])
     a    b
0  NaN  7.0
1  5.0  NaN
2  5.0  7.0

print (x[x > 4].apply(lambda x: x.index.get_loc(x.first_valid_index()), axis=1))
0    1
1    0
2    0
dtype: int64
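
Since the question is about speed on a large dataset, here is a vectorized sketch that avoids apply altogether. It is not part of the original answer: it relies on NumPy's argmax returning the position of the first True in each row of a boolean array. The -1 sentinel for rows where no value passes the threshold is an assumption made for this illustration, as are the names mask, first_pos and result.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'a': [1, 5, 5], 'b': [7, 0, 7]})

# Boolean array: True where the value is greater than the threshold.
mask = (x > 4).to_numpy()

# argmax over axis=1 gives the position of the first True in each row.
first_pos = mask.argmax(axis=1)

# argmax also returns 0 for rows with no True at all, so flag those
# rows with -1 (an assumed sentinel, not something from the original post).
first_pos = np.where(mask.any(axis=1), first_pos, -1)

result = pd.Series(first_pos, index=x.index)
print(result.tolist())  # [1, 0, 0]
```

Unlike the first_valid_index version, this should not raise on a row where nothing exceeds the threshold; whether -1 is the right marker for that case depends on how the result is used downstream.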
