Roh*_*a K 17 python dataframe python-3.x pandas
I have a dataframe like this:
Sequence Duration1 Value1 Duration2 Value2 Duration3 Value3
1001 145 10 125 53 458 33
1002 475 20 175 54 652 45
1003 685 57 687 87 254 88
1004 125 54 175 96 786 96
1005 475 21 467 32 526 32
1006 325 68 301 54 529 41
1007 125 97 325 85 872 78
1008 129 15 429 41 981 82
1009 547 47 577 52 543 83
1010 666 65 722 63 257 87
I want to find the maximum Duration across (Duration1, Duration2, Duration3) and then return the corresponding Value and Sequence.
My desired output:
Sequence,Duration3,Value3
1008, 981, 82
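To try the answers below, here is a minimal sketch that rebuilds the sample data from the question as a DataFrame. The column names and values are copied from the table above; the name overall_max is just for illustration. It also checks that the largest Duration is the one in the desired output:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Sequence":  [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010],
    "Duration1": [145, 475, 685, 125, 475, 325, 125, 129, 547, 666],
    "Value1":    [10, 20, 57, 54, 21, 68, 97, 15, 47, 65],
    "Duration2": [125, 175, 687, 175, 467, 301, 325, 429, 577, 722],
    "Value2":    [53, 54, 87, 96, 32, 54, 85, 41, 52, 63],
    "Duration3": [458, 652, 254, 786, 526, 529, 872, 981, 543, 257],
    "Value3":    [33, 45, 88, 96, 32, 41, 78, 82, 83, 87],
})

# Largest value across the three Duration columns only
overall_max = df.filter(like="Duration").to_numpy().max()
print(overall_max)  # 981
```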
Val*_*_Bo 15
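Try the following fairly short NumPy-based code:
import numpy as np
vv = df.iloc[:, 1::2].to_numpy()  # Duration columns only (positions 1, 3, 5)
iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)  # (row, col) of the overall max
iCol = iCol * 2 + 1  # map back to the column position in df
result = df.iloc[iRow, [0, iCol, iCol + 1]]  # Sequence, DurationN, ValueN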
The result is a Series:
Sequence 1008
Duration3 981
Value3 82
Name: 7, dtype: int64
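How the index arithmetic works: argmax returns a position in the flattened array, and np.unravel_index converts it back into (row, col) within the Duration-only block. Because the Duration columns sit at df positions 1, 3 and 5, Duration column k maps to position 2k + 1. This mapping assumes the columns strictly alternate DurationN, ValueN after Sequence. A small sketch on a made-up 2×3 array (the values are illustrative, not the full question data):

```python
import numpy as np

# Hypothetical Duration-only block: 2 rows x 3 Duration columns
vv = np.array([[145, 125, 458],
               [129, 429, 981]])

flat = vv.argmax()                           # position in the flattened (row-major) array
row, col = np.unravel_index(flat, vv.shape)  # back to (row, col) of the block
df_col = col * 2 + 1                         # Duration column k sits at df position 2k + 1

print(flat, row, col, df_col)  # 5 1 2 5
```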
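If you want to "fix" it so that the index values come first (as column headers), followed by the actual values, you can do the following: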
pd.DataFrame([result.values], columns=result.index)
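An equivalent way to get a one-row DataFrame is to turn the Series into a one-column frame and transpose it with to_frame().T. A sketch that rebuilds result by hand from the values printed above (the name 7 comes from the "Name: 7" line; the names a and b are only for illustration):

```python
import pandas as pd

# Rebuild the Series printed above (name 7 = row label of the max)
result = pd.Series({"Sequence": 1008, "Duration3": 981, "Value3": 82}, name=7)

# Approach from the answer: wrap the values in a list and reuse the index as columns
a = pd.DataFrame([result.values], columns=result.index)

# Alternative: one-column frame, then transpose
b = result.to_frame().T

print(a)
print(b)
```

Both give the same columns and values. The difference is the row label: the first version gets a fresh index of 0, while to_frame().T keeps the original label 7, which can be handy if you want to know which row the max came from.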
Without NumPy magic, using only pandas methods on df:
# find the max value in the Duration columns
max_value = max(df.filter(like='Dur', axis=1).max().tolist())
# keep only the cells equal to max_value (every other cell becomes NaN)
df_max = df[df == max_value]
# get the row label (equal to the row position here, since df has a default RangeIndex)
max_index = df_max.dropna(how='all').index[0]
# get the column name
max_col = df_max.dropna(axis=1, how='all').columns[0]
# get the column position
max_col_index = df.columns.get_loc(max_col)
# final: Sequence, the matching DurationN, and the ValueN right after it
df.iloc[max_index, [0, max_col_index, max_col_index + 1]]
Sequence 1008
Duration3 981
Value3 82
Name: 7, dtype: int64
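One thing to watch with df == max_value: it compares every column, including Sequence and the Value columns, so a Value cell that happens to equal the largest Duration would also match, and index[0] / columns[0] could then point at the wrong cell. A sketch that applies the mask to the Duration columns only, on a small made-up frame where that collision occurs (the names durations, hits and row are illustrative):

```python
import pandas as pd

# Made-up data: Value1 in the first row equals the largest Duration (981)
df = pd.DataFrame({
    "Sequence":  [1001, 1008],
    "Duration1": [145, 129],
    "Value1":    [981, 15],
    "Duration2": [125, 981],
    "Value2":    [53, 82],
})

durations = df.filter(like="Dur", axis=1)
max_value = durations.max().max()

# Mask only the Duration block, so the Value columns cannot produce false matches
hits = durations[durations == max_value]
max_index = hits.dropna(how="all").index[0]
max_col = hits.dropna(axis=1, how="all").columns[0]

# The matching Value column is assumed to sit right after its Duration column
max_col_index = df.columns.get_loc(max_col)
row = df.loc[max_index, ["Sequence", max_col, df.columns[max_col_index + 1]]]
print(row)
```

With the unrestricted df == max_value mask, this frame would give row label 0 and column Value1; restricting the mask yields row 1, Duration2, and the expected 1008 / 981 / 82.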
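max_value = max(df.filter(like='Dur', axis=1).max().tolist())
returns the maximum value in the Duration columns.
max_col_name = df.filter(like='Dur', axis=1).max().idxmax()
returns the name of the column in which that maximum occurs.
Calling max() on a list of column names, by contrast, only compares the strings, so it does not tell you which column holds the largest value:
test = ['Duration5', 'Duration2', 'Duration3']
print(max(test))
# prints: Duration5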
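Calling max() on column-name strings compares them character by character, so with ten or more columns 'Duration9' even beats 'Duration10', and in any case the data is never looked at. idxmax on the column maxima picks the column by value instead. A sketch with made-up column names and values:

```python
import pandas as pd

names = ["Duration2", "Duration9", "Duration10"]
print(max(names))  # Duration9 (lexicographic: '9' > '2' > '1')

# Made-up data where the largest value sits in Duration2
df = pd.DataFrame({"Duration2": [981, 12], "Duration9": [5, 7], "Duration10": [100, 200]})
col_by_value = df.max().idxmax()  # column whose maximum is largest
print(col_by_value)  # Duration2
```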
Using idxmax:
# column name with max duration value
max_col_name = df.filter(like='Dur', axis=1).max().idxmax()
# position of max_col_name among the columns
max_col_idx = df.columns.get_loc(max_col_name)
# row label of the max value in max_col_name (equal to its position with a default RangeIndex)
max_row_idx = df[max_col_name].idxmax()
# output with .iloc: Sequence, DurationN, ValueN
df.iloc[max_row_idx, [0, max_col_idx, max_col_idx + 1]]
Sequence 1008
Duration3 981
Value3 82
Name: 7, dtype: int64
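Note that df[max_col_name].idxmax() returns a row label, while .iloc expects a position; the two coincide above only because df has the default RangeIndex. A sketch that stays correct for any index, assuming (hypothetically) that Sequence has been moved into the index. It uses a three-row subset of the question's data, with Duration2/Value2 dropped for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    "Sequence":  [1007, 1008, 1009],
    "Duration1": [125, 129, 547],
    "Value1":    [97, 15, 47],
    "Duration3": [872, 981, 543],
    "Value3":    [78, 82, 83],
}).set_index("Sequence")

durations = df.filter(like="Dur", axis=1)
max_col_name = durations.max().idxmax()    # column label, here "Duration3"
max_row_label = df[max_col_name].idxmax()  # row label, here a Sequence number

# Stay with labels and .loc; the matching Value column is assumed to be the next one
value_col = df.columns[df.columns.get_loc(max_col_name) + 1]
row = df.loc[max_row_label, [max_col_name, value_col]]
print(max_row_label, row.tolist())  # 1008 [981, 82]
```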
Relevant documentation:
pandas.DataFrame.max
pandas.DataFrame.filter
pandas.DataFrame.idxmax
pandas.Index.get_loc
pandas.DataFrame.iloc
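With wide data, it is easier to reshape first with wide_to_long. This creates two columns, ['Duration', 'Value'], and the MultiIndex tells us which number (1, 2 or 3) each row came from. It does not depend on any particular column order.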
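import pandas as pd
# reshape to long form: one row per (Sequence, num) pair, with columns Duration and Value
df = pd.wide_to_long(df, i='Sequence', j='num', stubnames=['Duration', 'Value'])
# double brackets keep the result as a DataFrame
df.loc[[df.Duration.idxmax()]]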
Duration Value
Sequence num
1008 3 981 82
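If you want the flat Sequence / Duration / Value layout from the question rather than a MultiIndex, you can call reset_index after selecting the row. A sketch on a three-row subset of the question's data (the names wide, long_df and best are illustrative); the num column records which DurationN/ValueN pair the max came from:

```python
import pandas as pd

wide = pd.DataFrame({
    "Sequence":  [1007, 1008, 1009],
    "Duration1": [125, 129, 547],
    "Value1":    [97, 15, 47],
    "Duration2": [325, 429, 577],
    "Value2":    [85, 41, 52],
    "Duration3": [872, 981, 543],
    "Value3":    [78, 82, 83],
})

# Reshape to long form: index is (Sequence, num), columns are Duration and Value
long_df = pd.wide_to_long(wide, stubnames=["Duration", "Value"], i="Sequence", j="num")

# Select the row with the largest Duration, then move the MultiIndex back into columns
best = long_df.loc[[long_df["Duration"].idxmax()]].reset_index()
print(best)
```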