pandas dataframe: how to aggregate a subset of rows based on value of a column

Val*_*ino 6 python dataframe pandas

I have a pandas dataframe structured like this:

      value
lab        
A        50
B        35
C         8
D         5
E         1
F         1

This is just an example; the actual dataframe is bigger, but it follows the same structure.
The sample dataframe was created with these two lines:

df = pd.DataFrame({'lab':['A', 'B', 'C', 'D', 'E', 'F'], 'value':[50, 35, 8, 5, 1, 1]})
df = df.set_index('lab')

I would like to aggregate the rows whose value is smaller than a given threshold: all these rows should be replaced by a single row whose value is the sum of the replaced rows.

For example, if I choose threshold = 6, the expected result should be:

      value
lab        
A        50
B        35
C         8
X         7 #sum of D, E, F

How can I do this?

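I thought of using groupby(), but all the examples I have seen group on a separate column, so I do not know how to use it in this case.
I can select the rows below the threshold with loc, by doing df.loc[df['value'] < threshold], but I do not know how to sum only these rows while leaving the rest of the dataframe unchanged.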

Chr*_*s A 7

You can do this as a "one-liner" using lambda and DataFrame.append:

thresh = 6

(df[lambda x: x['value'] >= thresh]
 .append(df[lambda x: x['value'] < thresh].sum().rename('X')))

Or, if you prefer:

mask = df['value'].ge(thresh)

df[mask].append(df[~mask].sum().rename('X'))

[out]

     value
lab       
A       50
B       35
C        8
X        7
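
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append-based snippets need an older pandas version. A minimal sketch of the same mask-and-sum idea using pd.concat, assuming the sample dataframe from the question (the name small is only illustrative, not part of the original answer):

```python
import pandas as pd

# Sample dataframe from the question
df = pd.DataFrame({'lab': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [50, 35, 8, 5, 1, 1]})
df = df.set_index('lab')

thresh = 6
mask = df['value'].ge(thresh)  # True for rows that are kept as they are

# One-row dataframe holding the sum of the rows below the threshold.
# 'small' is a hypothetical helper name; 'X' is the label used in the question.
small = pd.DataFrame(
    {'value': [df.loc[~mask, 'value'].sum()]},
    index=pd.Index(['X'], name=df.index.name),
)

# Stack the untouched rows and the aggregated row
result = pd.concat([df[mask], small])
print(result)
```

Building the aggregated row with the same index name as df keeps the 'lab' label on the combined index.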


jez*_*ael 5

Use setting with enlargement on the filtered DataFrame:

threshold = 6
m = df['value'] < threshold
df1 = df[~m].copy()
df1.loc['Z'] = df.loc[m, 'value'].sum()

print (df1)
     value
lab       
A       50
B       35
C        8
Z        7

Another solution:

m = df['value'] < threshold
df1 = df[~m].append(df.loc[m, ['value']].sum().rename('Z'))
print (df1)
     value
lab       
A       50
B       35
C        8
Z        7
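
Since the question asks about groupby(), here is a rough sketch of how it could be applied without a separate grouping column: build the group keys from the index, replacing every label whose value is below the threshold with a shared label. This is an illustration under the question's sample data, not part of either answer; the name keys and the label 'Z' are assumptions chosen for the example.

```python
import pandas as pd

# Sample dataframe from the question
df = pd.DataFrame({'lab': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [50, 35, 8, 5, 1, 1]})
df = df.set_index('lab')

threshold = 6

# Group key per row: the row's own label where value >= threshold,
# otherwise the shared label 'Z' (hypothetical name for the merged row).
keys = df.index.to_series().where(df['value'] >= threshold, 'Z')

# sort=False keeps the groups in order of first appearance
result = df.groupby(keys, sort=False).sum()
print(result)
```

This assumes the index labels are unique and that 'Z' is not already used as a label; otherwise an existing 'Z' row would be merged into the aggregated one.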