How to sum rows that share a column with their category labels in a pandas DataFrame - python

Question


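I have been formatting a log file and ended up with a DataFrame like the sample below, where the categories and the numbers I want to add up sit in the same column: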

import pandas as pd

df = pd.DataFrame(dict(a=['Cat. A',1,1,3,'Cat. A',2,2,'Cat. B',3,5,2,6,'Cat. B',1,'Cat. C',4]))
>>> a
0   Cat. A
1   1
2   1
3   3
4   Cat. A
5   2
6   2
7   Cat. B
8   3
9   5
10  2
11  6
12  Cat. B
13  1
14  Cat. C
15  4


Adding up all the numbers under each category, I would like to get:

Cat. A= 1+1+3+2+2 = 9
Cat. B= 3+5+2+6+1 = 17
Cat. C= 4


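I know how to walk through the whole file the classic way, but I would like to know the most Pythonic way to do it, given that the number of rows per category is variable and the number of times a category appears may also differ from one DataFrame to the next.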
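For comparison, the "classic" pass mentioned above might look like the sketch below. It is not from the original post: the names `values`, `totals` and `current` are illustrative, and it assumes every label is a string and every other entry is an integer.

```python
from collections import defaultdict

# Same sample data as the question, as a plain list (assumed layout:
# a string label followed by the integers that belong to it).
values = ['Cat. A', 1, 1, 3, 'Cat. A', 2, 2, 'Cat. B', 3, 5, 2, 6,
          'Cat. B', 1, 'Cat. C', 4]

totals = defaultdict(int)
current = None  # label of the category currently being read

for item in values:
    if isinstance(item, str):
        # A string marks the start of a (possibly repeated) category.
        current = item
    elif current is not None:
        # A number is added to the most recent category seen.
        totals[current] += item

print(dict(totals))
```

The answers below replace this explicit loop with a forward fill of the labels followed by a group-wise sum.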

Answer 1

mcc*_*dar 6

Here is another way:

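import numpy as np
import pandas as pd

df = pd.DataFrame(dict(a=['Cat. A',1,1,3,'Cat. A',2,2,'Cat. B',3,5,2,6,'Cat. B',1,'Cat. C',4]))

def coerce(x):
    # Numbers become NaN; category labels (not convertible to int) are kept.
    try:
        int(x)
        return np.nan
    except (TypeError, ValueError):
        return x

def safesum(x):
    # Each group starts with its label; drop every copy of it, then sum the rest.
    return x[x != x.iloc[0]].astype(int).sum()

# Forward-fill the labels so that every row carries its category in column 'b'.
df['b'] = df['a'].apply(coerce).ffill()
df.groupby('b').agg(safesum)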


This produces:

         a
b         
Cat. A   9
Cat. B  17
Cat. C   4
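The forward-fill step is what ties each number to its category. The sketch below is a variation on it, not the answerer's code: it builds the same kind of label column with a boolean mask and then drops the label rows before summing, so no custom aggregation function is needed. The names `is_label`, `labels` and `totals` are illustrative, and it assumes labels are the only strings in the column.

```python
import pandas as pd

df = pd.DataFrame(dict(a=['Cat. A', 1, 1, 3, 'Cat. A', 2, 2, 'Cat. B',
                          3, 5, 2, 6, 'Cat. B', 1, 'Cat. C', 4]))

# True where the cell is a category label (assumed: labels are strings).
is_label = df['a'].map(lambda v: isinstance(v, str))

# Keep the labels, blank out the numbers, then carry each label downwards.
labels = df['a'].where(is_label).ffill()

# Sum only the numeric rows, grouped by the label carried onto each of them.
totals = df.loc[~is_label, 'a'].astype(int).groupby(labels[~is_label]).sum()

print(totals)
```

Because the label rows are filtered out first, the remaining values can be cast to `int` directly and summed with the built-in `GroupBy.sum`.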


Answer 2

Ch3*_*teR 3

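We can use pd.to_numeric with errors='coerce' to turn the non-numeric fields into NaN. Series.mask together with Series.notna then blanks out the numbers so that only the labels remain, and forward-filling those gives the group keys. Finally, use GroupBy.sum.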

a = pd.to_numeric(df['a'], errors='coerce')  # category labels become NaN
g = df['a'].mask(a.notna()).ffill()          # numbers become NaN, then labels are forward-filled
a.groupby(g).sum()

Cat. A     9.0
Cat. B    17.0
Cat. C     4.0
Name: a, dtype: float64
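The result above is float64 because the NaN values introduced by the coercion force the numeric column to a float dtype. A self-contained sketch of the same approach, with a final cast back to integers, could look like this. The names `nums`, `groups` and `result` are illustrative, and the cast assumes every numeric entry is a whole number.

```python
import pandas as pd

df = pd.DataFrame(dict(a=['Cat. A', 1, 1, 3, 'Cat. A', 2, 2, 'Cat. B',
                          3, 5, 2, 6, 'Cat. B', 1, 'Cat. C', 4]))

# Labels cannot be parsed as numbers, so they become NaN (float64 column).
nums = pd.to_numeric(df['a'], errors='coerce')

# Blank out the rows that parsed as numbers, leaving only the labels,
# then forward-fill so each row carries the label above it.
groups = df['a'].mask(nums.notna()).ffill()

# NaN entries (the label rows) are skipped by sum(); cast the totals to int.
result = nums.groupby(groups).sum().astype(int)

print(result)
```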


Archived: 4 years, 11 months ago
Views: 344
Last activity: 4 years, 11 months ago