Count values in uneven bins in pandas

pit*_*las 1 python pandas

import pandas as pd

df = pd.DataFrame({'email': ["a@gmail.com", "b@gmail.com", "c@gmail.com", "d@gmail.com", "e@gmail.com"],
                   'one': [88, 99, 11, 44, 33],
                   'two': [80, 80, 85, 80, 70],
                   'three': [50, 60, 70, 80, 20]})

Given this DataFrame, I want to count, for each of the columns one, two and three, how many values fall within each of a set of ranges.

The ranges are, for example: 0-70, 71-80, 81-90, 91-100.
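Because the values are integers, the inclusive ranges 0-70, 71-80, 81-90 and 91-100 correspond to right-closed intervals with edges 0, 70, 80, 90, 100, which is what `pd.cut` produces with its default `right=True`. A minimal sketch of that mapping; the sample values and the labels (borrowed from the desired column names) are illustrative, and `include_lowest=True` is an assumption added so that a value of exactly 0 also lands in the first bin:

```python
import pandas as pd

# Edges for the integer ranges 0-70, 71-80, 81-90, 91-100
bins = [0, 70, 80, 90, 100]
labels = ["b0to70", "b71to80", "b81to90", "b91to100"]

# Boundary values chosen to show which bin each edge falls into
values = pd.Series([0, 70, 71, 80, 81, 90, 91, 100])

# right=True (default): bins are (left, right]; include_lowest=True closes the first bin at 0
binned = pd.cut(values, bins=bins, labels=labels, include_lowest=True)
print(binned.tolist())
```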

The result would be:

out = pd.DataFrame({'colname': ["one", "two", "three"],
                   'b0to70': [3, 1, 4],
                   'b71to80': [0, 3, 1],
                   'b81to90': [1, 1, 0],
                   'b91to100': [1, 0, 0]})

What is a good, idiomatic way to do this?

Rob*_*bie 7

You can do it like this:

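import pandas as pd

bins = [0, 70, 80, 90, 100]  # right-closed intervals: (0, 70], (70, 80], ...
out = pd.DataFrame()
for name in ['one', 'two', 'three']:
    # Count how many values of this column fall into each bin (empty bins count as 0)
    out[name] = pd.cut(df[name], bins=bins).value_counts()
out = out.sort_index()  # order rows by interval rather than by count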

Returns:

           one  two  three
(0, 70]      3    1      4
(70, 80]     0    3      1
(80, 90]     1    1      0
(90, 100]    1    0      0
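The result above has one row per bin and one column per source column, i.e. the transpose of the layout asked for in the question. A minimal sketch that builds the requested layout directly, assuming the `df` from the question and reusing the question's names (`colname`, `b0to70`, and so on) as bin labels:

```python
import pandas as pd

df = pd.DataFrame({'email': ["a@gmail.com", "b@gmail.com", "c@gmail.com", "d@gmail.com", "e@gmail.com"],
                   'one': [88, 99, 11, 44, 33],
                   'two': [80, 80, 85, 80, 70],
                   'three': [50, 60, 70, 80, 20]})

bins = [0, 70, 80, 90, 100]
labels = ["b0to70", "b71to80", "b81to90", "b91to100"]

rows = []
for name in ['one', 'two', 'three']:
    # value_counts() on the categorical result includes empty bins;
    # sort_index() puts the counts back in bin order
    counts = pd.cut(df[name], bins=bins, labels=labels).value_counts().sort_index()
    rows.append([name, *counts.tolist()])

# One row per source column, one column per bin
out = pd.DataFrame(rows, columns=['colname', *labels])
print(out)
```

`value_counts()` sorts by frequency by default, so the explicit `sort_index()` is what keeps each row's counts aligned with the label order.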

  • Instead of the outer for loop, use the built-in apply, i.e. `df.set_index('email').apply(pd.cut, bins=[0,70,80,90,100]).apply(pd.value_counts)` (2 upvotes)
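A runnable sketch of the comment's apply-based approach, assuming the `df` from the question. It calls `Series.value_counts` through a lambda instead of the top-level `pd.value_counts` (deprecated in recent pandas versions), and sorts each column's counts by interval so all columns line up in bin order:

```python
import pandas as pd

df = pd.DataFrame({'email': ["a@gmail.com", "b@gmail.com", "c@gmail.com", "d@gmail.com", "e@gmail.com"],
                   'one': [88, 99, 11, 44, 33],
                   'two': [80, 80, 85, 80, 70],
                   'three': [50, 60, 70, 80, 20]})

bins = [0, 70, 80, 90, 100]

# First apply: bin every numeric column (email is moved into the index so it is skipped);
# extra keyword arguments to apply are forwarded to pd.cut
binned = df.set_index('email').apply(pd.cut, bins=bins)

# Second apply: count values per bin for each column, in interval order
out = binned.apply(lambda col: col.value_counts().sort_index())
print(out)
```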