Python/Pandas Binning Data Timedelta

cmf*_*f05 6 python datetime timedelta binning pandas

I have a DataFrame with two columns:

    userID     duration
0   DSm7ysk    03:08:49
1   no51CdJ    00:35:50
2   ...

The 'duration' column has timedelta type. I tried using:

import datetime as dt
import pandas as pd

bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5),
        dt.timedelta(minutes=10), dt.timedelta(minutes=20),
        dt.timedelta(minutes=30), dt.timedelta(hours=4)]

labels = ['0-5min','5-10min','10-20min','20-30min','30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

However, the binned data does not use the specified bins; instead, a separate bin is created for each duration in the frame.

What is the simplest way to bin timedelta objects into irregular intervals? Or am I just missing something obvious?

god*_*ryd 4

It works for me with pandas 0.23.4:

import pandas as pd

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': [
        pd.Timedelta('3 hours 8 minutes 49 seconds'),
        pd.Timedelta('35 minutes 50 seconds'),
        pd.Timedelta('1 minutes 13 seconds'),
        pd.Timedelta('6 minutes 43 seconds'),
    ]
})

bins = [
    pd.Timedelta(minutes = 0),
    pd.Timedelta(minutes = 5),
    pd.Timedelta(minutes = 10),
    pd.Timedelta(minutes = 20),
    pd.Timedelta(minutes = 30),
    pd.Timedelta(hours = 4)
]

labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

Result:

[image: result]
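If pd.cut does not accept timedelta bin edges in your pandas version, one possible workaround is to convert the durations to plain numbers first and bin those instead. The following is only a sketch under that assumption: the `edges_min` name and the two extra sample rows ('foo', 'bar', taken from the answer's example data) are illustrative, and the bin edges are the same ones used in the question, expressed in minutes.

```python
import pandas as pd

# Sample data: the two rows from the question plus the two extra
# illustrative rows used in the answer.
df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': pd.to_timedelta(
        ['03:08:49', '00:35:50', '00:01:13', '00:06:43']
    ),
})

# Bin edges in minutes; 240 minutes corresponds to the 4-hour upper edge.
edges_min = [0, 5, 10, 20, 30, 240]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# Convert each timedelta to a float number of minutes, then bin the floats.
minutes = df['duration'].dt.total_seconds() / 60
df['bins'] = pd.cut(minutes, bins=edges_min, labels=labels)

print(df)
```

Note that with pd.cut's defaults the intervals are closed on the right, so a duration of exactly zero, or anything longer than 4 hours, falls outside every bin and becomes NaN. Passing include_lowest=True or widening the last edge would change that.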