Given a date range, how can we break it up into N contiguous sub-intervals?

Sco*_*ott 12 python date python-2.7 python-datetime

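I'm accessing some data through an API, and I need to provide a date range for each request, e.g. start='20100101', end='20150415'. I thought I could speed this up by splitting the date range into non-overlapping intervals and using multiprocessing on each interval.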

My problem is that the way I split the date range doesn't consistently give me the expected result. Here's what I did:

from datetime import date

begin = '20100101'
end = '20101231'

Say we want to split it into quarters. First, I convert the strings to dates:

def get_yyyy_mm_dd(yyyymmdd):
    # given string 'yyyymmdd' return (yyyy, mm, dd)
    year = yyyymmdd[0:4]
    month = yyyymmdd[4:6]
    day = yyyymmdd[6:]
    return int(year), int(month), int(day)

y1, m1, d1 = get_yyyy_mm_dd(begin)
d1 = date(y1, m1, d1)
y2, m2, d2 = get_yyyy_mm_dd(end)
d2 = date(y2, m2, d2)
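As a side note, the standard library can parse these strings directly. A minimal sketch (the helper name `parse_yyyymmdd` is hypothetical) that should give the same `date` objects as `get_yyyy_mm_dd` followed by `date(...)`:

```python
from datetime import datetime, date

def parse_yyyymmdd(s):
    # Parse a 'YYYYMMDD' string and keep only the date part
    return datetime.strptime(s, "%Y%m%d").date()

d1 = parse_yyyymmdd("20100101")
d2 = parse_yyyymmdd("20101231")
print(d1, d2)  # 2010-01-01 2010-12-31
```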

Then I divide this range into sub-intervals:

def remove_tack(dates_list):
    # given a list of dates in form YYYY-MM-DD return a list of strings in form 'YYYYMMDD'
    tackless = []
    for d in dates_list:
        s = str(d)
        tackless.append(s[0:4]+s[5:7]+s[8:])
    return tackless

def divide_date(date1, date2, intervals):
    dates = [date1]
    for i in range(0, intervals):
        dates.append(dates[i] + (date2 - date1)/intervals)
    return remove_tack(dates)
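Similarly, `remove_tack` can probably be replaced by `strftime`, which formats a `date` without going through `str()` and slicing. A sketch, with `to_yyyymmdd` as a hypothetical name:

```python
from datetime import date

def to_yyyymmdd(dates_list):
    # Format each date as 'YYYYMMDD'; same result as remove_tack for date objects
    return [d.strftime("%Y%m%d") for d in dates_list]

print(to_yyyymmdd([date(2010, 1, 1), date(2010, 4, 2)]))  # ['20100101', '20100402']
```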

Using the begin and end above, we get:

listdates = divide_date(d1, d2, 4)
print listdates # ['20100101', '20100402', '20100702', '20101001', '20101231'] looks correct
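This first case likely only looks right because the span divides evenly: 2010-01-01 to 2010-12-31 is 364 days, and 364 / 4 is exactly 91 days, so no fractional day is involved. A quick check:

```python
from datetime import date

span = date(2010, 12, 31) - date(2010, 1, 1)
print(span.days)      # 364
print(span / 4)       # 91 days, 0:00:00
print(span.days % 4)  # 0, i.e. the span splits into whole days
```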

But if I use these dates instead:

begin = '20150101'
end = '20150228'

...

listdates = divide_date(d1, d2, 4)
print listdates # ['20150101', '20150115', '20150129', '20150212', '20150226']

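I'm missing two days at the end of February. My application doesn't need times or time zones, and I don't mind installing another library.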
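The likely cause: 2015-01-01 to 2015-02-28 is 58 days, so `(date2 - date1)/intervals` is 14 days 12 hours. Adding a `timedelta` to a `date` uses only its `days` component (seconds and microseconds are ignored), so each step in `divide_date` silently drops 12 hours. Because every boundary is built from the previous one, the loss accumulates to two full days. A small demonstration:

```python
from datetime import date, timedelta

start, end = date(2015, 1, 1), date(2015, 2, 28)
step = (end - start) / 4
print(step)          # 14 days, 12:00:00
print(start + step)  # 2015-01-15 (the 12 hours are discarded)

# Mimic divide_date: each boundary is the previous boundary plus step
d = start
for _ in range(4):
    d = d + step     # loses 12 hours on every addition
print(d)             # 2015-02-26, two days short of 2015-02-28
```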

Abh*_*jit 17

I would actually take a different approach and rely on timedelta and datetime arithmetic to determine the non-overlapping ranges.

Implementation

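from datetime import datetime

def date_range(start, end, intv):
    # Parse into datetime objects (not dates) so the step keeps fractional days
    start = datetime.strptime(start, "%Y%m%d")
    end = datetime.strptime(end, "%Y%m%d")
    diff = (end - start) / intv
    # Each boundary is computed from start, so rounding never accumulates
    for i in range(intv):
        yield (start + diff * i).strftime("%Y%m%d")
    yield end.strftime("%Y%m%d")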
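The important differences from the question's code are that `datetime` keeps the 12-hour remainder and that each boundary is computed from `start` rather than from the previous boundary. If you'd rather stay with plain `date` objects (the question says times aren't needed), a sketch using integer day arithmetic should give the same boundaries for this example; `split_dates_by_days` is a hypothetical name, not part of any library:

```python
from datetime import date, timedelta

def split_dates_by_days(start, end, n):
    # Return n + 1 boundary dates from start to end (both included).
    # Each boundary is start + floor(total_days * i / n), computed from start,
    # so no remainder is lost or accumulated between steps.
    total = (end - start).days
    return [start + timedelta(days=total * i // n) for i in range(n + 1)]

bounds = split_dates_by_days(date(2015, 1, 1), date(2015, 2, 28), 4)
print([d.strftime("%Y%m%d") for d in bounds])
# ['20150101', '20150115', '20150130', '20150213', '20150228']
```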

Example run

>>> begin = '20150101'
>>> end = '20150228'
>>> list(date_range(begin, end, 4))
['20150101', '20150115', '20150130', '20150213', '20150228']
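One thing to watch: adjacent sub-intervals share an endpoint ('20150115' ends the first interval and starts the second). If the API treats `end` as inclusive (an assumption; the question doesn't say), each interior boundary day would be requested twice. A sketch of a hypothetical `to_chunks` helper that turns the boundary list into non-overlapping, inclusive `(start, end)` pairs, which could then be handed to something like `multiprocessing.Pool.map`:

```python
from datetime import datetime, timedelta

def to_chunks(boundaries):
    # Turn 'YYYYMMDD' boundary strings into inclusive, non-overlapping pairs.
    # Every chunk ends the day before the next one starts; the last chunk
    # ends on the final boundary itself.
    days = [datetime.strptime(b, "%Y%m%d") for b in boundaries]
    chunks = []
    for i in range(len(days) - 1):
        is_last = i == len(days) - 2
        last = days[i + 1] if is_last else days[i + 1] - timedelta(days=1)
        chunks.append((days[i].strftime("%Y%m%d"), last.strftime("%Y%m%d")))
    return chunks

print(to_chunks(['20150101', '20150115', '20150130', '20150213', '20150228']))
# [('20150101', '20150114'), ('20150115', '20150129'),
#  ('20150130', '20150212'), ('20150213', '20150228')]
```

This assumes the boundaries are strictly increasing, i.e. that you ask for fewer sub-intervals than there are days in the range.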


(anonymous) 7

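import pandas as pd

# create bins: periods=3 gives 3 evenly spaced timestamps, i.e. 2 intervals
bins = pd.date_range(start='2020-12-27', end='2022-11-27', periods=3)

bins
# DatetimeIndex(['2020-12-27', '2021-12-12', '2022-11-27'], dtype='datetime64[ns]', freq=None)

# cut into intervals; df is an existing DataFrame with a datetime column 'datetime_col'
# (pass include_lowest=True if rows exactly at the first edge should be included)
pd.cut(df['datetime_col'], bins=bins)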
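Applied to the question's example, `pd.date_range` with `periods=N + 1` should yield the same boundaries as the `datetime` approach above, because it spaces the timestamps linearly and keeps the fractional day. A sketch, assuming pandas is installed:

```python
import pandas as pd

start = pd.to_datetime("20150101", format="%Y%m%d")
end = pd.to_datetime("20150228", format="%Y%m%d")
# periods=5 gives 5 evenly spaced timestamps (4 sub-intervals), endpoints included
bounds = pd.date_range(start=start, end=end, periods=5)
print(list(bounds.strftime("%Y%m%d")))
# ['20150101', '20150115', '20150130', '20150213', '20150228']
```

Some of the intermediate timestamps fall at 12:00, and `strftime("%Y%m%d")` simply drops the time of day.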