Business hours between two dates in a Pandas DataFrame (accounting for holidays)

Pet*_*rDS · score 6 · tags: python, date, pandas

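New Python user here. I'm trying to calculate the business hours between two dates in a pandas DataFrame, given working hours of 9am to 5pm, Monday to Friday, and excluding Australian public holidays.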

Over the past few days I've tried piecing together a lot of solutions and applying them to my problem, but I've had a great deal of trouble.

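I'll post my current iteration, but I'd also welcome feedback on the best way to approach the problem as a whole, and on how to tackle problems like this in future.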

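My latest attempt uses pandas CDay with a custom holiday calendar for the Australian dates, and that part seems to work. What I can't figure out is how to take that step and apply it to the dates in my DataFrame. To count the minutes between dates I'm using the custom function from this answer, https://codereview.stackexchange.com/questions/135142/calculate-working-minutes-between-two-timestamps/135200#135200, but without success.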

Any help appreciated!

import datetime

import numpy as np
import pandas as pd
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
from pandas.tseries.offsets import CDay

class HolidayCalendar(AbstractHolidayCalendar):
    rules =[Holiday('New Years Day',year=2016,month=1,day=1),
        Holiday('Australia Day',year=2016,month=1,day=26),
        Holiday('Good Friday',year=2016,month=3,day=25),
        Holiday('Easter Monday',year=2016,month=3,day=28),
        Holiday('ANZAC Day',year=2016,month=4,day=25),
        Holiday('Queens Birthday',year=2016,month=6,day=13),
        Holiday('Christmas Day',year=2016,month=12,day=25),
        Holiday('Boxing Day',year=2016,month=12,day=26),           
        Holiday('New Years Day',year=2017,month=1,day=1),
        Holiday('Australia Day',year=2017,month=1,day=26),
        Holiday('Good Friday',year=2017,month=4,day=14),
        Holiday('Easter Monday',year=2017,month=4,day=17),
        Holiday('ANZAC Day',year=2017,month=4,day=25),
        Holiday('Queens Birthday',year=2017,month=6,day=12),
        Holiday('Christmas Day',year=2017,month=12,day=25),
        Holiday('Boxing Day',year=2017,month=12,day=26),
        Holiday('New Years Day',year=2018,month=1,day=1),
        Holiday('Australia Day',year=2018,month=1,day=26),
        Holiday('Good Friday',year=2018,month=3,day=30),
        Holiday('Easter Monday',year=2018,month=4,day=2),
        Holiday('ANZAC Day',year=2018,month=4,day=25),
        Holiday('Queens Birthday',year=2018,month=6,day=11),
        Holiday('Christmas Day',year=2018,month=12,day=25),
        Holiday('Boxing Day',year=2018,month=12,day=26)]

cal = HolidayCalendar()
dayindex = pd.bdate_range(datetime.date(2015,1,1),datetime.date.today(),freq=CDay(calendar=cal))

day_series = dayindex.to_series()

WORKDAY_MINS = 8 * 60  # 09:00 to 17:00

def count_mins(start, end):
    """Business minutes between two timestamps (Mon-Fri 09:00-17:00, holidays excluded)."""
    starttime = pd.Timestamp(start)
    endtime = pd.Timestamp(end)

    # Business days (per the custom calendar) whose date lies in [start date, end date]
    days = day_series.loc[starttime.normalize():endtime.normalize()]
    daycount = len(days)

    if daycount == 0:
        return 0

    # 09:00 on the first business day, 17:00 on the last business day
    startday = days.iloc[0] + pd.Timedelta(hours=9)
    endday = days.iloc[-1] + pd.Timedelta(hours=17)

    if daycount == 1:
        # Clip both ends to the single working window
        periodstart = max(starttime, startday)
        periodend = min(endtime, endday)
        return max((periodend - periodstart).total_seconds() / 60, 0)

    # Partial first and last days, clipped to 09:00-17:00 and never negative
    # (e.g. a start after 17:00 contributes nothing for that day)
    first_day_mins = max((startday.replace(hour=17) - max(starttime, startday)).total_seconds() / 60, 0)
    last_day_mins = max((min(endtime, endday) - endday.replace(hour=9)).total_seconds() / 60, 0)

    # Every business day strictly in between counts as a full 8 hours
    return first_day_mins + last_day_mins + (daycount - 2) * WORKDAY_MINS


df_updated['Created Date'] = pd.to_datetime(df_updated['Created Date'])
df_updated['Updated Date'] = pd.to_datetime(df_updated['Updated Date'])

# count_mins works on one pair of timestamps, so apply it row by row
# ('Business Minutes' is a hypothetical name for the result column)
df_updated['Business Minutes'] = df_updated.apply(
    lambda row: count_mins(row['Created Date'], row['Updated Date']), axis=1)
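A more compact variant of the same idea (clip each business day to its 09:00-17:00 window and add up the overlaps) can let pandas generate the business days directly. This is a minimal sketch, assuming naive timestamps in one time zone and a plain list of holiday dates; business_minutes is a hypothetical helper name, not a pandas API. pd.bdate_range with freq="C" and holidays=... is the standard pandas way to build a custom business-day range.

```python
import pandas as pd


def business_minutes(start, end, holidays=(), open_hour=9, close_hour=17):
    """Working minutes between two naive timestamps.

    Sketch only: assumes a Mon-Fri week, one fixed daily window
    [open_hour, close_hour) and that `holidays` is an iterable of dates.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if end <= start:
        return 0.0

    # Custom business days (weekends and the given holidays removed)
    days = pd.bdate_range(
        start.normalize(),
        end.normalize(),
        freq="C",
        holidays=[pd.Timestamp(h).date() for h in holidays],
    )

    total = pd.Timedelta(0)
    for day in days:
        # Overlap of [start, end] with this day's working window
        window_start = max(start, day + pd.Timedelta(hours=open_hour))
        window_end = min(end, day + pd.Timedelta(hours=close_hour))
        if window_end > window_start:
            total += window_end - window_start
    return total.total_seconds() / 60


# Example: January 2017 with the observed New Year's Day and Australia Day off
hols = ["2017-01-02", "2017-01-26"]
print(business_minutes("2017-01-01", "2017-01-31 23:00", hols) / 60)  # 160.0
```

The holiday dates could come from the question's calendar, for example HolidayCalendar().holidays(start, end), and the function can be applied per row with DataFrame.apply(..., axis=1). The loop is linear in the number of days per row, which is fine for ranges of weeks or months.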

Anonymous user · score 6

Try the business-duration package from PyPI:

pip install business-duration

Example code:

from business_duration import businessDuration
import pandas as pd
from datetime import time,datetime
import holidays as pyholidays

startdate = pd.to_datetime('2017-01-01 00:00:00')
enddate = pd.to_datetime('2017-01-31 23:00:00')

starttime=time(9,0,0)
endtime=time(17,0,0)

holidaylist = pyholidays.Australia()
unit='hour'

#By default weekends are Saturday and Sunday
print(businessDuration(startdate, enddate, starttime, endtime, holidaylist=holidaylist, unit=unit))

Output: 160.0

holidaylist:
{datetime.date(2017, 1, 1): "New Year's Day",
 datetime.date(2017, 1, 2): "New Year's Day (Observed)",
 datetime.date(2017, 1, 26): 'Australia Day',
 datetime.date(2017, 3, 6): 'Canberra Day',
 datetime.date(2017, 4, 14): 'Good Friday',
 datetime.date(2017, 4, 15): 'Easter Saturday',
 datetime.date(2017, 4, 17): 'Easter Monday',
 datetime.date(2017, 4, 25): 'Anzac Day',
 datetime.date(2017, 6, 12): "Queen's Birthday",
 datetime.date(2017, 9, 26): 'Family & Community Day',
 datetime.date(2017, 10, 2): 'Labour Day',
 datetime.date(2017, 12, 25): 'Christmas Day',
 datetime.date(2017, 12, 26): 'Boxing Day'}
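The 160.0 result can be checked by hand. January 2017 starts on a Sunday, so it has 22 weekdays; the holiday list removes two of them (2 January, the observed New Year's Day, and 26 January, Australia Day), and each remaining day contributes the 8 hours between 09:00 and 17:00:

$$
(22 - 2)\ \text{business days} \times 8\ \tfrac{\text{h}}{\text{day}} = 160\ \text{h}
$$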


And*_*den · score 0

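You can get the number of business days from the length of a bdate_range (note that the default freq='B' skips weekends only, not public holidays):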

In [11]: pd.bdate_range('2017-01-01', '2017-10-23')
Out[11]:
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
               '2017-01-12', '2017-01-13',
               ...
               '2017-10-10', '2017-10-11', '2017-10-12', '2017-10-13',
               '2017-10-16', '2017-10-17', '2017-10-18', '2017-10-19',
               '2017-10-20', '2017-10-23'],
              dtype='datetime64[ns]', length=211, freq='B')

In [12]: len(pd.bdate_range('2017-01-01', '2017-10-23'))
Out[12]: 211
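To count whole business days for every row of a DataFrame at once, and to exclude holidays as well as weekends, numpy.busday_count can be used instead of building one range per row; it takes a holidays argument, so a pandas holiday calendar like the one in the question can feed it directly. A minimal sketch, assuming the two-rule AUCalendar below is an illustrative subset and that the start/end column names are hypothetical. busday_count counts the half-open interval [begin, end), so the end date itself is not counted.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday


class AUCalendar(AbstractHolidayCalendar):
    # Illustrative subset of 2017 Australian public holidays
    rules = [
        Holiday("Australia Day", year=2017, month=1, day=26),
        Holiday("Anzac Day", year=2017, month=4, day=25),
    ]


# Materialise the calendar as day-resolution dates that numpy understands
holidays = AUCalendar().holidays(start="2017-01-01", end="2017-12-31")
holidays_d = holidays.values.astype("datetime64[D]")

# Hypothetical example data: one date range per row
df = pd.DataFrame({
    "start": pd.to_datetime(["2017-01-23", "2017-04-24"]),
    "end": pd.to_datetime(["2017-01-30", "2017-04-26"]),
})

# Weekdays in [start, end) minus holidays, computed for all rows at once
df["bdays"] = np.busday_count(
    df["start"].values.astype("datetime64[D]"),
    df["end"].values.astype("datetime64[D]"),
    holidays=holidays_d,
)
print(df)
```

Here the first row spans Monday 23 to Monday 30 January (five weekdays, minus Australia Day) and the second spans Monday 24 to Wednesday 26 April (two weekdays, minus Anzac Day).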