Get the previous business day in a DataFrame

Cer*_*jos 13 python datetime calendar python-3.x pandas

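I have a DataFrame with two columns, a date and a category. I want to create a new date column using this rule: if the category is B, the value is the business day closest to the date, looking only at the past or the day itself. Otherwise, the value is the date column itself.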

I define a business day as any day that is not on a weekend and is not in the holidays list defined in the minimal example below.
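The definition above can be written as a small predicate. This is a minimal sketch. The helper name is_business_day is hypothetical and does not appear in the original question.

```python
import datetime as dt

holidays = [dt.datetime(2018, 10, 11)]


def is_business_day(day, holidays):
    """Hypothetical helper: True if `day` is Mon-Fri and not listed in `holidays`."""
    # weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend
    return day.weekday() < 5 and day not in holidays
```

With this list, 2018-10-12 (a Friday) is a business day. 2018-10-11 (a holiday) and 2018-10-13 (a Saturday) are not.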

Consider the following DataFrame df:

import datetime as dt
import pandas as pd
from IPython.display import display

holidays = [dt.datetime(2018, 10, 11)]
df = pd.DataFrame({"day": ["2018-10-10", "2018-10-11", "2018-10-12",
                           "2018-10-13", "2018-10-14", "2018-10-15"],
                   "category": ["A", "B", "C", "B", "C", "A"]})

df["day"] = pd.to_datetime(df.day, format="%Y-%m-%d")
display(df)

         day category
0 2018-10-10        A
1 2018-10-11        B
2 2018-10-12        C
3 2018-10-13        B
4 2018-10-14        C
5 2018-10-15        A

How can I get a third column with the values below?

2018-10-10
2018-10-10
2018-10-12
2018-10-12
2018-10-14
2018-10-15
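One way to get this column without a Python loop is NumPy's business-day functions. This technique does not come from the thread. The sketch below is offered under assumptions: NumPy's default Mon-Fri weekmask, and a hypothetical column name, new_day. Calling np.busday_offset with an offset of 0 and roll="backward" maps a business day to itself and moves any other day back to the closest earlier business day, which matches the "past or the day itself" rule.

```python
import numpy as np
import pandas as pd

holidays = ["2018-10-11"]
df = pd.DataFrame({
    "day": pd.to_datetime(["2018-10-10", "2018-10-11", "2018-10-12",
                           "2018-10-13", "2018-10-14", "2018-10-15"]),
    "category": ["A", "B", "C", "B", "C", "A"],
})

# np.busday_offset works on day-resolution datetime64 values
days = df["day"].values.astype("datetime64[D]")

# offset 0 with roll="backward": business days stay put,
# weekends/holidays roll back to the closest earlier business day
prev_bday = np.busday_offset(days, 0, roll="backward", holidays=holidays)

# only category B rows take the rolled-back date; other rows keep their own date
df["new_day"] = np.where(df["category"] == "B",
                         prev_bday.astype("datetime64[ns]"),
                         df["day"].values)
print(df)
```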

If it helps, I wrote a function that finds the last business day using a list:

# creates a list whose elements are all days in the years 2017, 2018 and 2019
days = [dt.datetime(2017, 1, 1) + dt.timedelta(k) for k in range(365*3)]


def last_bus_day(date):
    # latest day in `days` that is Mon-Fri, not a holiday and not after `date`
    return max(
        d for d in days
        if d.weekday() not in [5, 6]
        and d not in holidays
        and d <= date
    )

for d in df.day:
    print(last_bus_day(d))
# prints
# 2018-10-10 00:00:00
# 2018-10-10 00:00:00
# 2018-10-12 00:00:00
# 2018-10-12 00:00:00
# 2018-10-12 00:00:00
# 2018-10-15 00:00:00
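The list-based function scans all 1,095 precomputed days on every call, and it only knows about 2017 to 2019. For a later date, it silently returns 2019-12-31. The sketch below is an alternative under the same weekend and holiday rules: it walks backwards one day at a time from the given date until it reaches a business day. The function name last_bus_day_stepping is hypothetical.

```python
import datetime as dt

holidays = [dt.datetime(2018, 10, 11)]


def last_bus_day_stepping(date, holidays):
    """Hypothetical variant: step back from `date` until a business day is found."""
    day = date
    # Saturday/Sunday (weekday 5/6) or a listed holiday are not business days
    while day.weekday() >= 5 or day in holidays:
        day -= dt.timedelta(days=1)
    return day
```

Because weekends last two days and the holiday list is finite, the loop always ends. It also needs no precomputed range of days.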

jpp*_*jpp 3

Pandas lets you supply your own holidays through custom business days (CustomBusinessDay).

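A benefit of this solution is that it handles adjacent holidays seamlessly. An example is Christmas Day and Boxing Day, which are consecutive holidays in some regions.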
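The adjacent-holiday case can be checked in isolation. This sketch uses a hypothetical holiday list with Christmas Day and Boxing Day 2018 (Tuesday the 25th and Wednesday the 26th). Subtracting one custom business day from Thursday 27 December skips both holidays.

```python
import pandas as pd

# Hypothetical holiday list: Christmas Day and Boxing Day 2018
bday = pd.tseries.offsets.CustomBusinessDay(holidays=["2018-12-25", "2018-12-26"],
                                            weekmask="Mon Tue Wed Thu Fri")

# Thursday 27 December minus one business day lands on Monday 24 December
prev = pd.Timestamp("2018-12-27") - bday
print(prev)  # 2018-12-24 00:00:00
```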

import numpy as np
import pandas as pd

# define custom business days
weekmask = 'Mon Tue Wed Thu Fri'
holidays = ['2018-10-11']

bday = pd.tseries.offsets.CustomBusinessDay(holidays=holidays, weekmask=weekmask)

# construct masks to identify when days must be subtracted
m1 = df['category'] == 'B'
# compare against real timestamps rather than strings
m2 = df['day'].dt.weekday.isin([5, 6]) | df['day'].isin(pd.to_datetime(holidays))

# apply conditional logic
df['day'] = np.where(m1 & m2, df['day'] - bday, df['day'])

print(df)

  category        day
0        A 2018-10-10
1        B 2018-10-10
2        C 2018-10-12
3        B 2018-10-12
4        C 2018-10-14
5        A 2018-10-15
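Subtracting bday moves a date back one full business day. For that reason, the answer builds a mask so the subtraction only applies to category B rows that fall on a weekend or holiday. The following alternative is a sketch and does not come from the answer. It uses the offset's rollback method, which leaves a business day unchanged and moves any other day back to the previous business day, so no weekend/holiday mask is needed. The column name new_day is hypothetical.

```python
import pandas as pd

holidays = ["2018-10-11"]
bday = pd.tseries.offsets.CustomBusinessDay(holidays=holidays,
                                            weekmask="Mon Tue Wed Thu Fri")

df = pd.DataFrame({
    "day": pd.to_datetime(["2018-10-10", "2018-10-11", "2018-10-12",
                           "2018-10-13", "2018-10-14", "2018-10-15"]),
    "category": ["A", "B", "C", "B", "C", "A"],
})

# rollback returns the date itself if it is a business day,
# otherwise the closest earlier business day
rolled = df["day"].map(bday.rollback)

# keep the rolled date only for category B rows
df["new_day"] = rolled.where(df["category"] == "B", df["day"])
print(df)
```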

Edit: Based on your comment, "I just realized I didn't ask exactly what I wanted. I want to find the previous business day", you can simply use:

df['day'] -= bday
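Unlike the masked version, df['day'] -= bday shifts every row, including dates that are already business days. The small sketch below uses the same holiday and shows both behaviours. A business day moves back one business day. A weekend day lands on the closest earlier business day.

```python
import pandas as pd

bday = pd.tseries.offsets.CustomBusinessDay(holidays=["2018-10-11"],
                                            weekmask="Mon Tue Wed Thu Fri")

# Wednesday, Friday and Saturday
days = pd.Series(pd.to_datetime(["2018-10-10", "2018-10-12", "2018-10-13"]))

# pandas may emit a PerformanceWarning here because the offset is applied element-wise
shifted = days - bday
print(shifted)
```

Here Wednesday 10 October becomes Tuesday 9 October. Friday 12 October skips the 11 October holiday and becomes 10 October. Saturday 13 October becomes Friday 12 October.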