Ann*_*s15 3 python datetime pandas
Hello world,
I want to retrieve the number of public holidays in each month.
Here is my dataset:
City date value End_date
BE 01/01/16 41 31/01/16
NW 01/10/16 74 31/10/16
BY 01/05/16 97 31/05/16
With the following code, I can manually check whether a given day is a public holiday:
from datetime import date
import holidays
#prov = BW, BY, BE, BB, HB, HH, HE, MV, NI, NW, RP, SL, SN, ST, SH, TH
us_holidays = holidays.CountryHoliday('DE', prov='NW', state=None )
date(2020, 5, 21) in us_holidays
out:
False
Question: How can I count the number of True values for each month, and how can I store that count in the dataframe?
Expected output:
City date value End_date Nb_pub_holiday
BE 01/01/16 41 31/01/16 2
NW 01/10/16 74 31/10/16 0
BY 01/05/16 97 31/05/16 4
I'm not sure why, but I get a different output with a custom function that builds a date_range and counts the matching values with sum over a generator:
import holidays
import pandas as pd

# convert columns to datetimes
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')
df['End_date'] = pd.to_datetime(df['End_date'], format='%d/%m/%y')

def f1(x):
    # holiday calendar for the German state in this row
    h = holidays.CountryHoliday('DE', prov=x['City'], state=None)
    # every day from date to End_date, inclusive
    d = pd.date_range(x['date'], x['End_date'])
    # True counts as 1, so sum gives the number of holidays
    return sum(y in h for y in d)

df['Nb_pub_holiday'] = df.apply(f1, axis=1)
print(df)
City date value End_date Nb_pub_holiday
0 BE 2016-01-01 41 2016-01-31 1
1 NW 2016-10-01 74 2016-10-31 1
2 BY 2016-05-01 97 2016-05-31 4
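The counting step can also be sketched with the standard library alone, which makes it easier to see what the sum over the generator does. This is only an illustration: `BY_HOLIDAYS_MAY_2016`, `parse_day` and `count_holidays` are hypothetical names, and the hard-coded set is a stand-in for the `holidays` calendar, filled with the four BY dates reported for May 2016 in this answer.

```python
from datetime import date, datetime, timedelta

# Assumption: a plain set standing in for the holidays calendar object.
# The dates are the four BY holidays reported for May 2016 in this answer.
BY_HOLIDAYS_MAY_2016 = {
    date(2016, 5, 1),
    date(2016, 5, 5),
    date(2016, 5, 16),
    date(2016, 5, 26),
}

def parse_day(text):
    """Parse a dd/mm/yy string such as '01/05/16' into a date."""
    return datetime.strptime(text, "%d/%m/%y").date()

def count_holidays(start, end, calendar):
    """Count the days in [start, end], both inclusive, found in calendar."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    # Each membership test is a bool; True adds 1 and False adds 0.
    return sum(day in calendar for day in days)

count = count_holidays(parse_day("01/05/16"), parse_day("31/05/16"),
                       BY_HOLIDAYS_MAY_2016)
print(count)
```

With the real library, the set would be replaced by the calendar object built inside f1; the membership test and the sum stay the same.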
For the list of holiday dates, you can use:
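def f2(x):
    # holiday calendar for the German state in this row
    h = holidays.CountryHoliday('DE', prov=x['City'], state=None)
    # every day from date to End_date, inclusive
    d = pd.date_range(x['date'], x['End_date'])
    # keep only the days that are public holidays, as plain dates
    return [y.date() for y in d if y in h]

df['Lst_pub_holiday'] = df.apply(f2, axis=1)
print(df)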
City date value End_date \
0 BE 2016-01-01 41 2016-01-31
1 NW 2016-10-01 74 2016-10-31
2 BY 2016-05-01 97 2016-05-31
Lst_pub_holiday
0 [2016-01-01]
1 [2016-10-03]
2 [2016-05-01, 2016-05-05, 2016-05-16, 2016-05-26]
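Since the count is just the length of each list, both results can come from a single pass over the date range. The sketch below shows that idea without pandas or the `holidays` package. `HOLIDAYS_BY_CITY`, `rows` and `holidays_between` are hypothetical names, and the holiday dates are copied from the output above rather than looked up.

```python
from datetime import date, timedelta

# Assumption: plain sets standing in for the per-state holiday calendars,
# filled with the dates shown in the Lst_pub_holiday output above.
HOLIDAYS_BY_CITY = {
    "BE": {date(2016, 1, 1)},
    "NW": {date(2016, 10, 3)},
    "BY": {date(2016, 5, 1), date(2016, 5, 5),
           date(2016, 5, 16), date(2016, 5, 26)},
}

# The three rows of the dataset: (City, date, End_date).
rows = [
    ("BE", date(2016, 1, 1), date(2016, 1, 31)),
    ("NW", date(2016, 10, 1), date(2016, 10, 31)),
    ("BY", date(2016, 5, 1), date(2016, 5, 31)),
]

def holidays_between(city, start, end):
    """Return the holidays for city within [start, end], in date order."""
    calendar = HOLIDAYS_BY_CITY[city]
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in days if day in calendar]

# Build the lists once, then derive the counts from their lengths.
found = {city: holidays_between(city, start, end) for city, start, end in rows}
counts = {city: len(days) for city, days in found.items()}
print(counts)
```

In the dataframe itself, the same idea would mean computing Lst_pub_holiday with f2 and taking the length of each list for Nb_pub_holiday, instead of scanning the range twice.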