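I have a DataFrame with a date column, but the dates are stored as strings. How can I check whether each date falls in the first or the second half of the month, and add another column with the billing date?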
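For example, if the date is 08-10-2020 (format dd-mm-yyyy), the billing date column should contain the 16th of the same month. If the date falls in the second half of the month (17th to 31st, although the sample output below also treats the 16th as second half), the billing date should be the 1st of the next month.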
Data:
```
print(df['dispatch_date'].head())
0    01-10-2020
1    07-10-2020
2    17-10-2020
3    16-10-2020
4    09-10-2020
Name: dispatch_date, dtype: object
```
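As a minimal sketch of the "first half or second half" check (an addition, not from the original post), the strings can be parsed with `pd.to_datetime` using an explicit `format`, after which `.dt.day` gives the day of the month. The `first_half` column name is hypothetical, and the `< 16` cutoff assumes the 16th belongs to the second half, as in the sample output.

```python
import pandas as pd

df = pd.DataFrame({'dispatch_date': ['01-10-2020', '07-10-2020', '17-10-2020',
                                     '16-10-2020', '09-10-2020']})
# Parse dd-mm-yyyy strings; an explicit format avoids day/month ambiguity
parsed = pd.to_datetime(df['dispatch_date'], format='%d-%m-%Y')
# True for days 1-15 (first half), False from the 16th onward
df['first_half'] = parsed.dt.day < 16
print(df)
```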
Sample output:
```
  dispatch_date billing date
0    01-10-2020   16-10-2020
1    07-10-2020   16-10-2020
2    17-10-2020   01-11-2020
3    16-10-2020   01-11-2020
4    09-10-2020   16-10-2020
```
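A vectorized alternative to row-wise string slicing, sketched under the same assumptions (this is not the approach used in the answer below): parse the column, move first-half dates forward to the 16th, and roll second-half dates to the start of the next month with `pd.offsets.MonthBegin(1)`. An extra December row is included to show the year rollover.

```python
import pandas as pd

df = pd.DataFrame({'dispatch_date': ['01-10-2020', '07-10-2020', '17-10-2020',
                                     '16-10-2020', '09-10-2020', '19-12-2020']})
parsed = pd.to_datetime(df['dispatch_date'], format='%d-%m-%Y')
day = parsed.dt.day
# First half: shift forward by (16 - day) days to land on the 16th
sixteenth = parsed + pd.to_timedelta(16 - day, unit='D')
# Second half: MonthBegin(1) rolls forward to the 1st of the next month
next_first = parsed + pd.offsets.MonthBegin(1)
# Keep the 16th where day < 16, otherwise take the next month's 1st
billing = sixteenth.where(day < 16, next_first)
df['billing date'] = billing.dt.strftime('%d-%m-%Y')
print(df)
```

`MonthBegin(1)` applied to a date that is already the 1st jumps a whole month, but that never matters here because only days 16 and later take that branch.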
You can do this with `apply`, as follows:
```python
import pandas as pd
import datetime as dt
dates = ['01-10-2020', '07-10-2020', '17-10-2020', '15-12-2020', '19-12-2020']
df = pd.DataFrame(data=dates, columns=['dates'])
# billing date as a string: days 1-15 -> 16th, otherwise 1st of next month (zfill keeps '09', not '9')
def billing_str(x):
    day, month, year = int(x[:2]), int(x[3:5]), int(x[6:])
    if day < 16:
        return '16' + x[2:]
    return '01-' + str(month + 1).zfill(2) + x[5:] if month != 12 else '01-01-' + str(year + 1)
print(df.dates.apply(billing_str))
df['billing_date'] = df.dates.apply(billing_str)
# if the billing date series is needed as datetime.date objects
def billing_date(x):
    day, month, year = int(x[:2]), int(x[3:5]), int(x[6:])
    if day < 16:
        return dt.date(year, month, 16)
    return dt.date(year, month + 1, 1) if month != 12 else dt.date(year + 1, 1, 1)
print(df.dates.apply(billing_date))
df['billing_date'] = df.dates.apply(billing_date)
```
Output:
```
0    16-10-2020
1    16-10-2020
2    01-11-2020
3    16-12-2020
4    01-01-2021
Name: dates, dtype: object
0    2020-10-16
1    2020-10-16
2    2020-11-01
3    2020-12-16
4    2021-01-01
Name: dates, dtype: object
```
Edit: the code now handles the December edge case, where the billing date rolls over to 1 January of the following year.
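Working with `datetime.date` instead of slicing strings removes the need for a separate December branch. A minimal sketch, with a hypothetical helper name `billing_date_for`: moving from the 28th forward by 4 days always lands in the next month, so `replace(day=1)` gives that month's first day, including January of the following year.

```python
import datetime as dt

def billing_date_for(s: str) -> dt.date:
    """Billing date for a dd-mm-yyyy string (hypothetical helper)."""
    d = dt.datetime.strptime(s, '%d-%m-%Y').date()
    if d.day < 16:
        # First half: the 16th of the same month
        return d.replace(day=16)
    # Second half: the 28th plus 4 days is always in the next month,
    # so December rolls over to January of the following year
    return (d.replace(day=28) + dt.timedelta(days=4)).replace(day=1)

print(billing_date_for('19-12-2020'))  # 2021-01-01
```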