bet*_*eta 15 python datetime timedelta pandas
I have a pandas DataFrame like this:
Name start end
A 2000-01-10 1970-04-29
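I want to add a new column that gives the difference between the start and end columns in years, months, and days.
So the result should look like this: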
Name start end diff
A 2000-01-10 1970-04-29 29y9m etc.
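The diff column could also be a datetime object or a timedelta object, but the key point for me is that I can easily get the years and months out of it.
What I have tried so far is: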
df['diff'] = df['end'] - df['start']
This gives a new column containing 10848 days. However, I don't know how to convert the days into 29y9m etc.
小智 18
You can try creating a new column with the number of years like this:
df['diff_year'] = df['diff'] / np.timedelta64(1, 'Y')
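A year is not a fixed-length unit, and depending on the pandas version the `'Y'` unit may be rejected in timedelta arithmetic. A minimal sketch of the same idea with an explicit year length follows; the 365.25-day year is an assumption made here for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A"],
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

# Subtract the earlier date from the later one so the result is positive.
df["diff"] = df["start"] - df["end"]

# Assumption: one year is treated as 365.25 days (an average, not a calendar year).
df["diff_year"] = df["diff"] / pd.Timedelta(days=365.25)

print(df[["diff", "diff_year"]])
```

The result is a float number of years, so it is only an approximation of the calendar difference.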
This is very straightforward with relativedelta:
from dateutil import relativedelta
>> end start
>> 0 1970-04-29 2000-01-10
# df.ix has been removed from pandas, so build one relativedelta per row instead
df['diff'] = [relativedelta.relativedelta(s, e) for s, e in zip(df['start'], df['end'])]
>> end start diff
>> 0 1970-04-29 2000-01-10 relativedelta(years=+29, months=+8, days=+12)
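Since the question asks for something like 29y9m, the relativedelta fields can be turned into a string. The helper name `format_delta` below is hypothetical and only meant as a sketch.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

def format_delta(later, earlier):
    """Return the calendar difference as a string such as '29y8m12d'."""
    rd = relativedelta(later, earlier)
    return "{0}y{1}m{2}d".format(rd.years, rd.months, rd.days)

label = format_delta(datetime(2000, 1, 10), datetime(1970, 4, 29))
print(label)  # 29y8m12d
```

For a DataFrame, the same helper could be applied row by row, for example with `[format_delta(s, e) for s, e in zip(df['start'], df['end'])]`.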
小智 7
I think this is the most "pandas" way to do it, without any for loops or external function definitions:
>>> from datetime import datetime
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['A'], 'start': [datetime(2000, 1, 10)], 'end': [datetime(1970, 4, 29)]})
>>> # Shift each timedelta onto 0001-01-01 so that year and month can be read off
>>> df['diff'] = list(map(lambda td: datetime(1, 1, 1) + td.to_pytimedelta(), list(df['start'] - df['end'])))
>>> df['diff'] = df['diff'].apply(lambda d: '{0}y{1}m'.format(d.year - 1, d.month - 1))
>>> df
  Name      start        end   diff
0    A 2000-01-10 1970-04-29  29y8m
map had to be used instead of apply for the first step, because pandas' timedelta64 does not allow simple addition to a datetime object.
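An alternative that stays within pandas column operations is to compute whole calendar months from the `.dt` accessors. This is a sketch under two assumptions that are not part of the answer above: `start` is never earlier than `end`, and a month only counts once the same day of the month has been reached.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A"],
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

later, earlier = df["start"], df["end"]

# Calendar months between the two dates, ignoring the day of the month.
months = (later.dt.year - earlier.dt.year) * 12 + (later.dt.month - earlier.dt.month)
# Subtract one month when the day of the month has not been reached yet.
months = months - (later.dt.day < earlier.dt.day).astype(int)

# Split the month count into years and remaining months.
df["diff"] = (months // 12).astype(str) + "y" + (months % 12).astype(str) + "m"
print(df)
```

For the sample row this gives 29y8m, matching the relativedelta result. End-of-month edge cases may be handled differently than by relativedelta.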
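You can achieve this with a simple function.
The function computes the difference in years and months with a simple approximate calculation.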
import pandas as pd
import datetime
def parse_date(td):
    # Approximation: a year is taken as 365.25 days and a month as 30 days
    resYear = float(td.days) / 365.25  # number of years, including the fractional part
    resMonth = int((resYear - int(resYear)) * 365.25 / 30)  # convert the fractional year into whole months
    resYear = int(resYear)
    return str(resYear) + "Y" + str(resMonth) + "m"
df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
df["delta"] = [parse_date(datetime.datetime.strptime(start, '%Y-%m-%d') - datetime.datetime.strptime(end, '%Y-%m-%d')) for start, end in zip(df["start"], df["end"])]
print(df)
        start         end  delta
0  2000-01-10  1970-04-29  29Y8m
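A much simpler way is to use the date_range function and take the length of the result, which counts the month-end dates between the two timestamps: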
startdt=pd.to_datetime('2017-01-01')
enddt = pd.to_datetime('2018-01-01')
len(pd.date_range(start=startdt,end=enddt,freq='M'))
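The same month count can also be obtained without materializing the whole range, by subtracting monthly periods. This is a sketch and not part of the answer above. It ignores the day of the month entirely, and newer pandas versions may prefer the `'ME'` alias over `'M'` for the `date_range` frequency.

```python
import pandas as pd

startdt = pd.to_datetime("2017-01-01")
enddt = pd.to_datetime("2018-01-01")

# Subtracting two monthly periods yields an offset; its .n is the number of months.
months = (enddt.to_period("M") - startdt.to_period("M")).n
print(months)  # 12
```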