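How can I calculate a person's age (based on the dob column) and add a column with the new value to the DataFrame?
The DataFrame looks like this: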
    lname   fname       dob
0     DOE  LAURIE  03011979
1  BOURNE   JASON  06111978
2  GRINCH    XMAS  12131988
3     DOE    JOHN  11121986
I tried the following:
now = datetime.now()
df1['age'] = now - df1['dob']
However, I get the following error:
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'
unu*_*tbu 29
import datetime as DT
import io
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = 'warn'
content = '''     ssno      lname    fname      pos_title               ser   gender  dob
0    23456789   PLILEY   JODY       BUDG ANAL               0560  F       031871
1    987654321  NOEL     HEATHER    PRTG SRVCS SPECLST      1654  F       120852
2    234567891  SONJU    LAURIE     SUPVY CONTR SPECLST     1102  F       010999
3    345678912  MANNING  CYNTHIA    SOC SCNTST              0101  F       081692
4    456789123  NAUERTZ  ELIZABETH  OFF AUTOMATION ASST     0326  F       031387'''
# Columns are separated by two or more spaces; a regex separator needs the python engine
df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')
df['dob'] = df['dob'].apply('{:06}'.format)
now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y') # 1
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] - np.timedelta64(100, 'Y')) # 2
df['age'] = (now - df['dob']).astype('<m8[Y]') # 3
print(df)
yields
        ssno    lname      fname             pos_title   ser gender  \
0   23456789   PLILEY       JODY             BUDG ANAL   560      F
1  987654321     NOEL    HEATHER    PRTG SRVCS SPECLST  1654      F
2  234567891    SONJU     LAURIE   SUPVY CONTR SPECLST  1102      F
3  345678912  MANNING    CYNTHIA            SOC SCNTST   101      F
4  456789123  NAUERTZ  ELIZABETH   OFF AUTOMATION ASST   326      F

                  dob  age
0 1971-03-18 00:00:00   43
1 1952-12-08 18:00:00   61
2 1999-01-09 00:00:00   15
3 1992-08-16 00:00:00   22
4 1987-03-13 00:00:00   27
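1. It looks like the values in your dob column are currently strings. First, convert them to Timestamps using pd.to_datetime.
2. The format '%m%d%y' converts the last two digits to a year, but unfortunately assumes that 52 means 2052. Since that is probably not Heather Noel's birth year, subtract 100 years from dob whenever dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since a 101-year-old worker is probably slightly more likely than a 1-year-old worker...
3. Subtract dob from now to obtain a timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]').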
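Recent pandas releases may reject year-unit timedeltas such as np.timedelta64(100, 'Y') and astype('<m8[Y]') as ambiguous, so steps 2 and 3 might need a different spelling there. The following is only a sketch of one alternative, not the answer author's method: it shifts the century with pd.DateOffset and counts whole years by comparing dates written as YYYYMMDD integers. The dob strings are taken from the sample data above; the fixed reference date 2014-09-01 is an assumption chosen to make the result reproducible.

```python
import pandas as pd

# dob strings in MMDDYY form, taken from the sample data above
df = pd.DataFrame({"dob": ["031871", "120852", "010999", "081692", "031387"]})

# Assumption: a fixed reference date instead of pd.Timestamp('now')
now = pd.Timestamp("2014-09-01")

# Step 1: parse the strings; %y maps two-digit years 00-68 to 20xx
dob = pd.to_datetime(df["dob"], format="%m%d%y")

# Step 2: any date in the future gets moved back exactly 100 calendar years
dob = dob.where(dob < now, dob - pd.DateOffset(years=100))
df["dob"] = dob

# Step 3: encode each date as the integer YYYYMMDD; the floor-divided
# difference is the number of whole years that have elapsed
dob_key = dob.dt.year * 10000 + dob.dt.month * 100 + dob.dt.day
now_key = now.year * 10000 + now.month * 100 + now.day
df["age"] = (now_key - dob_key) // 10000

print(df)
```

Unlike a timedelta-based conversion, this keeps dob at midnight and counts calendar birthdays rather than average-length years.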
# Data setup
df
    lname   fname        dob
0     DOE  LAURIE 1979-03-01
1  BOURNE   JASON 1978-06-11
2  GRINCH    XMAS 1988-12-13
3     DOE    JOHN 1986-11-12
# Make sure to parse all datetime columns in advance
df['dob'] = pd.to_datetime(df['dob'], errors='coerce')
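The dob values in the question are eight-digit MMDDYYYY strings such as 03011979, and to_datetime may not infer that layout correctly without an explicit format. A minimal sketch, assuming the column holds strings (so the leading zero is preserved):

```python
import pandas as pd

# The question's data, with dob kept as strings so leading zeros survive
df = pd.DataFrame({
    "lname": ["DOE", "BOURNE", "GRINCH", "DOE"],
    "fname": ["LAURIE", "JASON", "XMAS", "JOHN"],
    "dob": ["03011979", "06111978", "12131988", "11121986"],
})

# %m%d%Y = two-digit month, two-digit day, four-digit year;
# errors='coerce' turns unparseable values into NaT instead of raising
df["dob"] = pd.to_datetime(df["dob"], format="%m%d%Y", errors="coerce")

print(df)
```

If the data comes from a file, passing dtype={'dob': str} to pd.read_csv is one way to stop the column from being read as integers, which would drop the leading zero.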
If you only want the year part of the age, use @unutbu's solution:
now = pd.to_datetime('now')
now
# Timestamp('2019-04-14 00:00:43.105892')
(now - df['dob']).astype('<m8[Y]')
0 40.0
1 40.0
2 30.0
3 32.0
Name: dob, dtype: float64
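Another option is to subtract the year parts and then correct for whether the birth month has been reached yet this year: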
(now.year - df['dob'].dt.year) - ((now.month - df['dob'].dt.month) < 0)
0 40
1 40
2 30
3 32
Name: dob, dtype: int64
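The month-only correction above does not look at the day, so it can be off by one year when the birthday falls later in the current month. The sketch below compares it with a month-and-day correction. The first four dates come from the data above; the fifth date (1979-04-20) and the fixed reference date 2019-04-14 are assumptions added to expose the edge case.

```python
import pandas as pd

# Assumption: a fixed reference date so the comparison is reproducible
now = pd.Timestamp("2019-04-14")

# Four dates from the data above plus one hypothetical edge case:
# a birthday later in the same month as `now`
dob = pd.Series(pd.to_datetime(
    ["1979-03-01", "1978-06-11", "1988-12-13", "1986-11-12", "1979-04-20"]
))

year_diff = now.year - dob.dt.year

# Month-only correction, as in the expression above
month_only = year_diff - ((now.month - dob.dt.month) < 0).astype(int)

# Month-and-day correction: True where this year's birthday is still ahead
birthday_ahead = (dob.dt.month > now.month) | (
    (dob.dt.month == now.month) & (dob.dt.day > now.day)
)
month_and_day = year_diff - birthday_ahead.astype(int)

print(pd.DataFrame({"dob": dob, "month_only": month_only, "month_and_day": month_and_day}))
```

The two columns agree for the original four rows and differ only for the hypothetical fifth row, where the birthday has not yet occurred.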
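If you want an (almost) exact age, including the fractional part, take total_seconds and divide by the number of seconds in a year (here using 365.25 days per year).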
(now - df['dob']).dt.total_seconds() / (60*60*24*365.25)
0 40.120446
1 40.840501
2 30.332630
3 32.418872
Name: dob, dtype: float64
小智 5
I found a simpler solution:
import pandas as pd
from datetime import date, datetime

d = {'col0': [1, 2, 6], 'col1': [3, 8, 3], 'col2': ['17.02.1979',
                                                    '11.11.1993',
                                                    '01.08.1961']}
df = pd.DataFrame(data=d)

def calculate_age(born):
    # Parse a DD.MM.YYYY string into a date
    born = datetime.strptime(born, "%d.%m.%Y").date()
    today = date.today()
    # Subtract one year if this year's birthday has not happened yet
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

df['age'] = df['col2'].apply(calculate_age)
print(df)
Output:
   col0  col1        col2  age
0     1     3  17.02.1979   39
1     2     8  11.11.1993   24
2     6     3  01.08.1961   57