Extract the year from a pandas datetime column as a numeric value, with NaN instead of NaT for empty cells

Mat*_*000 3 python datetime pandas na

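I want to extract the year from a datetime column into a new "yyyy" column, and I want missing values (NaT) to be shown as "NaN". I assume the new column therefore needs a dtype other than datetime, but I'm stuck.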

Initial df:

        Date  ID
0 2016-01-01  12
1 2015-01-01  96
2        NaT  20
3 2018-01-01  73
4 2017-01-01  84
5        NaT  26
6 2013-01-01  87
7 2016-01-01  64
8 2019-01-01  11
9 2014-01-01  34

Desired df:

        Date  ID  yyyy
0 2016-01-01  12  2016
1 2015-01-01  96  2015
2        NaT  20   NaN
3 2018-01-01  73  2018
4 2017-01-01  84  2017
5        NaT  26   NaN
6 2013-01-01  87  2013
7 2016-01-01  64  2016
8 2019-01-01  11  2019
9 2014-01-01  34  2014

Code:

import pandas as pd
import numpy as np

# example df
df = pd.DataFrame({"ID": [12, 96, 20, 73, 84, 26, 87, 64, 11, 34],
                   "Date": ['2016-01-01', '2015-01-01', np.nan, '2018-01-01', '2017-01-01',
                            np.nan, '2013-01-01', '2016-01-01', '2019-01-01', '2014-01-01']})

df.ID = pd.to_numeric(df.ID)
df.Date = pd.to_datetime(df.Date)
print(df)

# extraction of year from date
df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y')

# Try to set NaT to NaN or datetime to numeric, PROBLEM: empty cells keep 'NaT'
df.loc[(df['yyyy'].isna()), 'yyyy'] = np.nan   # (try1)
df.yyyy = df.Date.astype(float)                # (try2)
df.yyyy = pd.to_numeric(df.Date)               # (try3)

print(df)

jez*_*ael 5

Use Series.dt.year together with a conversion to the nullable integer dtype Int64:

df.Date = pd.to_datetime(df.Date)

df['yyyy'] = df.Date.dt.year.astype('Int64')
print (df)
   ID       Date  yyyy
0  12 2016-01-01  2016
1  96 2015-01-01  2015
2  20        NaT  <NA>
3  73 2018-01-01  2018
4  84 2017-01-01  2017
5  26        NaT  <NA>
6  87 2013-01-01  2013
7  64 2016-01-01  2016
8  11 2019-01-01  2019
9  34 2014-01-01  2014

Without the conversion to integers, the years stay as floats:

df['yyyy'] = df.Date.dt.year
print (df)
   ID       Date    yyyy
0  12 2016-01-01  2016.0
1  96 2015-01-01  2015.0
2  20        NaT     NaN
3  73 2018-01-01  2018.0
4  84 2017-01-01  2017.0
5  26        NaT     NaN
6  87 2013-01-01  2013.0
7  64 2016-01-01  2016.0
8  11 2019-01-01  2019.0
9  34 2014-01-01  2014.0
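The two variants above differ only in the dtype of the resulting column. As a rough sketch of that difference (the names dates, years_float and years_int are illustrative and not part of the original answer, and a reasonably recent pandas with the nullable Int64 dtype is assumed):

```python
import numpy as np
import pandas as pd

# Small stand-in for the Date column, with one missing value.
dates = pd.to_datetime(pd.Series(["2016-01-01", np.nan, "2018-01-01"]))

# With NaT present, dt.year falls back to float64 and marks the gap as NaN.
years_float = dates.dt.year

# Casting to the nullable integer dtype keeps whole numbers and shows <NA>.
years_int = years_float.astype("Int64")

print(years_float.dtype)
print(years_int.dtype)
print(years_int.isna().tolist())
```

Both NaN and <NA> are reported as missing by isna(), so either column can be filtered or filled in the usual way; Int64 mainly avoids the trailing ".0" on the years.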

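Your solution converts NaT to the string 'NaT', so you can use replace to turn it into NaN. By the way, in recent versions of pandas the replace is unnecessary and the code works correctly without it.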

df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y').replace('NaT', np.nan)
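Note that the strftime approach produces strings rather than numbers. If a numeric column is wanted while keeping strftime, one possible sketch (not from the original answer; the names dates, year_strings and years are illustrative) is to pass the result through pd.to_numeric with errors="coerce", which should behave the same whether the missing entries come out as the string 'NaT' or as NaN:

```python
import numpy as np
import pandas as pd

# Small stand-in for the Date column, with one missing value.
dates = pd.to_datetime(pd.Series(["2016-01-01", np.nan, "2018-01-01"]))

# strftime yields year strings; depending on the pandas version the
# missing entry is either the string 'NaT' or NaN.
year_strings = dates.dt.strftime("%Y")

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
years = pd.to_numeric(year_strings, errors="coerce")

print(years)
```

In practice dt.year is the more direct route; this variant only matters when the string formatting step is needed for other reasons.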