Mat*_*000 3 python datetime pandas na
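I want to extract the year from a datetime column into a new 'yyyy' column, and I want missing values (NaT) to be shown as 'NaN'. I think the dtype of the new column has to change for that, but I'm stuck.

Initial df: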

        Date  ID
0 2016-01-01  12
1 2015-01-01  96
2        NaT  20
3 2018-01-01  73
4 2017-01-01  84
5        NaT  26
6 2013-01-01  87
7 2016-01-01  64
8 2019-01-01  11
9 2014-01-01  34

Desired df:

        Date  ID  yyyy
0 2016-01-01  12  2016
1 2015-01-01  96  2015
2        NaT  20   NaN
3 2018-01-01  73  2018
4 2017-01-01  84  2017
5        NaT  26   NaN
6 2013-01-01  87  2013
7 2016-01-01  64  2016
8 2019-01-01  11  2019
9 2014-01-01  34  2014

Code:

import pandas as pd
import numpy as np

# example df
df = pd.DataFrame({"ID": [12,96,20,73,84,26,87,64,11,34],
                   "Date": ['2016-01-01', '2015-01-01', np.nan, '2018-01-01', '2017-01-01', np.nan, '2013-01-01', '2016-01-01', '2019-01-01', '2014-01-01']})

df.ID = pd.to_numeric(df.ID)
df.Date = pd.to_datetime(df.Date)
print(df)

#extraction of year from date
df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y')

#Try to set NaT to NaN or datetime to numeric, PROBLEM: empty cells keep 'NaT'
df.loc[(df['yyyy'].isna()), 'yyyy'] = np.nan  #(try1)
df.yyyy = df.Date.astype(float)  #(try2)
df.yyyy = pd.to_numeric(df.Date)  #(try3)

print(df)

Use Series.dt.year together with a conversion to the nullable integer dtype Int64:
df.Date = pd.to_datetime(df.Date)
df['yyyy'] = df.Date.dt.year.astype('Int64')
print (df)
ID Date yyyy
0 12 2016-01-01 2016
1 96 2015-01-01 2015
2 20 NaT <NA>
3 73 2018-01-01 2018
4 84 2017-01-01 2017
5 26 NaT <NA>
6 87 2013-01-01 2013
7 64 2016-01-01 2016
8 11 2019-01-01 2019
9 34 2014-01-01 2014
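As a self-contained check of this approach, here is a minimal sketch on a shortened, three-row version of the example data (the shortened frame is only for illustration). It should confirm that the new column uses the nullable Int64 dtype and that the missing date ends up as a missing value rather than a number.

```python
import numpy as np
import pandas as pd

# Shortened illustration frame, not the full data from the question
df = pd.DataFrame({
    "ID": [12, 96, 20],
    "Date": ["2016-01-01", "2015-01-01", np.nan],
})
df["Date"] = pd.to_datetime(df["Date"])

# 'Int64' (capital I) is the nullable integer dtype; lowercase 'int64' cannot hold missing values
df["yyyy"] = df["Date"].dt.year.astype("Int64")

print(df["yyyy"].dtype)         # Int64
print(df["yyyy"].isna().sum())  # 1
```

Note that the missing entries print as <NA> rather than NaN with this dtype, but isna() treats them the same way.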
Without converting the floats to integers:
df['yyyy'] = df.Date.dt.year
print (df)
ID Date yyyy
0 12 2016-01-01 2016.0
1 96 2015-01-01 2015.0
2 20 NaT NaN
3 73 2018-01-01 2018.0
4 84 2017-01-01 2017.0
5 26 NaT NaN
6 87 2013-01-01 2013.0
7 64 2016-01-01 2016.0
8 11 2019-01-01 2019.0
9 34 2014-01-01 2014.0
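The floats appear because NaN cannot be stored in a plain NumPy integer column. A small sketch, using a made-up three-element Series rather than the question's data, illustrates that the year values come back as float64 and that a plain astype(int) is rejected while missing values are present:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-series with one missing date
dates = pd.to_datetime(pd.Series(["2016-01-01", np.nan, "2018-01-01"]))

years = dates.dt.year
print(years.dtype)  # float64

# Casting to a plain integer dtype fails as long as NaN is present
try:
    years.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

print(cast_failed)  # True
```

So if NaN specifically is required in the column, the float version is the one to keep; if whole-number years matter more, the nullable Int64 dtype is the alternative.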
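Your solution converts NaT to the string 'NaT', so you can use replace. By the way, in recent versions of pandas the replace is unnecessary; the code works correctly without it.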
df['yyyy'] = pd.to_datetime(df.Date).dt.strftime('%Y').replace('NaT', np.nan)
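A minimal sketch of the string-based variant, again on a made-up three-element Series. The replace is kept so the result should be the same whether or not the installed pandas version formats NaT as the string 'NaT'. The final pd.to_numeric step is an optional addition, not part of the original one-liner.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-series with one missing date
dates = pd.to_datetime(pd.Series(["2016-01-01", np.nan, "2018-01-01"]))

# If NaT was formatted as the string 'NaT', replace turns it into a missing value;
# if it is already missing, replace changes nothing.
years_str = dates.dt.strftime("%Y").replace("NaT", np.nan)

print(years_str.isna().tolist())  # [False, True, False]

# The non-missing values are still strings such as '2016';
# pd.to_numeric converts them to numbers if numeric years are needed.
years_num = pd.to_numeric(years_str)
```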