Get MM-DD-YYYY from a pandas Timestamp

bla*_*bul 17 python date pandas

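Dates seem to be a tricky thing in Python, and I am having a lot of trouble simply stripping the date out of a pandas Timestamp. I would like to turn 2013-09-29 02:34:44 into simply 09-29-2013.

I have a DataFrame with a Created_Date column: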

Name: Created_Date, Length: 1162549, dtype: datetime64[ns]

I have tried applying the .date() method to this Series, e.g. df.Created_Date.date(), but I get the error AttributeError: 'Series' object has no attribute 'date'

Can someone help me out?

Phi*_*oud 32

map over the elements:

In [239]: from operator import methodcaller

In [240]: s = Series(date_range(Timestamp('now'), periods=2))

In [241]: s
Out[241]:
0   2013-10-01 00:24:16
1   2013-10-02 00:24:16
dtype: datetime64[ns]

In [238]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[238]:
0    01-10-2013
1    02-10-2013
dtype: object

In [242]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[242]:
0    01-10-2013
1    02-10-2013
dtype: object
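Note that the session above formats day first ('%d-%m-%Y'), while the question asks for month first. A minimal sketch of the month-first variant follows; the sample timestamps are made up for illustration, and the `.dt.strftime` accessor assumes a pandas version that provides `Series.dt`, which is newer than the one used in the transcript.

```python
import pandas as pd

# Two sample timestamps; the first one is the example from the question.
s = pd.Series(pd.to_datetime(["2013-09-29 02:34:44", "2013-10-01 00:24:16"]))

# %m-%d-%Y gives month-day-year; %d-%m-%Y would give day-month-year.
mm_dd_yyyy = s.map(lambda ts: ts.strftime("%m-%d-%Y"))

# The .dt accessor produces the same strings without an explicit map.
via_dt = s.dt.strftime("%m-%d-%Y")

print(mm_dd_yyyy.tolist())  # ['09-29-2013', '10-01-2013']
```

Either way the result is an object-dtype Series of strings, so it is best done as the last step before display or export.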

You can get the raw datetime.date objects by calling the date() method of the Timestamp elements that make up the Series:

In [249]: s.map(methodcaller('date'))

Out[249]:
0    2013-10-01
1    2013-10-02
dtype: object

In [250]: s.map(methodcaller('date')).values

Out[250]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)

Yet another way to do this is by calling the unbound Timestamp.date method:

In [273]: s.map(Timestamp.date)
Out[273]:
0    2013-10-01
1    2013-10-02
dtype: object

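This method is the fastest, and IMHO the most readable. Timestamp is accessible in the top-level pandas module as pandas.Timestamp. I've imported it directly for illustration purposes.

The date attribute of DatetimeIndex objects does something similar, but returns a numpy object array instead: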

In [243]: index = DatetimeIndex(s)

In [244]: index
Out[244]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-10-01 00:24:16, 2013-10-02 00:24:16]
Length: 2, Freq: None, Timezone: None

In [246]: index.date
Out[246]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
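In pandas versions that provide the `Series.dt` accessor (newer than the one in the transcript above), the same attribute should be reachable without building a DatetimeIndex first. A small sketch under that assumption, reusing the two sample dates from the session:

```python
import datetime

import pandas as pd

s = pd.Series(pd.date_range("2013-10-01 00:24:16", periods=2))

# Series.dt.date returns an object-dtype Series of datetime.date values.
dates = s.dt.date

# DatetimeIndex.date returns a NumPy object array holding the same values.
index_dates = pd.DatetimeIndex(s).date

print(dates.tolist())
# [datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)]
```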

For larger datetime64[ns] Series objects, calling Timestamp.date is faster than operator.methodcaller, which in turn is slightly faster than a lambda:

In [263]: f = methodcaller('date')

In [264]: flam = lambda x: x.date()

In [265]: fmeth = Timestamp.date

In [266]: s2 = Series(date_range('20010101', periods=1000000, freq='T'))

In [267]: s2
Out[267]:
0    2001-01-01 00:00:00
1    2001-01-01 00:01:00
2    2001-01-01 00:02:00
3    2001-01-01 00:03:00
4    2001-01-01 00:04:00
5    2001-01-01 00:05:00
6    2001-01-01 00:06:00
7    2001-01-01 00:07:00
8    2001-01-01 00:08:00
9    2001-01-01 00:09:00
10   2001-01-01 00:10:00
11   2001-01-01 00:11:00
12   2001-01-01 00:12:00
13   2001-01-01 00:13:00
14   2001-01-01 00:14:00
...
999985   2002-11-26 10:25:00
999986   2002-11-26 10:26:00
999987   2002-11-26 10:27:00
999988   2002-11-26 10:28:00
999989   2002-11-26 10:29:00
999990   2002-11-26 10:30:00
999991   2002-11-26 10:31:00
999992   2002-11-26 10:32:00
999993   2002-11-26 10:33:00
999994   2002-11-26 10:34:00
999995   2002-11-26 10:35:00
999996   2002-11-26 10:36:00
999997   2002-11-26 10:37:00
999998   2002-11-26 10:38:00
999999   2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]

In [269]: timeit s2.map(f)
1 loops, best of 3: 1.04 s per loop

In [270]: timeit s2.map(flam)
1 loops, best of 3: 1.1 s per loop

In [271]: timeit s2.map(fmeth)
1 loops, best of 3: 968 ms per loop
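To reproduce this comparison outside IPython, something like the following script could be used. It is only a sketch: the series is shortened to 100,000 rows so it finishes quickly, `freq="min"` stands in for the `'T'` alias (which newer pandas releases deprecate, as far as I know), and the absolute numbers will differ by machine and pandas version, so the ordering shown above may not hold everywhere.

```python
from operator import methodcaller
from timeit import timeit

import pandas as pd

# Shorter than the one-million-row series above so the script runs quickly.
s2 = pd.Series(pd.date_range("2001-01-01", periods=100_000, freq="min"))

# The three callables compared in the session above.
candidates = {
    "methodcaller": methodcaller("date"),
    "lambda": lambda x: x.date(),
    "unbound method": pd.Timestamp.date,
}

for name, func in candidates.items():
    # Average over three runs of mapping the callable across the Series.
    seconds = timeit(lambda: s2.map(func), number=3) / 3
    print(f"{name}: {seconds:.3f} s per call")
```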

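Keep in mind that one of the goals of pandas is to provide a layer on top of numpy so that (most of the time) you don't have to deal with the low-level details of the ndarray. Getting the raw datetime.date objects in an array is therefore of limited use, since they don't correspond to any numpy.dtype that pandas supports (at the time of writing, pandas only supports datetime64[ns] [that's nanoseconds] dtypes). That said, sometimes you need to do this.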
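If the goal is only to drop the time of day while staying in a datetime64 dtype, one option (not part of the answer above, and assuming a pandas version with the `Series.dt` accessor) is `normalize()`, which resets each timestamp to midnight. A minimal sketch with made-up sample data:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2013-09-29 02:34:44", "2013-10-01 00:24:16"]))

# normalize() sets the time to midnight and keeps a datetime64 dtype,
# so date arithmetic and resampling continue to work on the result.
day_only = s.dt.normalize()

# .dt.date, by contrast, yields datetime.date objects in an object-dtype Series.
as_objects = s.dt.date

print(day_only.dtype, as_objects.dtype)
```

The normalized Series still displays as dates and remains usable with the rest of the pandas datetime machinery, whereas the object-dtype result is mainly useful when another library needs plain datetime.date values.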