I took my Series and coerced it to a datetime column with dtype=datetime64[ns]
(although I only need day resolution... not sure how to change that).
import pandas as pd
df = pd.read_csv('somefile.csv')
column = df['date']
column = pd.to_datetime(column, errors='coerce')  # errors='coerce' replaces the removed coerce=True; unparseable values become NaT
But plotting does not work:
ipdb> column.plot(kind='hist')
*** TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('float64')
I would like to plot a histogram that just shows the count of dates by week, month, or year.
Surely there is a way to do this in pandas?
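One way to get per-month or per-year counts is to group the dates by a period and count them, then plot the counts as bars instead of calling a histogram on the raw datetimes. The following is only a sketch under assumptions: the sample dates are made up for illustration and stand in for `df['date']` from `somefile.csv`.

```python
import pandas as pd

# Hypothetical sample dates standing in for df['date'] (not from the original file)
dates = pd.Series(pd.to_datetime([
    "2014-01-03", "2014-01-17", "2014-02-01",
    "2014-02-02", "2014-02-20", "2014-04-05",
]))

# Count entries per calendar month by grouping on the monthly period
counts_per_month = dates.groupby(dates.dt.to_period("M")).count()

# Same idea per year; dates.dt.to_period("W") would give weekly buckets
counts_per_year = dates.groupby(dates.dt.year).count()

print(counts_per_month)
print(counts_per_year)
```

With matplotlib installed, `counts_per_month.plot(kind='bar')` should then draw one bar per month. Note that months with no dates at all (March in this sample) simply do not appear in the grouped result.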
I have a DataFrame with several time series:
         divida    movav12        var  varmovav12
Date
2004-01       0        NaN        NaN         NaN
2004-02       0        NaN        NaN         NaN
2004-03       0        NaN        NaN         NaN
2004-04      34        NaN        inf         NaN
2004-05      30        NaN  -0.117647         NaN
2004-06      44        NaN   0.466667         NaN
2004-07      35        NaN  -0.204545         NaN
2004-08      31        NaN  -0.114286         NaN
2004-09      30        NaN  -0.032258         NaN
2004-10      24        NaN  -0.200000         NaN
2004-11      41        NaN   0.708333         NaN
2004-12      29  24.833333  -0.292683         NaN
2005-01      31  27.416667   0.068966    0.104027
2005-02      28  29.750000  -0.096774    0.085106
2005-03      27 …
Suppose I have the following DataFrame:
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='H')
np.random.seed(10)
n = 10
df = pd.DataFrame(
{
"datetime": np.random.choice(rng,n),
"cat": np.random.choice(['a','b','b'], n),
"val": np.random.randint(0,5, size=n)
}
)
If I now groupby:
gb = df.groupby(['cat','datetime']).sum()
I get the total for each cat per hour:
cat datetime val
a 2011-01-01 00:00:00 1
2011-01-01 09:00:00 3
2011-01-02 16:00:00 1
2011-01-03 16:00:00 1
b 2011-01-01 08:00:00 4
2011-01-01 15:00:00 3
2011-01-01 16:00:00 3
2011-01-02 04:00:00 4
2011-01-02 05:00:00 1
2011-01-02 12:00:00 4
However, I would like to have something like this:
cat datetime val
a 2011-01-01 4
2011-01-02 1
2011-01-03 …
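A possible way to get per-day totals is to group on the datetime column truncated to midnight rather than on the raw timestamps. This is a sketch, not a definitive answer: the frame below is hand-built from a few of the rows shown above (not regenerated from the random seed), and `daily` is just an illustrative name.

```python
import pandas as pd

# Hand-built subset of the hourly rows shown above (illustrative only)
df = pd.DataFrame({
    "cat": ["a", "a", "a", "a", "b", "b"],
    "datetime": pd.to_datetime([
        "2011-01-01 00:00:00", "2011-01-01 09:00:00",
        "2011-01-02 16:00:00", "2011-01-03 16:00:00",
        "2011-01-01 08:00:00", "2011-01-01 15:00:00",
    ]),
    "val": [1, 3, 1, 1, 4, 3],
})

# dt.normalize() sets the time part to 00:00:00, so all rows of one day share a key
daily = df.groupby(["cat", df["datetime"].dt.normalize()])["val"].sum()

print(daily)
```

The result keeps a two-level index (cat, datetime) with one row per observed day, so category a on 2011-01-01 sums the 00:00 and 09:00 rows. `pd.Grouper(key='datetime', freq='D')` in the key list is an alternative way to express the same daily bucketing.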
Say I have the following values:
money_spent
time
2014-10-06 17:59:40.016000-04:00 1.832128
2014-10-06 17:59:41.771000-04:00 2.671048
2014-10-06 17:59:43.001000-04:00 2.019434
2014-10-06 17:59:44.792000-04:00 1.294051
2014-10-06 17:59:48.741000-04:00 0.867856
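I would like to measure the money spent every 2 seconds. More specifically, for each timestamp in the output, I need to see the money spent within the last 2 seconds.
When I do: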
df.resample('2S').last()  # the how= argument was removed from resample; call the aggregation as a method
I get:
money_spent
time
2014-10-06 17:59:40-04:00 2.671048
2014-10-06 17:59:42-04:00 2.019434
2014-10-06 17:59:44-04:00 1.294051
2014-10-06 17:59:46-04:00 NaN
2014-10-06 17:59:48-04:00 0.867856
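This is not what I expected. To begin with, note that the first entry of the resampled DataFrame is 2.671048, but it is labeled 17:59:40, even though according to the original DataFrame that money had not been spent yet at that time. How is that possible?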
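The label seems to come from how resample bins the data: by default each 2-second bin is labeled with its left edge, so the row labeled 17:59:40 covers 17:59:40 up to (but not including) 17:59:42, and `last` picks the final observation inside that bin. For "money spent in the last 2 seconds at each timestamp", a time-based rolling sum may be closer to what is wanted. A sketch under assumptions: the timezone offset is dropped for brevity and the variable names are illustrative.

```python
import pandas as pd

# Same values as above; the -04:00 offset is omitted here for brevity
idx = pd.to_datetime([
    "2014-10-06 17:59:40.016",
    "2014-10-06 17:59:41.771",
    "2014-10-06 17:59:43.001",
    "2014-10-06 17:59:44.792",
    "2014-10-06 17:59:48.741",
])
money = pd.Series(
    [1.832128, 2.671048, 2.019434, 1.294051, 0.867856],
    index=idx, name="money_spent",
)

# Fixed 2-second bins labeled by their left edge: [40, 42), [42, 44), ...
binned_last = money.resample("2s").last()  # last observation in each bin
binned_sum = money.resample("2s").sum()    # total spent in each bin

# Trailing window: sum of everything in the 2 seconds up to and including each row
trailing_sum = money.rolling("2s").sum()

print(binned_last)
print(binned_sum)
print(trailing_sum)
```

Here the first bin holds both the 17:59:40.016 and 17:59:41.771 rows, which is why `last` reports 2.671048 under the 17:59:40 label, and the empty 17:59:46 bin gives NaN for `last` but 0 for `sum`.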
How can I get 5-minute data from this csv using Python/pandas? For every 5-minute interval I am trying to get DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOLUME.
DATE TIME OPEN HIGH LOW CLOSE VOLUME
02/03/1997 09:04:00 3046.00 3048.50 3046.00 3047.50 505
02/03/1997 09:05:00 3047.00 3048.00 3046.00 3047.00 162
02/03/1997 09:06:00 3047.50 3048.00 3047.00 3047.50 98
02/03/1997 09:07:00 3047.50 3047.50 3047.00 3047.50 228
02/03/1997 09:08:00 3048.00 3048.00 3047.50 3048.00 136
02/03/1997 09:09:00 3048.00 3048.00 3046.50 3046.50 174
02/03/1997 09:10:00 3046.50 3046.50 3045.00 3045.00 134
02/03/1997 09:11:00 3045.50 3046.00 3044.00 3045.00 43
02/03/1997 09:12:00 3045.00 3045.50 3045.00 3045.00 214
02/03/1997 09:13:00 3045.50 3045.50 3045.50 3045.50 8
02/03/1997 09:14:00 …
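A common approach to this kind of bar aggregation is to build a DatetimeIndex from DATE and TIME and then resample to 5 minutes with a different aggregation per column. The sketch below makes several assumptions: the file is comma-separated, the dates are month/day/year, and only the ten complete rows shown above are used (the truncated 09:14 row is left out).

```python
import io
import pandas as pd

# The complete rows from above, written as comma-separated text (assumed separator)
raw = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162
02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98
02/03/1997,09:07:00,3047.50,3047.50,3047.00,3047.50,228
02/03/1997,09:08:00,3048.00,3048.00,3047.50,3048.00,136
02/03/1997,09:09:00,3048.00,3048.00,3046.50,3046.50,174
02/03/1997,09:10:00,3046.50,3046.50,3045.00,3045.00,134
02/03/1997,09:11:00,3045.50,3046.00,3044.00,3045.00,43
02/03/1997,09:12:00,3045.00,3045.50,3045.00,3045.00,214
02/03/1997,09:13:00,3045.50,3045.50,3045.50,3045.50,8
"""
df = pd.read_csv(io.StringIO(raw))

# Combine DATE and TIME into one timestamp index (month/day/year is an assumption)
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S")

# One aggregation per column: first open, highest high, lowest low, last close, total volume
bars = df.resample("5min").agg({
    "OPEN": "first",
    "HIGH": "max",
    "LOW": "min",
    "CLOSE": "last",
    "VOLUME": "sum",
})

print(bars)
```

Each output row is labeled with the start of its 5-minute bin, so the 09:05 row summarizes the minutes 09:05 through 09:09. If the bars should instead be labeled by the end of the interval, `resample` accepts `label` and `closed` arguments for that.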