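I am trying to merge a DataFrame and a Series (Pandas 14.1). The Series should form a new column containing some NAs, because the Series' index values are a subset of the DataFrame's index values.
This works for a toy example, but not for my data (details below).
Example: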
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'], index=pd.date_range('1/1/2011', periods=6, freq='D'))
df1
A B C D
2011-01-01 -0.487926 0.439190 0.194810 0.333896
2011-01-02 1.708024 0.237587 -0.958100 1.418285
2011-01-03 -1.228805 1.266068 -1.755050 -1.476395
2011-01-04 -0.554705 1.342504 0.245934 0.955521
2011-01-05 -0.351260 -0.798270 0.820535 -0.597322
2011-01-06 0.132924 0.501027 -1.139487 1.107873
s1 = pd.Series(np.random.randn(3), name='foo', index=pd.date_range('1/1/2011', periods=3, freq='2D'))
s1
2011-01-01 -1.660578
2011-01-03 -0.209688
2011-01-05 0.546146
Freq: 2D, Name: foo, dtype: float64
pd.concat([df1, s1], axis=1) …
Following one of the (very few available) tutorials on Anaconda, I tried:
$ conda create -n rootclone --clone root
This failed:
src_prefix: '/home/bir/conda'
dst_prefix: '/home/bir/conda/envs/rootclone'
Packages: 49
Files: 471
An unexpected error has occurred, please consider sending the
following traceback to the conda GitHub issue tracker at:
https://github.com/conda/conda/issues
Include the output of the command 'conda info' in your report.
Traceback (most recent call last):
File "/home/bir/conda/bin/conda", line 5, in <module>
sys.exit(main())
File "/home/bir/conda/lib/python2.7/site-packages/conda/cli/main.py", line 203, in main
args_func(args, p)
File "/home/bir/conda/lib/python2.7/site-packages/conda/cli/main.py", line 208, in args_func
args.func(args, p)
File …
I am trying to process a CSV file like this:
df = pd.read_csv("raw_hl.csv", index_col='time', parse_dates=True)
df.head(2)
high low
time
2014-01-01 17:00:00 1.376235 1.375945
2014-01-01 17:01:00 1.376005 1.375775
2014-01-01 17:02:00 1.375795 1.375445
2014-01-01 17:07:00 NaN NaN
...
2014-01-01 17:49:00 1.375645 1.375445
type(df.index)
pandas.tseries.index.DatetimeIndex
But the index does not automatically come with a frequency:
print df.index.freq
None
Since different files may have different frequencies, it would be convenient to set one automatically. The simplest approach is to compare the first two rows:
tdelta = df.index[1] - df.index[0]
tdelta
datetime.timedelta(0, 60)
So far so good, but setting the frequency directly to this timedelta fails:
df.index.freq = tdelta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-3f24abacf9de> in <module>()
----> 1 df.index.freq = tdelta
AttributeError: can't set attribute
Is there a way (ideally a relatively painless one!) to do this?
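Answer: pandas already gives the DataFrame's index an inferred_freq attribute, perhaps to avoid overwriting a user-defined frequency. Here df.index.inferred_freq is 'T'.
So it seems to be just a matter of using that instead of df.index.freq. Thanks to Jeff, who also provided more details :)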
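As a rough sketch of the idea in the answer, the frequency can be inferred from the index and then used to rebuild the frame on a regular grid. The three-row sample below is hypothetical (it only mirrors the first rows shown in the question), and using `DataFrame.asfreq` to attach the frequency is an assumption about what is wanted, not something stated in the original answer.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the question's data.
idx = pd.DatetimeIndex([
    "2014-01-01 17:00:00",
    "2014-01-01 17:01:00",
    "2014-01-01 17:02:00",
])
df = pd.DataFrame(
    {"high": [1.376235, 1.376005, 1.375795],
     "low": [1.375945, 1.375775, 1.375445]},
    index=idx,
)

# An index built from raw timestamps carries no frequency.
print(df.index.freq)

# infer_freq returns a frequency alias string, or None if the spacing is irregular.
inferred = pd.infer_freq(df.index)
print(inferred)

if inferred is not None:
    # asfreq returns a new frame whose index has the frequency attached.
    regular = df.asfreq(inferred)
    print(regular.index.freq)
```

Note that the alias for one minute is reported as 'T' by older pandas releases and 'min' by newer ones, and that an index with gaps (such as the jump to 17:07 in the question's data) may yield no inferred frequency at all.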
I managed to do this using the following:
dft = pd.DataFrame.from_dict({
0: [50, 45, 00, 00],
1: [53, 48, 00, 00],
2: [56, 53, 00, 00],
3: [54, 49, 00, 00],
4: [53, 48, 00, 00],
5: [50, 45, 00, 00]
}, orient='index'
)
Written this way, the constructor call looks just like the resulting DataFrame, which makes it easy to read and edit:
>>> dft
0 1 2 3
0 50 45 0 0
1 53 48 0 0
2 56 53 0 0
3 54 49 0 0
4 53 48 0 0
5 50 45 0 0
But the DataFrame.from_dict constructor has no columns parameter, so giving the columns sensible names takes an extra step:
dft.columns = ['A', …
Some Matplotlib methods need dates in 'float days format'. datestr2num is a converter function for this, but it does not work with the relevant pandas objects:
In [3]: type(df.index)
Out[3]: pandas.tseries.index.DatetimeIndex
In [4]: type(df.index[0])
Out[4]: pandas.tslib.Timestamp
In [5]: mpl.dates.date2num(df.index)
Out [5]: ...
AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'
This gives a usable list of times in 'float days format':
dates = [mpl.dates.date2num(t) for t in df.index]
But is there a better way?
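One possible alternative to the list comprehension, offered only as a sketch: convert the whole index to an array of Python datetime objects with `DatetimeIndex.to_pydatetime()` and pass that array to `date2num` in a single call. The three-day index below is hypothetical, and whether this is actually "better" depends on the pandas and Matplotlib versions in use.

```python
import pandas as pd
import matplotlib.dates as mdates

# Hypothetical daily index standing in for df.index from the question.
idx = pd.date_range("2014-01-01", periods=3, freq="D")

# to_pydatetime() yields an array of datetime.datetime objects,
# which date2num converts in one vectorized call.
dates = mdates.date2num(idx.to_pydatetime())
print(dates)
```

The absolute values depend on Matplotlib's date epoch, which differs between versions, but consecutive days always differ by exactly 1.0.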