I have a CSV file. When I read it with pandas.read_csv, filtering columns with usecols and using multiple index columns, the result is not what I expect.
import pandas as pd
csv = r"""dummy,date,loc,x
bar,20090101,a,1
bar,20090102,a,3
bar,20090103,a,5
bar,20090101,b,1
bar,20090102,b,3
bar,20090103,b,5"""
# Write the sample data to disk; the with-block closes the file for us
with open('foo.csv', 'w') as f:
    f.write(csv)
df1 = pd.read_csv('foo.csv',
                  header=0,
                  names=["dummy", "date", "loc", "x"],
                  index_col=["date", "loc"],
                  usecols=["dummy", "date", "loc", "x"],
                  parse_dates=["date"])
print(df1)
# Ignore the dummy columns
df2 = pd.read_csv('foo.csv',
                  index_col=["date", "loc"],
                  usecols=["date", "loc", "x"],  # <----------- Changed
                  parse_dates=["date"],
                  header=0,
                  names=["dummy", "date", "loc", "x"])
print(df2)
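I expected df1 and df2 to be identical apart from the missing dummy column, but the columns come out mislabeled. The date is also being parsed as a date.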
In [118]: %run test.py
               dummy  x
date       loc
2009-01-01 a     bar  1
2009-01-02 a …

I am trying to get the x-limits of a plot, created from a time series with pandas, as Python datetime objects. ax.get_xlim() returns the axis limits as numpy.float64, and I cannot figure out how to convert those numbers to a usable datetime.
import pandas
from matplotlib import dates
import matplotlib.pyplot as plt
from datetime import datetime
from numpy.random import randn
ts = pandas.Series(randn(10000),
                   index=pandas.date_range('1/1/2000', periods=10000, freq='H'))
ts.plot()
ax = plt.gca()
ax.set_xlim(datetime(2000,1,1))
d1, d2 = ax.get_xlim()
print("%s(%s) to %s(%s)" % (d1, type(d1), d2, type(d2)))
print("Using matplotlib: %s" % dates.num2date(d1))
print("Using datetime: %s" % datetime.fromtimestamp(d1))
This returns:
262968.0 (<type 'numpy.float64'>) to 272967.0 (<type 'numpy.float64'>)
Using matplotlib: 0720-12-25 00:00:00+00:00
Using datetime: 1970-01-03 19:02:48
According to the pandas time series documentation, pandas uses numpy.datetime64 …
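One reading of the numbers above: 262968 hours is exactly 10957 days, which is the distance from 1970-01-01 to 2000-01-01, so for this hourly series the axis values look like hours since the Unix epoch rather than matplotlib date numbers or seconds. The sketch below converts the two limits on that assumption, using only the standard library. The helper name hours_since_epoch_to_datetime is made up for illustration, and the unit would presumably differ for a series with another frequency.

```python
from datetime import datetime, timedelta

# Assumption: the axis values count hours since 1970-01-01 00:00,
# which matches the limits printed for this hourly series.
EPOCH = datetime(1970, 1, 1)


def hours_since_epoch_to_datetime(value):
    """Convert an axis value (assumed hours since the epoch) to a datetime."""
    return EPOCH + timedelta(hours=float(value))


# Axis limits copied from the output above.
d1, d2 = 262968.0, 272967.0
start = hours_since_epoch_to_datetime(d1)
end = hours_since_epoch_to_datetime(d2)
print("%s to %s" % (start, end))
```

The first limit lands on the value passed to ax.set_xlim, and the second on the last of the 10000 hourly points, which is consistent with the assumed unit.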
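
I have a MultiIndex DataFrame from which I select cross sections of interest. The code works, but it runs slowly on large datasets, which makes me think I am doing something wrong. Essentially I concatenate multiple cross sections into a new DataFrame, and I am looking for a better way to do it.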
import pandas as pd
import numpy as np
import itertools
# setup dataset
event = ['event0', 'event1', 'event2']
node = ['n0', 'n1', 'n2', 'n3']
config = ['a', 'b']
data = []
for x in itertools.product(*[event, node, config]):
    data.append([x[0], x[1], x[2], np.random.randn()])
df = pd.DataFrame(data, columns=['event', 'node', 'config', 'value'])
dfi = df.set_index(['event', 'node'])
print(dfi.head(n=12))
Which looks like:
              config     value
event  node
event0 n0          a  1.256259
       n0          b  0.612465
       n1          a  1.593518
       n1          b -0.747131
       n2          a  0.719973
       n2          b …
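
The selection code from the question is cut off above, so the following is only a sketch of one possible alternative to looking up each cross section separately and concatenating the pieces: build a single boolean mask over the whole index with MultiIndex.isin. The list of (event, node) pairs named wanted is hypothetical, and the random values are seeded here only to make the example repeatable.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the example frame from the question, with a seeded generator.
rng = np.random.default_rng(0)
event = ['event0', 'event1', 'event2']
node = ['n0', 'n1', 'n2', 'n3']
config = ['a', 'b']
rows = [(e, n, c, rng.standard_normal())
        for e, n, c in itertools.product(event, node, config)]
df = pd.DataFrame(rows, columns=['event', 'node', 'config', 'value'])
dfi = df.set_index(['event', 'node'])

# Hypothetical selection: the (event, node) pairs of interest.
wanted = [('event0', 'n1'), ('event2', 'n3')]

# One mask over the full index instead of one lookup per pair plus a concat.
selected = dfi[dfi.index.isin(wanted)]
print(selected)
```

The mask keeps the rows in their original order and returns every row for each pair, here one row per config. Whether it is faster than the original approach on a large dataset would need to be measured.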