Dav*_*gan 4 python matplotlib pandas
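While plotting groupby objects in pandas is straightforward and easy, I am wondering what the most pythonic (pandastic?) way is to grab the unique groups from a groupby object. For example: I am working with atmospheric data and trying to plot diurnal trends over a period of several days or more. The following is the DataFrame containing many days' worth of data, where the timestamp is the index: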
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10909 entries, 2013-08-04 12:01:00 to 2013-08-13 17:43:00
Data columns (total 17 columns):
Date 10909 non-null values
Flags 10909 non-null values
Time 10909 non-null values
convt 10909 non-null values
hino 10909 non-null values
hinox 10909 non-null values
intt 10909 non-null values
no 10909 non-null values
nox 10909 non-null values
ozonf 10909 non-null values
pmtt 10909 non-null values
pmtv 10909 non-null values
pres 10909 non-null values
rctt 10909 non-null values
smplf 10909 non-null values
stamp 10909 non-null values
no2 10909 non-null values
dtypes: datetime64[ns](1), float64(11), int64(2), object(3)
To be able to average (and take other statistics of) the data at each minute of the day across several days, I group the DataFrame:
data = no.groupby('Time')
I can then plot the mean NO concentration as well as the quartiles with ease:
ax = figure(figsize=(12,8)).add_subplot(111)
title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
ylabel('Concentration [ppb]')
data.no.mean().plot(ax=ax, style='b', label='Mean')
data.no.apply(lambda x: percentile(x, 25)).plot(ax=ax, style='r', label='25%')
data.no.apply(lambda x: percentile(x, 75)).plot(ax=ax, style='r', label='75%')
The issue that led to my question is that, in order to plot more interesting things such as plots using fill_between(), it is necessary to know the x-axis information, per the documentation:
fill_between(x, y1, y2=0, where=None, interpolate=False, hold=None, **kwargs)
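For the life of me, I have not been able to figure out the best way to accomplish this. I have tried a few approaches and can make them work, but I know there is a better way. Python is far too beautiful. Any thoughts or hints?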
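To make the question concrete, here is a minimal sketch of two common ways to read the group labels off a groupby object. The data is synthetic and the column names only mirror the question (assumptions for illustration); it is not the asker's actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: two days of 1-minute readings.
idx = pd.date_range("2013-08-04", periods=2 * 1440, freq="min")
df = pd.DataFrame({"no": np.arange(len(idx), dtype=float)}, index=idx)
df["Time"] = idx.strftime("%H:%M")  # "HH:MM" strings, as in the question

grouped = df.groupby("Time")

# Option 1: .groups maps each group label to its row labels; take the keys.
keys_from_groups = sorted(grouped.groups)

# Option 2: the index of any aggregated result holds the group labels, sorted.
keys_from_index = grouped["no"].mean().index

print(len(keys_from_groups), keys_from_index[0], keys_from_index[-1])
```

Option 2 is usually the more convenient one for plotting, because the labels stay aligned with the aggregated values.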
UPDATE:
The statistics can be dumped into a new DataFrame using describe() followed by unstack(), for example:
no_new = no.groupby('Time')['no'].describe().unstack()
no_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1440 entries, 00:00 to 23:59
Data columns (total 8 columns):
count 1440 non-null values
mean 1440 non-null values
std 1440 non-null values
min 1440 non-null values
25% 1440 non-null values
50% 1440 non-null values
75% 1440 non-null values
max 1440 non-null values
dtypes: float64(8)
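On recent pandas releases (an assumption about the reader's environment, since the output above comes from a much older version), describe() on a grouped column already returns one row per group with the statistics as columns, so the trailing unstack() is not needed there. A minimal sketch with synthetic data, also showing an explicit quantile() alternative:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: three days of 1-minute NO readings.
idx = pd.date_range("2013-08-04", periods=3 * 1440, freq="min")
rng = np.random.default_rng(0)
no = pd.DataFrame({"no": rng.gamma(2.0, 5.0, size=len(idx))}, index=idx)
no["Time"] = idx.strftime("%H:%M")

# One row per minute of the day; columns count, mean, std, min, 25%, 50%, 75%, max.
stats = no.groupby("Time")["no"].describe()

# Alternative: request only the quartiles, then pivot them into columns.
quartiles = no.groupby("Time")["no"].quantile([0.25, 0.75]).unstack()

print(stats.columns.tolist())
print(quartiles.columns.tolist())
```

Asking for the quantiles explicitly avoids computing statistics that the plot does not use.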
Although I should be able to call fill_between() using no_new.index, I receive a TypeError.
Current plot code and TypeError:
ax = figure(figsize=(12,8)).add_subplot(111)
ax.plot(no_new['mean'])
ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')
The TypeError:
TypeError Traceback (most recent call last)
<ipython-input-6-47493de920f1> in <module>()
2 ax = figure(figsize=(12,8)).add_subplot(111)
3 ax.plot(no_new['mean'])
----> 4 ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')
5 #title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
6 #ylabel('Concentration [ppb]')
C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes.pyc in fill_between(self, x, y1, y2, where, interpolate, **kwargs)
6986
6987 # Convert the arrays so we can work with them
-> 6988 x = ma.masked_invalid(self.convert_xunits(x))
6989 y1 = ma.masked_invalid(self.convert_yunits(y1))
6990 y2 = ma.masked_invalid(self.convert_yunits(y2))
C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\ma\core.pyc in masked_invalid(a, copy)
2237 cls = type(a)
2238 else:
-> 2239 condition = ~(np.isfinite(a))
2240 cls = MaskedArray
2241 result = a.view(cls)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
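The plot as of now looks like this: [figure not preserved]
Storing the groupby stats (mean/25%/75%) as columns in a new DataFrame and then passing the new DataFrame's index as the x parameter of plt.fill_between() works for me (tested with matplotlib 1.3.1). For example: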
gdf = df.groupby('Time')[col].describe().unstack()
plt.fill_between(gdf.index, gdf['25%'], gdf['75%'], alpha=.5)
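The following is a self-contained sketch of the same idea on synthetic data. Two choices here are mine and not part of the answer: the grouping key is a hypothetical numeric minute_of_day column, which sidesteps x-axis unit conversion entirely, and the Agg backend is selected so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: three days of 1-minute NO readings.
idx = pd.date_range("2013-08-04", periods=3 * 1440, freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"no": rng.gamma(2.0, 5.0, size=len(idx))}, index=idx)

# Hypothetical numeric key: minutes since midnight (0..1439).
df["minute_of_day"] = idx.hour * 60 + idx.minute

# Per-minute summary statistics, one row per group.
gdf = df.groupby("minute_of_day")["no"].describe()

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(gdf.index, gdf["mean"], "b", label="Mean")
# Shade the interquartile range between the 25% and 75% columns.
band = ax.fill_between(gdf.index, gdf["25%"], gdf["75%"],
                       alpha=0.5, facecolor="green", label="25%-75%")
ax.set_xlabel("Minutes since midnight")
ax.set_ylabel("Concentration [ppb]")
ax.legend()
```

A numeric x axis is the least surprising input for fill_between(); time-of-day tick labels can be added afterwards with a tick formatter if needed.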
gdf.info() should look like this:
<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 00:00:00 to 22:00:00
Data columns (total 8 columns):
count 12 non-null float64
mean 12 non-null float64
std 12 non-null float64
min 12 non-null float64
25% 12 non-null float64
50% 12 non-null float64
75% 12 non-null float64
max 12 non-null float64
dtypes: float64(8)
UPDATE: To resolve the TypeError: ufunc 'isfinite' not supported exception, it is necessary to first convert the Time column from a series of string objects in "HH:MM" format to a series of datetime.time objects, which can be done as follows:
df['Time'] = df.Time.map(lambda x: pd.datetools.parse(x).time())
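As far as I know, pd.datetools is not available in recent pandas releases, so the line above may fail there. A hedged sketch of an equivalent conversion with pd.to_datetime follows, on a tiny made-up frame; it also reproduces the cause of the error, which is that np.isfinite cannot handle an object array of strings.

```python
import datetime

import numpy as np
import pandas as pd

# Tiny illustrative frame with "HH:MM" strings, as in the question.
df = pd.DataFrame({"Time": ["00:00", "00:01", "23:59"], "no": [1.0, 2.0, 3.0]})

# An object array of strings is what np.isfinite rejected in the traceback.
try:
    np.isfinite(np.asarray(df["Time"], dtype=object))
    raised = False
except TypeError:
    raised = True

# Parse the strings and keep only the time-of-day part (datetime.time objects).
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M").dt.time

print(raised, df["Time"].iloc[0])
```

Passing an explicit format keeps the parsing strict, so malformed entries raise an error instead of being guessed at.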