Pandas and Matplotlib - fill_between() vs datetime64

chi*_*liq 34 python matplotlib pandas

I have a Pandas DataFrame:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 300 entries, 5220 to 5519
Data columns (total 3 columns):
Date             300 non-null datetime64[ns]
A                300 non-null float64
B                300 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 30.5 KB

I want to plot the A and B series against Date.

plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

Then I want to apply fill_between() to the area between the A and B series:

plt.fill_between(data['Date'], data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)

This outputs:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the casting 
rule ''safe''

Does matplotlib accept pandas datetime64 objects in the fill_between() function? Should I convert the column to a different date type?

unu*_*tbu 26

Pandas registers a converter in matplotlib.units.registry which converts a number of datetime types (such as a pandas DatetimeIndex, and NumPy arrays of dtype datetime64) to matplotlib datenums, but it does not handle a Pandas Series with dtype datetime64.

In [67]: import pandas.tseries.converter as converter

In [68]: c = converter.DatetimeConverter()

In [69]: type(c.convert(df['Date'].values, None, None))
Out[69]: numpy.ndarray              # converted (good)

In [70]: type(c.convert(df['Date'], None, None))
Out[70]: pandas.core.series.Series  # left unchanged

fill_between checks for a converter and, if one exists, uses it to handle the data.
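To make the converter lookup a little more concrete, here is a minimal sketch (not part of the original answer) that asks Matplotlib's unit registry which converter it would pick for a NumPy datetime64 array. It assumes a reasonably recent Matplotlib in which `matplotlib.dates` registers a converter for `numpy.datetime64`; the behaviour, including how a pandas Series is treated, may differ between versions from what the session above shows.

```python
import numpy as np
import matplotlib.dates  # importing this module registers the date converters
import matplotlib.units as munits

# A small datetime64 array, standing in for data['Date'].values.
dates = np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]')

# Ask the registry which converter (if any) applies to this input.
converter = munits.registry.get_converter(dates)
print(type(converter).__name__ if converter is not None else None)
```

If the lookup returns `None` for some input, Matplotlib has no unit converter for it and will try to treat the values as plain numbers.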

So, as a workaround, you could convert the dates to a NumPy array of datetime64 values:

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
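The difference between the two inputs is only the container type, which a short check can show. This is an illustrative sketch with a hypothetical three-row `dates` Series standing in for `data['Date']`:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for data['Date'].
dates = pd.Series(pd.date_range('2000-01-01', periods=3, freq='D'))

as_series = dates         # pandas Series with a datetime64 dtype
as_array = dates.values   # plain NumPy array with a datetime64 dtype

print(type(as_series).__name__, as_series.dtype)
print(type(as_array).__name__, as_array.dtype)
```

Both hold the same timestamps; only the second is a bare `numpy.ndarray`, which is the form the converter described above recognises.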

For example,

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='D')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
               'Date': dates})
plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()


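  • @chilliq: My first suggestion was to use a DatetimeIndex. It turns out that is not necessary. It is quicker to simply use `data['Date'].values` to extract the underlying NumPy array from the Pandas Series. (3 upvotes)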

Tur*_*opy 5

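As WillZ pointed out, Pandas 0.21 broke unutbu's solution. However, converting datetimes to dates can have a significant negative impact on data analysis. This solution currently works and keeps the datetimes: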

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
d = data['Date'].dt.to_pydatetime()
plt.plot_date(d, data['A'], '-')
plt.plot_date(d, data['B'], '-')


plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()

[Figure: fill_between with datetime64 constraint]
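To see what this kind of conversion produces, here is a small sketch that is not from the answer. It goes through `pd.DatetimeIndex(...).to_pydatetime()` rather than the `Series.dt` accessor used above (an assumption made here so the return type is a plain NumPy object array), and uses a hypothetical three-row `dates` Series:

```python
import datetime
import pandas as pd

# Hypothetical three-row stand-in for data['Date'].
dates = pd.Series(pd.date_range('2000-01-01', periods=3, freq='D'))

# Convert to an object array of standard-library datetime.datetime values.
py_dates = pd.DatetimeIndex(dates).to_pydatetime()

print(type(py_dates).__name__, py_dates.dtype)
print(type(py_dates[0]).__name__, py_dates[0])
```

The time-of-day component is preserved, which is the point of keeping datetimes instead of truncating to dates.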

Edit: Following jedi's comment, I set out to determine which of the following three options is fastest:

  • method1 = original answer
  • method2 = jedi's comment + original answer
  • method3 = jedi's comment

method2 was slightly faster, and noticeably more consistent, so I edited the answer above to reflect the best approach.

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
# One row per method, one column per timed run.
time_data = pd.DataFrame(columns=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'])
method1 = []
method2 = []
method3 = []

# method1: convert each timestamp individually in a Python loop.
# time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
for run in range(0, 10):
    start = time.perf_counter()
    for _ in range(0, 500):
        d = [pd.Timestamp(ts).to_pydatetime() for ts in data['Date']]
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')

        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method1.append(time.perf_counter() - start)

# method2: convert once with the .dt accessor and reuse the result.
for run in range(0, 10):
    start = time.perf_counter()
    for _ in range(0, 500):
        d = data['Date'].dt.to_pydatetime()
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')

        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method2.append(time.perf_counter() - start)

# method3: call the .dt accessor inline for every plotting call.
for run in range(0, 10):
    start = time.perf_counter()
    for _ in range(0, 500):
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['A'], '-')
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['B'], '-')

        plt.fill_between(data['Date'].dt.to_pydatetime(), data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method3.append(time.perf_counter() - start)

time_data.loc['method1'] = method1
time_data.loc['method2'] = method2
time_data.loc['method3'] = method3
print(time_data)
# Mean time per method, with the standard deviation as the error bar.
plt.errorbar(time_data.index, time_data.mean(axis=1), yerr=time_data.std(axis=1))
plt.show()

[Figure: timing test of the 3 methods of converting time data for plotting a DataFrame]
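The benchmark above times conversion and plotting together, so the plotting cost dominates. As a rough complement, the following sketch times only the conversion step with `timeit`. It is not from the answer: the function names are hypothetical, and the vectorized variant goes through `pd.DatetimeIndex(...).to_pydatetime()` rather than the `Series.dt` accessor, so the numbers are not directly comparable to the figure.

```python
import timeit
import pandas as pd

N = 300
dates = pd.Series(pd.date_range('2000-1-1', periods=N, freq='ms'))

def convert_loop():
    # Per-element conversion, as in method1.
    return [pd.Timestamp(ts).to_pydatetime() for ts in dates]

def convert_vectorized():
    # Whole-column conversion in a single call.
    return pd.DatetimeIndex(dates).to_pydatetime()

# Five repeats of 20 calls each; the minimum is the least noisy estimate.
loop_times = timeit.repeat(convert_loop, number=20, repeat=5)
vectorized_times = timeit.repeat(convert_vectorized, number=20, repeat=5)

print('loop:       min %.4f s' % min(loop_times))
print('vectorized: min %.4f s' % min(vectorized_times))
```

Absolute timings depend on the machine and the pandas version, so only the relative difference between the two is meaningful.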