Eri*_*rin 6 protocols typing dataframe pandas
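I want to type hint that a pandas DataFrame must have a DatetimeIndex. I was hoping there might be some way to do this with Protocols, but it looks like there isn't. Something in this spirit: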
from typing import Protocol

import pandas as pd

class TSFrame(Protocol):
    index: pd.DatetimeIndex

def test(df: TSFrame):
    # Do stuff with df.index.methods_supported_by_dtidx_only
    pass

nontsdf = pd.DataFrame()
tsdf = pd.DataFrame(index=pd.DatetimeIndex(pd.date_range("2022-01-01", "2022-01-02")))

test(nontsdf)  # goal is for my interpreter to complain here
test(tsdf)  # and not complain here
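Instead, my interpreter complains in both cases. Confusingly, if I build a similar test on a generic class, but with an int type hint, it complains in neither case.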
from typing import Any, Protocol

class IntWanted(Protocol):
    var: int

class TestClass:
    def __init__(self, var: Any) -> None:
        self.var = var

def foo(a: IntWanted) -> int:
    return a.var

good = TestClass(1)
bad = TestClass("x")

foo(good)
foo(bad)
Other ways I can think of to handle these time series DataFrames:
Ideas appreciated.
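A note on the second snippet in the question, offered as a likely explanation and not a definitive one: `TestClass.var` is assigned from a parameter annotated `Any`, and `Any` is compatible with every type, so a static checker accepts both `good` and `bad` as `IntWanted`. The DataFrame case probably fails for the opposite reason: the `index` attribute is presumably annotated as a general `Index` and not as `DatetimeIndex`, whatever it holds at runtime. Protocols are not enforced at runtime either. The sketch below reuses the question's classes and adds `runtime_checkable` and a `NoVar` class (both additions for illustration, not in the original) to show that even an explicit `isinstance` check only looks for the presence of `var`, not its type.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # addition for illustration; enables isinstance() against the protocol
class IntWanted(Protocol):
    var: int


class TestClass:
    def __init__(self, var: Any) -> None:
        # The attribute's static type is Any, which matches `int` for a type checker.
        self.var = var


class NoVar:
    """Hypothetical class with no `var` attribute at all."""


good = TestClass(1)
bad = TestClass("x")

# The runtime check only tests that an attribute named `var` exists.
print(isinstance(good, IntWanted))     # True
print(isinstance(bad, IntWanted))      # True, even though var is a str
print(isinstance(NoVar(), IntWanted))  # False, the attribute is missing
```

Annotating the constructor parameter as `var: int` instead of `Any` should let a static checker reject `TestClass("x")` at the call site.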
You can use pandera (and pandas-stubs) to do almost anything you want here.
pip install pandera[mypy]

mypy.ini file:

[mypy]
plugins = pandera.mypy
demo.py
import pandera as pa
import pandas as pd
import numpy as np
from pandera.typing import Index, DataFrame, Series

class TSFrame(pa.DataFrameModel):
    idx: Index[pa.Timestamp] = pa.Field(check_name=False)

@pa.check_types  # at runtime
def test(df: DataFrame[TSFrame]):  # at compile time
    pass

nontsdf = pd.DataFrame()
tsdf = DataFrame[TSFrame](index=pd.DatetimeIndex(pd.date_range("2022-01-01", "2022-01-02")))

test(nontsdf)
test(tsdf)
Usage:
[...]$ mypy demo.py
demo1.py:14: error: Argument 1 to "test" has incompatible type "pandas.core.frame.DataFrame"; expected "pandera.typing.pandas.DataFrame[TSFrame]" [arg-type]
Found 1 error in 1 file (checked 1 source file)
[...]$ python demo.py
...
pandera.errors.SchemaError: error in check_types decorator of function 'test': expected series 'None' to have type datetime64[ns], got int64
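If adding a dependency is not an option, a plain runtime check is a smaller alternative. The sketch below is not part of the pandera approach above; `require_datetime_index` is a hypothetical helper name, and it only validates at runtime, so a static checker will still not flag the bad call.

```python
import pandas as pd


def require_datetime_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return df unchanged, or raise TypeError if its index is not a DatetimeIndex."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError(
            f"expected a DatetimeIndex, got {type(df.index).__name__}"
        )
    return df


tsdf = pd.DataFrame(index=pd.date_range("2022-01-01", "2022-01-02"))
nontsdf = pd.DataFrame()

require_datetime_index(tsdf)  # passes silently

try:
    require_datetime_index(nontsdf)
except TypeError as exc:
    print(f"rejected: {exc}")
```

Calling the helper at the top of each function that relies on DatetimeIndex-only methods gives an early, readable failure instead of an AttributeError deeper in the code.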
More information: