Type annotations for pandas DataFrames

Kei*_*art 7 python pandas

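A question for Pythonistas: if a function or method returns a pandas DataFrame, how do you document the column names and column types? Is there a way to do this with Python's built-in type annotations, or do you just use docstrings? And if you only use docstrings, how do you format them so they are as concise as possible? Nothing I have tried feels very Pythonic. Thanks!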

Xuk*_*rao 8

Docstring format

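I use the numpy docstring convention as a basis. If a function's input parameter or return value is a pandas DataFrame with predetermined columns, I add a reStructuredText-style table describing the columns to that parameter's description. For example: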

import numpy as np
import pandas as pd


def random_dataframe(no_rows):
    """Return dataframe with random data.

    Parameters
    ----------
    no_rows : int
        Desired number of data rows.

    Returns
    -------
    pd.DataFrame
        Dataframe with randomly selected values. Data columns are as follows:

        ==========  ==============================================================
        rand_int    randomly chosen whole numbers (as `int`)
        rand_float  randomly chosen numbers with decimal parts (as `float`)
        rand_color  randomly chosen colors (as `str`)
        rand_bird   randomly chosen birds (as `str`)
        ==========  ==============================================================

    """
    df = pd.DataFrame({
        "rand_int": np.random.randint(0, 100, no_rows),
        "rand_float": np.random.rand(no_rows),
        "rand_color": np.random.choice(['green', 'red', 'blue', 'yellow'], no_rows),
        "rand_bird": np.random.choice(['kiwi', 'duck', 'owl', 'parrot'], no_rows),
    })

    return df
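Because the column table is ordinary text inside the docstring, it can also be read back, for example so that a unit test can check that the documented columns still match the DataFrame. A minimal stdlib-only sketch, assuming the two-column simple-table layout used above (the first "=" rule opens the table and the next one closes it, with no header row); documented_columns is a hypothetical helper, not part of pandas, numpydoc or Sphinx:

```python
import inspect


def documented_columns(docstring: str) -> dict[str, str]:
    """Map column names to descriptions from the first RST simple table.

    Hypothetical helper. Assumes a two-column simple table whose first rule
    line opens the table and whose next rule line closes it (no header row).
    """
    lines = inspect.cleandoc(docstring).splitlines()
    # A rule line contains only '=' runs separated by spaces.
    rules = [i for i, line in enumerate(lines)
             if line.strip() and set(line.strip()) <= {"=", " "}]
    if len(rules) < 2:
        return {}
    start, end = rules[0], rules[1]
    rule = lines[start]
    # The first '=' run defines the character span of the name column.
    col_start = len(rule) - len(rule.lstrip())
    col_end = rule.index(" ", col_start)
    columns = {}
    for line in lines[start + 1:end]:
        if not line.strip():
            continue  # skip blank lines inside the table
        columns[line[col_start:col_end].strip()] = line[col_end:].strip()
    return columns


DOC = """Return dataframe with random data.

    Returns
    -------
    pd.DataFrame
        Data columns are as follows:

        ==========  ==========================================
        rand_int    randomly chosen whole numbers (as `int`)
        rand_color  randomly chosen colors (as `str`)
        ==========  ==========================================
    """

if __name__ == "__main__":
    print(documented_columns(DOC))
```

In a test you could then compare, say, set(documented_columns(random_dataframe.__doc__)) with set(random_dataframe(5).columns).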

Bonus: Sphinx compatibility

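The docstring format above is compatible with the Sphinx autodoc documentation generator. This is how the docstring looks in the HTML documentation that Sphinx generates automatically (using the nature theme):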

[Image: the docstring as rendered by Sphinx]
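The answer does not show the Sphinx configuration. autodoc by itself does not interpret NumPy-style Parameters/Returns sections, so a setup along these lines is assumed, using the bundled sphinx.ext.napoleon extension (numpydoc is a common alternative). This is a minimal conf.py sketch with project-specific settings omitted:

```python
# conf.py: minimal sketch, project-specific settings omitted
extensions = [
    "sphinx.ext.autodoc",   # pull docstrings from the imported modules
    "sphinx.ext.napoleon",  # turn NumPy/Google-style sections into reST
]
napoleon_numpy_docstring = True   # the default, shown for clarity
napoleon_google_docstring = True  # also the default; covers "name (type): ..." style
html_theme = "nature"  # built-in Sphinx theme mentioned in the answer
```

With napoleon, the column table inside the Returns description is passed through as ordinary reST, which is what lets it render as a table.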


Edw*_*ard 5

I tried @Xukrao's approach. Having a summary table is really nice.

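Also inspired by another question on Stack Overflow, I found that a csv-table block is more convenient when it comes to making changes: there is no need to worry about alignment and the "=" rules. For example: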

intra_edges (DataFrame): correspondence between intra-edges in
    planar graph and in multilayer graph.

    .. csv-table::
        :header: name, dtype, definition

        source_original (index), object, source in planar graph
        target_original (index), object, target in planar graph
        source, object, current source bus
        target, object, current target bus

inter_edges (DataFrame): correspondence between inter-nodes in
    planar graph and inter-edges in multilayer graph.

    ======  =======  ============================  ==========
    name    dtype    definition                    is_index
    ======  =======  ============================  ==========
    node    object   name in planar graph          True
    upper   int64    integer index of upper layer  False
    lower   int64    integer index of lower layer  False
    source  object   source node in supra graph    False
    target  object   target node in supra graph    False
    ======  =======  ============================  ==========
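If you keep the simple-table layout (as in the inter_edges example, where the is_index column stays readable as plain text) but want to avoid aligning the "=" rules by hand, the table can be generated and pasted into the docstring. A sketch; rst_simple_table is a hypothetical helper, and it assumes every cell is non-empty, because in a simple table an empty first-column cell marks a continuation line rather than an empty value:

```python
import textwrap


def rst_simple_table(header, rows, indent=0):
    """Render header + rows as an RST simple table with auto-sized columns.

    Hypothetical helper. Assumes every row has len(header) non-empty cells.
    """
    table = [[str(cell) for cell in header]]
    table += [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell (header included).
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    rule = "  ".join("=" * w for w in widths)

    def fmt(row):
        # Left-justify cells, two spaces between columns as in the tables above.
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    lines = [rule, fmt(table[0]), rule, *(fmt(row) for row in table[1:]), rule]
    # Prefix every line so the table can be pasted into an indented docstring.
    return textwrap.indent("\n".join(lines), " " * indent)


if __name__ == "__main__":
    print(rst_simple_table(
        ["name", "dtype", "definition", "is_index"],
        [("node", "object", "name in planar graph", True),
         ("upper", "int64", "integer index of upper layer", False)],
        indent=4,
    ))
```

To seed the rows from an existing frame, something like [(name, str(dtype), "...") for name, dtype in df.dtypes.items()] would fill in the name and dtype columns; the definitions still have to be written by hand.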