Pandas DataFrame.assign arguments

Question

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result

import pandas as pd
df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

Attempts

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1.

Background

The assign function in Pandas returns a copy of the DataFrame with the newly assigned columns appended, for example:

df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28

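The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame at a time:

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

In addition:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.

The source code for the function states that it accepts a dictionary: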

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed 
        on the DataFrame and assigned to the new columns. If the values are not callable, 
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. The make things predicatable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data

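To illustrate the docstring's distinction between callable and non-callable values, here is a minimal sketch using the same example frame. The lambda's parameter name frame is only an illustrative choice; the column values mirror the desired result shown earlier.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# A callable is invoked with the DataFrame and its return value becomes
# the column; a non-callable value (here a Series) is assigned as-is.
result = df.assign(
    C=lambda frame: frame['A'] ** 2,  # callable: computed on the frame
    D=df['B'] * 2,                    # non-callable: assigned directly
)

print(result)
print(list(df.columns))  # the original frame keeps only its own columns
```

Because assign works on a copy, df itself is left unchanged unless the result is bound back to the same name.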

Answer 1

roo*_*oot 24

You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)

I was able to make your example dictionary work by using ** to unpack it into keyword arguments:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

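It seems like assign should be able to take a dictionary, but based on the source code you posted, that does not appear to be supported.

The resulting output: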

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

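The unpacking approach is handy when the new columns are collected in a dictionary beforehand. The following is a small sketch of that pattern, not part of the original answer; the name new_columns is hypothetical, and the keys are assumed to be strings because they are passed on as keyword names.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Hypothetical mapping of new column names to values, built ahead of time.
new_columns = {
    'C': df['A'] ** 2,
    'D': df['B'] * 2,
}

# ** unpacks the dict so that each key is passed as a keyword argument,
# making this call equivalent to df.assign(C=..., D=...).
result = df.assign(**new_columns)

print(result)
```

Since the signature is assign(self, **kwargs), the dictionary has to be unpacked this way rather than passed as a single positional argument.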
