Adding a new row with values and lists to a MultiIndex pandas DataFrame

Question

dan*_*dar 4 python multi-index dataframe python-3.x pandas

I have a MultiIndex DataFrame:

                 predicted_y actual_y predicted_full actual_full
subj_id org_clip                                                
123     3                  2        5      [1, 2, 3]   [4, 5, 6]

I want to add a new row so that it looks like this:

                 predicted_y actual_y predicted_full   actual_full
subj_id org_clip                                                  
123     3                  2        5      [1, 2, 3]     [4, 5, 6]
321     4                 20       50   [10, 20, 30]  [40, 50, 60]    # add this row

The following code does this:

df.loc[('321', 4),['predicted_y', 'actual_y']] = [20, 50]
df.loc[('321', 4),['predicted_full', 'actual_full']] = [[10,20,30], [40,50,60]]

However, when I try to add the new row in a single statement, I get an error:

df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = [20, 50, [10,20,30], [40,50,60]]

>>> ValueError: setting an array element with a sequence.

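Notes:

I think this has to do with adding a row that contains both scalar values and lists (possibly a syntax issue). Every other attempt raised the same error; see the following examples: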

df.loc[('321', 4),['predicted_y', 'actual_y', ['predicted_full', 'actual_full']]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', ['predicted_full'], ['actual_full']]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', [['predicted_full'], ['actual_full']]]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = [20, 50, np.array([10,20,30]), np.array([40,50,60])]

Code to construct the initial DataFrame:

import numpy as np
import pandas as pd
# `codes=` replaces the older `labels=` keyword (renamed in pandas 0.24).
df = pd.DataFrame(index=pd.MultiIndex(levels=[[], []], codes=[[], []], names=['subj_id', 'org_clip']),
                  columns=['predicted_y', 'actual_y', 'predicted_full', 'actual_full'])
df.loc[('123', 3),['predicted_y', 'actual_y']] = [2, 5]
df.loc[('123', 3),['predicted_full', 'actual_full']] = [[1,2,3], [4,5,6]]

Answer 1

piR*_*red 6

You can use pd.Series to handle the dtypes:

row_to_append = pd.Series([20, 50, [10, 20, 30], [40, 50, 60]])
cols = ['predicted_y', 'actual_y', 'predicted_full', 'actual_full']
df.loc[(321, 4), cols] = row_to_append.values

df
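
A brief sketch of why this can work, assuming dtype inference behaves as in current pandas: a Series built from mixed scalars and lists falls back to object dtype, so `.values` is already a 1-D object array with each list held in a single cell. The variable name comes from the answer above; the printed checks are only illustrative.

```python
import pandas as pd

row_to_append = pd.Series([20, 50, [10, 20, 30], [40, 50, 60]])

# Scalars and lists cannot share a numeric dtype, so pandas uses object.
print(row_to_append.dtype)      # object

values = row_to_append.values   # 1-D NumPy array of Python objects
print(values.shape)             # (4,)
print(type(values[2]))          # <class 'list'>
```

Because the array is one-dimensional with four cells, it lines up with the four column labels passed to `df.loc`.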

Answer 2

unu*_*tbu 5

Make at least one of the sublists an array of dtype object:

In [27]: df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] =  (
           [20, 50, np.array((10, 20, 30), dtype='O'), [40, 50, 60]])

In [28]: df
Out[28]: 
                 predicted_y actual_y predicted_full   actual_full
subj_id org_clip                                                  
123     3                  2        5      [1, 2, 3]     [4, 5, 6]
321     4                 20       50   [10, 20, 30]  [40, 50, 60]

Note that the error

ValueError: setting an array element with a sequence.

occurs on this line:

--> 643         arr_value = np.array(value)

and can be reproduced like this:

In [12]: np.array([20, 50, [10, 20, 30], [40, 50, 60]])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-f6122275ab9f> in <module>()
----> 1 np.array([20, 50, [10, 20, 30], [40, 50, 60]])

ValueError: setting an array element with a sequence.

But if one of the sublists is an array of dtype object, the result is also an array of dtype object:

In [16]: np.array((20, 50, np.array((10, 20, 30), dtype='O'), (40, 50, 60)))
Out[16]: array([20, 50, array([10, 20, 30], dtype=object), (40, 50, 60)], dtype=object)

So the ValueError is avoided.
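
As a minimal sketch of the same idea, the snippet below tries the ragged conversion in NumPy and then builds a 1-D object array explicitly. The helper name `to_object_array` is hypothetical (it is not part of NumPy or pandas); it fills a preallocated object array element by element so that each list is stored as one cell. Whether pandas accepts the result in `df.loc[...] = ...` may depend on the pandas version, so this only illustrates the NumPy side.

```python
import numpy as np

def to_object_array(items):
    """Hypothetical helper: store each item as one cell of a 1-D object array."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer-index assignment keeps a list as a single object
    return out

row = [20, 50, [10, 20, 30], [40, 50, 60]]

# Mixed scalars and lists have no regular shape; depending on the NumPy
# version this conversion raises ValueError or only warns.
try:
    np.array(row)
    ragged_conversion_failed = False
except ValueError:
    ragged_conversion_failed = True

values = to_object_array(row)
print(values.dtype, values.shape)
```

Preallocating with `dtype=object` sidesteps shape inference entirely, which is the same effect the answer gets by making one sublist an object array.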

Archived: 9 years, 3 months ago
Views: 3,800
Last activity: 9 years, 3 months ago