ValueError: Per-column arrays must each be 1-dimensional when trying to create a pandas DataFrame from a dictionary. Why?

Gov*_*rai 13 python numpy dataframe pandas

I'm trying to create a very simple pandas DataFrame from a dictionary. The dictionary has 3 items, and so does the DataFrame. They are:

  • a list with "shape" (3,)
  • a list or np.array of shape (3, 3) (in different attempts)
  • the constant 100 (the same value for the whole column)

  1. This is the code that succeeds and displays the desired df:

    >>> import pandas as pd
    >>> # from a dictionary
    >>> dict1 = {"x": [1, 2, 3],
    ...          "y": list(
    ...              [
    ...                  [2, 4, 6],
    ...                  [3, 6, 9],
    ...                  [4, 8, 12]
    ...              ]
    ...              ),
    ...          "z": 100}
    >>> df1 = pd.DataFrame(dict1)
    >>> df1
       x           y    z
    0  1   [2, 4, 6]  100
    1  2   [3, 6, 9]  100
    2  3  [4, 8, 12]  100
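  2. Then I assigned a NumPy ndarray of shape (3, 3) to the key y and tried to create a DataFrame from the dictionary. The line that creates the DataFrame raises an error. Below are the code I ran and the error I got (in separate code blocks for readability).

  • Code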

    >>> import numpy as np
    >>> dict2 = {"x": [1, 2, 3],
    ...          "y": np.array(
    ...              [
    ...                  [2, 4, 6],
    ...                  [3, 6, 9],
    ...                  [4, 8, 12]
    ...              ]
    ...              ),
    ...          "z": 100}
    >>> df2 = pd.DataFrame(dict2)  # see the block below for the error
  • Error

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    d:\studies\compsci\pyscripts\study\pandas-realpython\data-delightful\01.intro.ipynb Cell 10' in <module>
          1 # from a dicitionary
          2 dict1 = {"x": [1, 2, 3],
          3          "y": np.array(
          4              [
       (...)
          9              ),
         10          "z": 100}
    ---> 12 df1 = pd.DataFrame(dict1)

    File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\frame.py:636, in DataFrame.__init__(self, data, index, columns, dtype, copy)
        630     mgr = self._init_mgr(
        631         data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
        632     )
        634 elif isinstance(data, dict):
        635     # GH#38939 de facto copy defaults to False only in non-dict cases
    --> 636     mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
        637 elif isinstance(data, ma.MaskedArray):
        638     import numpy.ma.mrecords as mrecords

    File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\internals\construction.py:502, in dict_to_mgr(data, index, columns, dtype, typ, copy)
        494     arrays = [
        495         x
        496         if not hasattr(x, "dtype") or not isinstance(x.dtype, ExtensionDtype)
        497         else x.copy()
        498         for x in arrays
        499     ]
        500     # TODO: can we get rid of the dt64tz special case above?
    --> 502 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)

    File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\internals\construction.py:120, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
        117 if verify_integrity:
        118     # figure out the index, if necessary
        119     if index is None:
    --> 120         index = _extract_index(arrays)
        121     else:
        122         index = ensure_index(index)

    File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\internals\construction.py:661, in _extract_index(data)
        659         raw_lengths.append(len(val))
        660     elif isinstance(val, np.ndarray) and val.ndim > 1:
    --> 661         raise ValueError("Per-column arrays must each be 1-dimensional")
        663 if not indexes and not raw_lengths:
        664     raise ValueError("If using all scalar values, you must pass an index")

    ValueError: Per-column arrays must each be 1-dimensional

Both containers hold the same 3 x 3 data, so why does the second attempt end in an error? And what is the workaround?


Ham*_*zah 14

If you look closely at the error message and take a quick look at the source code here:

    elif isinstance(val, np.ndarray) and val.ndim > 1:
        raise ValueError("Per-column arrays must each be 1-dimensional")

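you will see that pandas raises this error whenever a dictionary value is a NumPy array with more than one dimension, as in your example. A list works because this check only applies to np.ndarray values: a list has a length but no ndim, so even a list of lists is treated as a one-dimensional sequence, and each inner list becomes a single cell of the column.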

    lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    len(lst)  # 3: a list only has a length, like shape (3,), not (3, 3) as a NumPy array would
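To see the failure in isolation, here is a minimal sketch. It assumes pandas and NumPy are installed, and the names `arr` and `raised` are made up for this illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[2, 4, 6], [3, 6, 9], [4, 8, 12]])

# A nested list has no ndim attribute; the array reports two dimensions.
print(hasattr(arr.tolist(), "ndim"))  # False
print(arr.ndim)                       # 2

# Passing the 2-D array as a single column raises ValueError.
try:
    pd.DataFrame({"x": [1, 2, 3], "y": arr, "z": 100})
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```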

You can try np.array([1, 2, 3]) instead. It works because the number of dimensions is 1, which you can check with:

    import numpy as np

    arr = np.array([1, 2, 3])
    print(arr.ndim)  # output is 1
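As a quick check of that claim, the sketch below builds the DataFrame with a one-dimensional array as the column. The name `arr_1d` and its values are only illustrative.

```python
import numpy as np
import pandas as pd

arr_1d = np.array([2, 4, 6])  # ndim == 1, and its length matches "x"

# A 1-D array is accepted as a column; the scalar 100 is broadcast to every row.
df = pd.DataFrame({"x": [1, 2, 3], "y": arr_1d, "z": 100})
print(df.shape)  # (3, 3)
```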

If you need to use a NumPy array in the dictionary, you can convert it to a list with .tolist().
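A minimal sketch of that workaround, assuming the same data as in the question. The second option is not part of the suggestion above; it is an alternative I am adding for comparison, and the column names `y0`, `y1`, `y2` are hypothetical.

```python
import numpy as np
import pandas as pd

arr = np.array([[2, 4, 6], [3, 6, 9], [4, 8, 12]])

# Option 1: convert to a nested list, so each row of arr becomes one cell of "y".
df_cells = pd.DataFrame({"x": [1, 2, 3], "y": arr.tolist(), "z": 100})

# Option 2 (an alternative, not from the answer): one DataFrame column
# per array column, each of which is a 1-D slice.
df_cols = pd.DataFrame({"x": [1, 2, 3], "z": 100})
for i in range(arr.shape[1]):
    df_cols[f"y{i}"] = arr[:, i]  # hypothetical column names y0, y1, y2
```

Option 1 reproduces the DataFrame from the question, with lists stored as cell values. Option 2 keeps every cell a plain number, which is usually easier to filter and aggregate.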