Convert a DataArray to a DataFrame while preserving coordinate label order

Jop*_*ppy 6 pandas python-xarray

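Is there a simple way to convert an xarray DataArray to a pandas DataFrame in which I can specify which dimensions become the index and which become the columns? For example, suppose I have a DataArray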

import xarray as xr
weather = xr.DataArray(
    name='weather',
    data=[['Sunny', 'Windy'], ['Rainy', 'Foggy']],
    dims=['date', 'time'],
    coords={
        'date': ['Thursday', 'Friday'],
        'time': ['Morning', 'Afternoon'],
    }
)

which gives:

<xarray.DataArray 'weather' (date: 2, time: 2)>
array([['Sunny', 'Windy'],
       ['Rainy', 'Foggy']], dtype='<U5')
Coordinates:
  * date     (date) <U8 'Thursday' 'Friday'
  * time     (time) <U9 'Morning' 'Afternoon'

Suppose I now want to move this into a pandas DataFrame indexed by date, with time as the columns. I can do this by calling .to_dataframe() and then .unstack() on the resulting DataFrame:

>>> weather.to_dataframe().unstack()
           weather        
time     Afternoon Morning
date                      
Friday       Foggy   Rainy
Thursday     Windy   Sunny

However, pandas sorts the labels, so instead of "Morning" then "Afternoon" I get "Afternoon" then "Morning" (and likewise "Friday" before "Thursday"). I would prefer an API that does this reshaping for me, without my having to reorder the index and columns afterwards.
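One workaround, sketched here under the assumption of a 2-D DataArray, is to unstack and then reindex both axes with the DataArray's own coordinate values, which restores the original label order whatever sorting pandas applied. `to_frame_ordered` is a hypothetical helper name used only for illustration; it is not part of xarray or pandas.

```python
import pandas as pd
import xarray as xr

weather = xr.DataArray(
    name="weather",
    data=[["Sunny", "Windy"], ["Rainy", "Foggy"]],
    dims=["date", "time"],
    coords={"date": ["Thursday", "Friday"], "time": ["Morning", "Afternoon"]},
)


def to_frame_ordered(da: xr.DataArray, index_dim: str, column_dim: str) -> pd.DataFrame:
    """Hypothetical helper: pivot a 2-D DataArray into a DataFrame whose row and
    column labels follow the DataArray's coordinate order."""
    if set(da.dims) != {index_dim, column_dim}:
        raise ValueError("expected a 2-D DataArray with exactly these two dims")
    # to_series() yields a Series with a MultiIndex over both dimensions;
    # unstack() moves column_dim into the columns (pandas may sort the labels).
    frame = da.to_series().unstack(column_dim)
    # Reindex both axes with the original coordinate values to undo any sorting.
    return frame.reindex(
        index=da[index_dim].values,
        columns=da[column_dim].values,
    )


print(to_frame_ordered(weather, "date", "time"))
print(to_frame_ordered(weather, "time", "date"))
```

The reindex step costs an extra pass over the data, but it makes the result independent of whether a given pandas or xarray version sorts the MultiIndex levels.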

Max*_*ian 1

In xarray 0.16.1, a dim_order parameter was added to .to_dataframe. Does this do what you need?

xr.DataArray.to_dataframe(
    self,
    name: Hashable = None,
    dim_order: List[Hashable] = None,
) -> pandas.core.frame.DataFrame
Docstring:
Convert this array and its coordinates into a tidy pandas.DataFrame.

The DataFrame is indexed by the Cartesian product of index coordinates
(in the form of a :py:class:`pandas.MultiIndex`).

Other coordinates are included as columns in the DataFrame.

Parameters
----------
name
    Name to give to this array (required if unnamed).
dim_order
    Hierarchical dimension order for the resulting dataframe.
    Array content is transposed to this order and then written out as flat
    vectors in contiguous order, so the last dimension in this list
    will be contiguous in the resulting DataFrame. This has a major
    influence on which operations are efficient on the resulting
    dataframe.

    If provided, must include all dimensions of this DataArray. By default,
    dimensions are sorted according to the DataArray dimensions order.
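A minimal sketch of the suggestion above, assuming an xarray recent enough to have `dim_order`. Per the docstring, `dim_order` sets the hierarchy of the MultiIndex levels (which dimension is outermost); it is not documented as controlling the order of labels within a level. For the 2-D case, the related method `DataArray.to_pandas()` builds a DataFrame directly, with the first dimension as the index, the second as the columns, and labels taken from the coordinates as stored; `transpose` chooses which dimension becomes the index.

```python
import xarray as xr

weather = xr.DataArray(
    name="weather",
    data=[["Sunny", "Windy"], ["Rainy", "Foggy"]],
    dims=["date", "time"],
    coords={"date": ["Thursday", "Friday"], "time": ["Morning", "Afternoon"]},
)

# dim_order must list every dimension; it sets the MultiIndex level order,
# so here "time" is the outer level and "date" the inner one.
long_form = weather.to_dataframe(dim_order=["time", "date"])
print(long_form)

# For 2-D data, to_pandas() returns a DataFrame directly: first dimension -> index,
# second dimension -> columns, labels taken from the coordinates as stored.
by_date = weather.to_pandas()
print(by_date)

# transpose() picks which dimension ends up as the index.
by_time = weather.transpose("time", "date").to_pandas()
print(by_time)
```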