xarray writing to netCDF from Pandas - dimension issue

Cli*_*int 5 pandas python-xarray

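I am learning how to generate a netCDF file from a Pandas DF using xarray. I have followed a couple of tutorials and the SO question "Add 'constant' dimension to xarray Dataset", but I still have a problem: I cannot get Date_Time, lat and lon in as dimensions. When I do an nc dump, they are not correct.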

Initial approach, importing the txt file into a pandas df and then going through xarray to netCDF:

import pandas as pd
import xarray

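# Import data from the whitespace-delimited text file
colnames1 = ['Date','Time','latitude','longitude','Status','depth']
# on_bad_lines='skip' replaces error_bad_lines=False and sep=r'\s+' replaces delim_whitespace=True in recent pandas
df2 = pd.read_csv('test.txt', header=0, names=colnames1, sep=r'\s+', on_bad_lines='skip')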

# create xray Dataset from Pandas DataFrame
xr = xarray.Dataset.from_dataframe(df2)

# add variable attribute metadata
xr['latitude'].attrs={'units':'degrees', 'long_name':'Latitude'}
xr['longitude'].attrs={'units':'degrees', 'long_name':'Longitude'}
xr['depth'].attrs={'units':'m', 'long_name':'depth'}


# add global attribute metadata
xr.attrs={'Conventions':'CF-1.6', 'title':'Data', 'summary':'Data generated'}
# print the Dataset
print(xr)
# save to netCDF
xr.to_netcdf('test.nc')

where df2 =

Date            Time  grid_latitude  grid_longitude  Status  depth                                                                   
2017-09-05  13:01:59     -29.034083       31.068567     2.0    0.0   
2017-09-05  13:01:59     -29.039367       31.059150     2.0    0.0   
2017-09-05  13:01:59     -29.036650       31.059200     3.0    0.0   
2017-09-05  13:01:59     -29.036750       31.065417     7.0  100.0   
2017-09-05  13:01:59     -29.039317       31.056050     7.0  100.0   
2017-09-05  13:01:59     -29.034000       31.062367     3.0    0.0   
2017-09-05  13:01:59     -29.036517       31.049900     3.0    0.0   
2017-09-05  13:01:59     -29.031100       31.050000     3.0    0.0 

This works fine, but the dimensions are not correct (see below):

<xarray.Dataset>
Dimensions:    (index: 8)
Coordinates:
  * index      (index) int64 0 1 2 3 4 5 6 7
Data variables:
    Date       (index) object '2017-09-05' '2017-09-05' '2017-09-05' ...
    Time       (index) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
    latitude   (index) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
    longitude  (index) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
    Status     (index) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
    depth      (index) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
    title: Data
    summary: Data generated
    Conventions: CF-1.6

If I set Date, or a combined Date_Time, as the DF index, the date/time comes through fine and is treated as a dimension:

<xarray.Dataset>
Dimensions:    (Date: 8)
Coordinates:
  * Date       (Date) object '2017-09-05' '2017-09-05' '2017-09-05' ...
Data variables:
    Time       (Date) object '13:01:59' '13:01:59' '13:01:59' '13:01:59' ...
    latitude   (Date) float64 -29.03 -29.04 -29.04 -29.04 -29.04 -29.03 ...
    longitude  (Date) float64 31.07 31.06 31.06 31.07 31.06 31.06 31.05 31.05
    Status     (Date) float64 2.0 2.0 3.0 7.0 7.0 3.0 3.0 3.0
    depth      (Date) float64 0.0 0.0 0.0 100.0 100.0 0.0 0.0 0.0
Attributes:
    title: Data
    summary: Data generated
    Conventions: CF-1.6
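A minimal sketch of the combined Date_Time case, for illustration only. The sample rows are hypothetical (modelled on the table above, with made-up times), and the step that parses the strings into datetime64 values is an assumption about how the columns would be merged, not something shown in the original code.

```python
import pandas as pd
import xarray

# Hypothetical rows modelled on df2; the times are made up for illustration.
df2 = pd.DataFrame({
    "Date": ["2017-09-05", "2017-09-05", "2017-09-05"],
    "Time": ["13:01:59", "13:02:59", "13:03:59"],
    "latitude": [-29.034083, -29.039367, -29.036650],
    "longitude": [31.068567, 31.059150, 31.059200],
    "depth": [0.0, 0.0, 100.0],
})

# Merge the two string columns and parse them into real timestamps.
df2["Date_Time"] = pd.to_datetime(df2["Date"] + " " + df2["Time"])

# The DataFrame index becomes the single dimension of the Dataset.
indexed = df2.drop(columns=["Date", "Time"]).set_index("Date_Time")
ds = xarray.Dataset.from_dataframe(indexed)

print(ds)
```

With parsed timestamps the Date_Time coordinate has a datetime64 dtype instead of object strings.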

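However, if I set df.index on Date_Time, Lat and Lon, it reverts to the blank (index) dimension. I would appreciate pointers on setting the dimensions. With the netCDF module, a dimension can be created with the syntax: lat = dataset.createDimension('lat', 73). The SO example "Add dimension to an xarray DataArray" did not help either. Perhaps I am missing something, or it is a limitation of my learning. I would like to get to the point where an nc dump produces something similar to this.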

NetCDF dimension information:
        Name: lat
                size: 73
                type: dtype('float32')
                units: u'degrees_north'
                actual_range: array([ 90., -90.], dtype=float32)
                long_name: u'Latitude'
                standard_name: u'latitude'
                axis: u'Y'
        Name: lon
                size: 144
                type: dtype('float32')
                units: u'degrees_east'
                long_name: u'Longitude'
                actual_range: array([   0. ,  357.5], dtype=float32)
                standard_name: u'longitude'
                axis: u'X'
        Name: time
                size: 366
                type: dtype('float64')
                units: u'hours since 1-1-1 00:00:0.0'
                long_name: u'Time'
                actual_range: array([ 17628096.,  17636856.])
                delta_t: u'0000-00-01 00:00:00'
                standard_name: u'time'
                axis: u'T'
                avg_period: u'0000-00-01 00:00:00'

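Otherwise, could I convert the DF columns to np arrays and use the netCDF module? Many thanks in advance. I did venture an attempt at something like this, but I doubt it is on the right track: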

#add dimensions
#d = {}
#d['time'] = ('time',df2.Time)
#d['latitude'] = ('latitude',df2.latitude)
#d['longitude'] = ('longitude', df2.longitude)
#d['var'] = (['time','latitude','longitude','Depth'], xr)
#xr = xray.Dataset(d)
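For comparison, a hedged sketch of how an explicit xarray.Dataset with named dimensions can be built from NumPy arrays. It assumes the values already sit on a regular (time, latitude, longitude) grid, which the scattered points in df2 do not; the grid values and the all-zero depth array are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical coordinate values for a tiny regular grid.
time = pd.to_datetime(["2017-09-05 13:01:59"])
latitude = np.array([-29.04, -29.03])
longitude = np.array([31.05, 31.06, 31.07])

# Placeholder data: its shape must match (time, latitude, longitude).
depth = np.zeros((time.size, latitude.size, longitude.size))

# Each data variable is given as a (dimension names, array) pair.
ds = xarray.Dataset(
    data_vars={"depth": (("time", "latitude", "longitude"), depth)},
    coords={"time": time, "latitude": latitude, "longitude": longitude},
)
ds["depth"].attrs = {"units": "m", "long_name": "depth"}

print(ds)
```

Here every variable names its dimensions explicitly, and the coordinate arrays supply the values along each one.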

sho*_*yer 8

This is most easily achieved by combining Time, grid_latitude and grid_longitude into a pandas.MultiIndex on the dataframe with set_index() before converting it into an xarray dataset.

For example:

# note that pandas.DataFrame's to_xarray() method is equivalent to
# xarray.Dataset.from_dataframe()
ds = df.set_index(['Time', 'grid_latitude', 'grid_longitude']).to_xarray()
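A self-contained sketch of the same idea, so the resulting dimensions can be inspected. The three sample rows are hypothetical, modelled on the first rows of the question's table, and parsing Time into a timestamp is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical rows modelled on the question's table.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2017-09-05 13:01:59"] * 3),
    "grid_latitude": [-29.034083, -29.039367, -29.036650],
    "grid_longitude": [31.068567, 31.059150, 31.059200],
    "Status": [2.0, 2.0, 3.0],
    "depth": [0.0, 0.0, 0.0],
})

# Each level of the MultiIndex becomes one dimension of the Dataset.
ds = df.set_index(["Time", "grid_latitude", "grid_longitude"]).to_xarray()

# Mapping of dimension name to length.
print(dict(ds.sizes))
```

The result is a dense grid over the unique values of each index level. Combinations that do not occur in the table are filled with NaN, so the arrays grow with the product of the unique times, latitudes and longitudes. From there, ds.to_netcdf('test.nc') writes the file as in the question.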