python获取月份最大值xarray

Tom*_*ees 5 python numpy python-3.x pandas python-xarray

如何获得最大径流月份

我想获得每年以及整个时间序列最大径流量的月份。这个想法是通过查看最大径流月份来描述全球季节性特征。然后,我想尝试考虑每个像素是否具有单峰或双峰状态。

我想创建一个地图,就像这里的Pangeo示例中的一样。

范例图片

这表示最大降水的小时数。我想显示最大径流量的MONTH(以整数表示)。

获取数据

在这里,我下载了GRUN径流数据并创建了一个xarray对象。 注意:此处的数据集大于1GB。我正在使用它使此示例完全可重复。

# get the data
import subprocess
command = """
wget -O grun.nc https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/324386/GRUN_v1_GSWP3_WGS84_05_1902_2014.nc?sequence=1&isAllowed=y
"""
import os
if not os.path.exists('grun.nc'):
  process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
  output, error = process.communicate()

# read the data
import xarray as xr
ds = xr.open_dataset('grun.nc')

# select a subset so we can work with it more quickly
ds = ds.isel(time=slice(-100,-1))
ds

Out[]:
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, time: 99)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * time     (time) datetime64[ns] 2006-09-01 2006-10-01 ... 2014-11-01
Data variables:
    Runoff   (time, lat, lon) float32 ...
Attributes:
    title:                   GRUN
    version:                 GRUN 1.0
    meteorological_forcing:  GSWP3
    temporal_resolution:     monthly
    spatial_resolution:      0.5x0.5
    crs:                     WGS84
    proj4:                   +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
    EPSG:                    4326
    references:              Ghiggi et al.,2019. GRUN: An observation-based g...
    authors:                 Gionata Ghiggi; Lukas Gudmundsson
    contacts:                gionata.ghiggi@gmail.com; lukas.gudmundsson@env....
    institution:             Land-Climate Dynamics, Institute for Atmospheric...
    institution_id:          IAC ETHZ

Run Code Online (Sandbox Code Playgroud)

我尝试过的

我有nan值,所以我不能只将an argmax()应用于数据集。上面的Pangeo示例中,我使用了与@jhamman相同的方法。我不确定这给了我什么,但似乎给了我

# Apply argmax where you have NAN values
def my_func(ds, dim=None):
    return ds.isel(**{dim: ds['Runoff'].argmax(dim)})

mask = ds['Runoff'].isel(time=0).notnull()  # determine where you have valid data
ds2 = ds.fillna(-9999)  # fill nans with a missing flag of some kind
new = ds2.reset_coords(drop=True).groupby('time.month').apply(my_func, dim='time').where(mask)  # do the groupby operation/reduction and reapply the mask
new

Out[]:
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, month: 12)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    Runoff   (month, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
    title:                   GRUN
    version:                 GRUN 1.0
    meteorological_forcing:  GSWP3
    temporal_resolution:     monthly
    spatial_resolution:      0.5x0.5
    crs:                     WGS84
    proj4:                   +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
    EPSG:                    4326
    references:              Ghiggi et al.,2019. GRUN: An observation-based g...
    authors:                 Gionata Ghiggi; Lukas Gudmundsson
    contacts:                gionata.ghiggi@gmail.com; lukas.gudmundsson@env....
    institution:             Land-Climate Dynamics, Institute for Atmospheric...
    institution_id:          IAC ETHZ

Run Code Online (Sandbox Code Playgroud)

这给我

import matplotlib.pyplot as plt
fig,ax = plt.subplots(figsize=(12,8))
new.Runoff.sel(month=10).plot(ax=ax,  cmap='twilight')
Run Code Online (Sandbox Code Playgroud)

目前我的输出

 理想输出

我想要的是每个像素的值是最大径流的月份。

pandas如有必要,很乐意转换为。

因此,我最终得到一个xr.Dataset,其中包含最大径流月份的整数。理想情况下,最好也要有一个最大径流月份,以便我也可以看到这种季节性变化的方式。

<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
Data variables:
    Month_of_max (lat, lon) int32 ...

# OR EVEN BETTER
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, Year: 10)
Coordinates:
  * lon      (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * year     (year) float64 2010 2011 2012 2013 ... 
Data variables:
    Month_of_max (lat, lon, year) int32 ...
Run Code Online (Sandbox Code Playgroud)

Tom*_*ees 1

So the best solution I found was to convert to a pandas.Dataframe object and then do the calculations there. I have wrapped the solution into the functions below.

First let's work with a subset of the data (it takes ages otherwise). This is a box around Kenya.

import xarray as xr
ds = xr.open_dataset('grun.nc')
ds = ds.isel(time=slice(-20,-1))
ds = ds.sel(lat=slice(-5.202,6.002),lon=slice(33.501,42.283))

ds.attrs = ''
ds


Out[]:
<xarray.Dataset>
Dimensions:  (lat: 22, lon: 18, time: 19)
Coordinates:
  * lon      (lon) float64 33.75 34.25 34.75 35.25 ... 40.75 41.25 41.75 42.25
  * lat      (lat) float64 -4.75 -4.25 -3.75 -3.25 -2.75 ... 4.25 4.75 5.25 5.75
  * time     (time) datetime64[ns] 2013-05-01 2013-06-01 ... 2014-11-01
Data variables:
    Runoff   (time, lat, lon) float32 ...
Run Code Online (Sandbox Code Playgroud)

The work is all done and tied together in: calculate_annual_month_of_max(). Basically what it does is convert the xr.Dataset to a pd.Dataframe object then it extracts the timestep of maximum Runoff for each lat-lon-year. The beauty of this approach is that it returns both the Runoff value and the month integer.

import pandas as pd

def convert_to_df(ds):
    """
    Returns:
    -------
    xr.Dataset
    """
    df = ds.to_dataframe()
    df.reset_index(inplace=True)
    return df


def calculate_year_month_cols(df):
    """"""
    assert 'time' in df.columns, f"time should be in df.columns. Currently: {[c for c in df.columns]}"
    df['year'] = df.time.map(lambda x: x.year)
    df['month'] = df.time.map(lambda x: x.month)

    return df


def calculate_month_of_max_value(df, value_col):
    """
    Arguments
    ---------
    df : pd.DataFrame
        dataframe converted from xarray with ['lat','lon', 'year', value_col] columns

    value_col : str
        column that you want to find the month of maximum for 
        e.g. Which month (int) in each pixel (lat,lon) has the highest runoff
    """
    max_months = df.loc[df.groupby(["lat","lon","year"])[value_col].idxmax()]
    return max_months


def convert_dataframe_to_xarray(df, index_cols=['lat','lon']):
    """
    Arguments
    ---------
    df: pd.DataFrame
        the dataframe to convert to xr.dataset

    index_cols: List[str]
        the columns that will become the coordinates 
        of the output xr.Dataset

    Returns
    -------
    xr.Dataset
    """
    out = df.set_index(index_cols).dropna()
    ds = out.to_xarray()
    return ds


def calculate_annual_month_of_max(ds, variable):
    """for the `variable` in the `ds` calculate the 
    month of maximum for a given pixel-year.

    Returns:
    -------
    xr.Dataset
    """
    # convert to a dataframe
    df = convert_to_df(ds)
    df = calculate_year_month_cols(df)
    # calculate the month of maximum
    df = calculate_month_of_max_value(df, value_col=variable)
    # reconstitute the dataframe object
    ds_out = convert_dataframe_to_xarray(df, index_cols=['lat','lon','year'])

    return ds_out


mon_of_max = calculate_annual_month_of_max(ds, variable='Runoff')
mon_of_max


Out[]:
<xarray.Dataset>
Dimensions:  (lat: 22, lon: 18, year: 2)
Coordinates:
  * lat      (lat) float64 -4.75 -4.25 -3.75 -3.25 -2.75 ... 4.25 4.75 5.25 5.75
  * lon      (lon) float64 33.75 34.25 34.75 35.25 ... 40.75 41.25 41.75 42.25
  * year     (year) float64 2.013e+03 2.014e+03
Data variables:
    time     (lat, lon, year) datetime64[ns] 2013-12-01 ... 2014-10-01
    Runoff   (lat, lon, year) float32 0.5894838 0.9081207 ... 0.2789653
    month    (lat, lon, year) float64 12.0 1.0 12.0 1.0 ... 11.0 10.0 11.0 10.0

Run Code Online (Sandbox Code Playgroud)

Which looks like: 最大径流月中位数(2013-2014)