Mor*_*ory 4 python netcdf python-xarray
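I have a netCDF file of 2 m temperature from ERA5 covering months 04 to 10 of the years 2000 to 2019, which gives 13680 time steps in total on a 61x161 latitude-longitude grid. I want to compute the monthly mean over all daily time steps for each year separately, so that we get one mean for April 2000, one for May 2000, and so on. I tried xarray's resample with the code below, but ran into two problems.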
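Here is what I'm talking about:

import numpy as np
import pandas as pd
import xarray as xr
# netcdf holds the path to the ERA5 file
ds = xr.open_dataset(netcdf)
monthly_data = ds.resample(time='1M').mean()

Looking at the timestamps, we see monthly time steps that include the irrelevant months: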
print(np.array(monthly_data.time))
array(['2000-04-30T00:00:00.000000000', '2000-05-31T00:00:00.000000000',
       '2000-06-30T00:00:00.000000000', '2000-07-31T00:00:00.000000000',
       '2000-08-31T00:00:00.000000000', '2000-09-30T00:00:00.000000000',
       '2000-10-31T00:00:00.000000000', '2000-11-30T00:00:00.000000000',
       '2000-12-31T00:00:00.000000000', '2001-01-31T00:00:00.000000000',

To check the temperature values, I converted the data to a DataFrame.
temp_ar = np.array(monthly_data.t2m)
print(pd.DataFrame(temp_ar[0, :, :]).head())
            0           1           2  ...         158         159         160
0  270.940613  270.911652  270.926727  ...         NaN         NaN         NaN
1  271.294952  271.256744  271.250946  ...  272.948608  272.974731  272.998535
2  271.416779  271.457214  271.483459  ...  273.123169  273.079285  273.058563
3  271.848755  271.791382  271.784058  ...         NaN  273.264038         NaN
4  272.226837  272.144928  272.123016  ...         NaN         NaN         NaN

print(pd.DataFrame(temp_ar[1, :, :]).head())
    0   1   2   3   4   5   6  ...  154  155  156  157  158  159  160
0 NaN NaN NaN NaN NaN NaN NaN  ...  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1 NaN NaN NaN NaN NaN NaN NaN  ...  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2 NaN NaN NaN NaN NaN NaN NaN  ...  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3 NaN NaN NaN NaN NaN NaN NaN  ...  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4 NaN NaN NaN NaN NaN NaN NaN  ...  NaN  NaN  NaN  NaN  NaN  NaN  NaN

The second array (corresponding to May 2000) should not contain NaN, but it does, and the same is true for every other time step (except the last one, for some reason). Does anyone know why this happens?
Here is the original dataset:

print(ds)
<xarray.Dataset>
Dimensions:    (latitude: 61, longitude: 161, time: 13680)
Coordinates:
  * longitude  (longitude) float32 -80.0 -79.9 -79.8 -79.7 ... -64.2 -64.1 -64.0
  * latitude   (latitude) float32 50.0 49.9 49.8 49.7 ... 44.3 44.2 44.1 44.0
  * time       (time) datetime64[ns] 2000-04-01 ... 2018-10-30T23:00:00
Data variables:
    t2m        (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2020-12-07 03:50:31 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...

Any help would be appreciated. Maybe I should try a different approach?

Cheers!

I think a simple way to do this is to use the groupby method.
Example:
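import numpy as np
import pandas as pd
import xarray as xr

# Daily series from 2000-01-01 to 2004-07-31 (1674 days), with values 0..1673
da = xr.DataArray(
    np.linspace(0, 1673, num=1674),
    coords=[pd.date_range("2000-01-01", "2004-07-31", freq="D")],
    dims="time",
)
da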
Output:
<xarray.DataArray (time: 1674)>
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.671e+03, 1.672e+03, 1.673e+03])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2004-07-31
For yearly means, you can do the following:
da.groupby('time.year').mean()
Output:
<xarray.DataArray (year: 5)>
array([ 182.5, 548. , 913. , 1278. , 1567. ])
Coordinates:
  * year     (year) int64 2000 2001 2002 2003 2004
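As a quick check on why the year has to be part of the grouping key, the sketch below rebuilds the same example array (written with ISO-formatted dates, which is my own choice) and compares grouping on time.year with grouping on time.month alone. Grouping on time.month pools each calendar month across all years, so it gives 12 values rather than one value per year and month.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same daily example series as in the answer: 1674 days, values 0..1673
da = xr.DataArray(
    np.linspace(0, 1673, num=1674),
    coords=[pd.date_range("2000-01-01", "2004-07-31", freq="D")],
    dims="time",
)

# One mean per calendar year
by_year = da.groupby("time.year").mean()

# One mean per calendar month, pooled over ALL years (a climatology),
# which is not what the question asks for
by_month = da.groupby("time.month").mean()

print(dict(by_year.sizes))
print(dict(by_month.sizes))
```

The pooled result has a single January value built from every January in the series, which is why a combined year and month key is needed for per-year monthly means.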
For monthly means within each year, you can create a MultiIndex:
year_month_idx = pd.MultiIndex.from_arrays([da['time.year'], da['time.month']])
da.coords['year_month'] = ('time', year_month_idx)
da.groupby('year_month').mean()
Output:
<xarray.DataArray (year_month: 55)>
array([  15. ,   45. ,   75. ,  105.5,  136. ,  166.5,  197. ,  228. ,  258.5,
        289. ,  319.5,  350. ,  381. ,  410.5,  440. ,  470.5,  501. ,  531.5,
        562. ,  593. ,  623.5,  654. ,  684.5,  715. ,  746. ,  775.5,  805. ,
        835.5,  866. ,  896.5,  927. ,  958. ,  988.5, 1019. , 1049.5, 1080. ,
       1111. , 1140.5, 1170. , 1200.5, 1231. , 1261.5, 1292. , 1323. , 1353.5,
       1384. , 1414.5, 1445. , 1476. , 1506. , 1536. , 1566.5, 1597. , 1627.5,
       1658. ])
Coordinates:
  * year_month          (year_month) MultiIndex
  * year_month_level_0  (year_month) int64 2000 2000 2000 ... 2002 2002 2002
  * year_month_level_1  (year_month) int64 1 2 3 4 5 6 7 8 ... 11 12 1 2 3 4 5 6
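If the MultiIndex assignment does not behave this way in your xarray version (index handling has changed between releases, so this is a possibility rather than a certainty), two alternatives may be worth trying. The sketch below uses a small synthetic array as a stand-in for the ERA5 data. The 2x3 grid, the random values, the label name year_month, and the "MS" (month-start) resampling frequency are all my own assumptions, not part of the original question or answer. It also shows where the unwanted months in the question come from: resample creates a bin for every calendar month between the first and last timestamp, so November to March appear as all-NaN steps that can be dropped afterwards. It does not by itself explain NaN values in months that do contain data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the ERA5 file: daily values for April-October only,
# two years, on a tiny 2x3 grid (the real data is 61x161).
full = pd.date_range("2000-04-01", "2001-10-31", freq="D")
time = full[(full.month >= 4) & (full.month <= 10)]
rng = np.random.default_rng(0)
t2m = xr.DataArray(
    270.0 + rng.random((time.size, 2, 3)),
    coords={
        "time": time,
        "latitude": [50.0, 49.9],
        "longitude": [-80.0, -79.9, -79.8],
    },
    dims=("time", "latitude", "longitude"),
    name="t2m",
)

# Option 1: group on a "YYYY-MM" string label, so only months that
# actually occur in the data form groups.
labels = t2m["time"].dt.strftime("%Y-%m").rename("year_month")
by_label = t2m.groupby(labels).mean()

# Option 2: resample to calendar months (bins labelled by month start),
# then drop the bins that are NaN everywhere (the months with no data).
resampled = t2m.resample(time="MS").mean()
by_resample = resampled.dropna(dim="time", how="all")

print(resampled.sizes["time"])    # all calendar months in the span
print(by_resample.sizes["time"])  # only months that contain data
print(by_label.sizes["year_month"])
```

Option 1 keeps a plain string coordinate that is easy to select from, for example by_label.sel(year_month="2000-05"). Option 2 keeps a real datetime coordinate, which is usually more convenient for plotting.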