How to calculate total precipitation per day using hourly data for whole year?

Ep1*_*1aN 3 analysis dataset python-3.x cdo-climate era5

I have hourly data from ERA5 for each day in a specific year. I want to convert that data from hourly to daily. I know the long and hard way to do it, but I need something which does that easily.

Copernicus has a code for this here https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation, which works fine if the data set is only converted for one day, but when converting for the whole year, i am having problems with that.

Link to download ERA5 dataset which is available at https://cds.climate.copernicus.eu/cdsapp#!/home

Follow the steps to use copernicus server here

https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5

This script downloads the houly data for only 2 days (1st and 2nd of January 2017):
#!/usr/bin/env python
"""
Save as get-tp.py, then run "python get-tp.py".
  
Input file : None
Output file: tp_20170101-20170102.nc
"""
import cdsapi
 
c = cdsapi.Client()
r = c.retrieve(
    'reanalysis-era5-single-levels', {
            'variable'    : 'total_precipitation',
            'product_type': 'reanalysis',
            'year'        : '2017',
            'month'       : '01',
            'day'         : ['01', '02'],
            'time'        : [
                '00:00','01:00','02:00',
                '03:00','04:00','05:00',
                '06:00','07:00','08:00',
                '09:00','10:00','11:00',
                '12:00','13:00','14:00',
                '15:00','16:00','17:00',
                '18:00','19:00','20:00',
                '21:00','22:00','23:00'
            ],
            'format'      : 'netcdf'
    })
r.download('tp_20170101-20170102.nc')
## Add multiple days and multiple months to donload more data
Run Code Online (Sandbox Code Playgroud) Below script will create a netCDF file for only one day
#!/usr/bin/env python
"""
Save as file calculate-daily-tp.py and run "python calculate-daily-tp.py".
  
Input file : tp_20170101-20170102.nc
Output file: daily-tp_20170101.nc
"""
import time, sys
from datetime import datetime, timedelta
 
from netCDF4 import Dataset, date2num, num2date
import numpy as np
 
day = 20170101
d = datetime.strptime(str(day), '%Y%m%d')
f_in = 'tp_%d-%s.nc' % (day, (d + timedelta(days = 1)).strftime('%Y%m%d'))
f_out = 'daily-tp_%d.nc' % day
 
time_needed = []
for i in range(1, 25):
    time_needed.append(d + timedelta(hours = i))
 
with Dataset(f_in) as ds_src:
    var_time = ds_src.variables['time']
    time_avail = num2date(var_time[:], var_time.units,
            calendar = var_time.calendar)
 
    indices = []
    for tm in time_needed:
        a = np.where(time_avail == tm)[0]
        if len(a) == 0:
            sys.stderr.write('Error: precipitation data is missing/incomplete - %s!\n'
                    % tm.strftime('%Y%m%d %H:%M:%S'))
            sys.exit(200)
        else:
            print('Found %s' % tm.strftime('%Y%m%d %H:%M:%S'))
            indices.append(a[0])
 
    var_tp = ds_src.variables['tp']
    tp_values_set = False
    for idx in indices:
        if not tp_values_set:
            data = var_tp[idx, :, :]
            tp_values_set = True
        else:
            data += var_tp[idx, :, :]
         
    with Dataset(f_out, mode = 'w', format = 'NETCDF3_64BIT_OFFSET') as ds_dest:
        # Dimensions
        for name in ['latitude', 'longitude']:
            dim_src = ds_src.dimensions[name]
            ds_dest.createDimension(name, dim_src.size)
            var_src = ds_src.variables[name]
            var_dest = ds_dest.createVariable(name, var_src.datatype, (name,))
            var_dest[:] = var_src[:]
            var_dest.setncattr('units', var_src.units)
            var_dest.setncattr('long_name', var_src.long_name)
 
        ds_dest.createDimension('time', None)
        var = ds_dest.createVariable('time', np.int32, ('time',))
        time_units = 'hours since 1900-01-01 00:00:00'
        time_cal = 'gregorian'
        var[:] = date2num([d], units = time_units, calendar = time_cal)
        var.setncattr('units', time_units)
        var.setncattr('long_name', 'time')
        var.setncattr('calendar', time_cal)
 
        # Variables
        var = ds_dest.createVariable(var_tp.name, np.double, var_tp.dimensions)
        var[0, :, :] = data
        var.setncattr('units', var_tp.units)
        var.setncattr('long_name', var_tp.long_name)
 
        # Attributes
        ds_dest.setncattr('Conventions', 'CF-1.6')
        ds_dest.setncattr('history', '%s %s'
                % (datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                ' '.join(time.tzname)))
 
        print('Done! Daily total precipitation saved in %s' % f_out)
Run Code Online (Sandbox Code Playgroud)

What I want is a code which will follows the same step as above data but assuming that I have an input file with one year houly data and convert that to one year daily data.

The result should be daily values for the calculate variable (such as precipitation, etc) for the whole year.

Example: Let's say I have a precipitation data for the whole year as 1mm/hr every day, I would have 2928 values for the whole year.

What I want is 24mm/day for the whole year with only 365 values for a non-leap year.

示例输入数据集:数据的子集可以从这里下载(2017 年 1 月 1 日和 2 日)https://www.dropbox.com/sh/0vdfn20p355st3i/AABKYO4do_raGHC34VnsXGPqa?dl=0。只需在此之后使用第二个脚本来检查代码。{全年代码>10GB无法上传

提前致谢

N1B*_*1B4 6

xarray resample只是适合您的工具。它在一行中将 netCDF 数据从一种时间分辨率(例如每小时)转换为另一种(例如每天)。使用您的示例数据文件,我们可以使用以下代码创建每日均值:

import xarray as xr

ds = xr.open_dataset('./tp_20170101-20170102.nc')
tp = ds['tp'] # dimensions [time: 48, latitude: 721, longitude: 1440]
tp_daily = tp.resample(time='D').mean(dim='time') # dimensions (time: 2, latitude: 721, longitude: 1440)
Run Code Online (Sandbox Code Playgroud)

您将看到该resample命令接受一个时间代码,在本例'D'中表示每天,然后我们指定我们要使用当天的每小时数据计算每天的平均值.mean(dim='time')

例如,如果您想计算每日最大值而不是每日平均值,则可以将其替换.mean(dim='time').max(dim='time')。您还可以从每小时到每月(MS或每月开始)、每年(AS或每年开始)等等。时间频率代码可以在Pandas 文档中找到。