I have a DataFrame with the following structure:
df = df.set_index('timestamp')
print(df.head())
timestamp id value
2018-12-31 23:00:00 5c8fea84763aae175afda38b 98.587768
2018-12-31 23:10:00 5c8fea84763aae175afda38b 107.232742
2018-12-31 23:20:00 5c8fea84763aae175afda38b 104.224153
2018-12-31 23:30:00 5c8fea84763aae175afda38b 104.090750
2018-12-31 23:40:00 5c8fea84763aae175afda38b 99.357023
Run Code Online (Sandbox Code Playgroud)
I need to obtain a new DataFrame with daily max and min values, as well as the mean. I have no problem in obtaining this data and I do it this way:
df = df.resample('D').agg(['min', 'max', 'mean'], columns=['value'])
Run Code Online (Sandbox Code Playgroud)
The problem is that I loose the column id and I need it in order to store new data in a database.
This is the output I get by printing the head of the new DataFrame:
timestamp min max mean
2018-12-31 98.587768 107.641060 103.522250
2019-01-01 88.396180 109.506622 100.135128
2019-01-02 85.857570 112.420754 99.839120
2019-01-03 87.565014 113.419561 99.734654
2019-01-04 88.902704 112.186989 99.764259
Run Code Online (Sandbox Code Playgroud)
As you can see, I have lost id field.
Pass a dictionary to agg to aggregate multiple columns. For "ID", aggregate by taking the first value.
Here's an example:
df.resample('D').agg({'id': 'first', 'value': ['mean', 'max']})
id value
first mean max
timestamp
2018-12-31 5c8fea84763aae175afda38b 102.698487 107.232742
Run Code Online (Sandbox Code Playgroud)
If you so wish, you can rename the output columns by passing tuples:
df.resample('D').agg({
'id': [('A', 'first')], 'value': [('B', 'mean'), ('C', 'max')]})
id value
A B C
timestamp
2018-12-31 5c8fea84763aae175afda38b 102.698487 107.232742
Run Code Online (Sandbox Code Playgroud)
Also see Multiple aggregations of the same column using pandas GroupBy.agg().