Losing String column when using resample and aggregation with pandas

Question


I have a DataFrame with the following structure:

df = df.set_index('timestamp')
print(df.head())

timestamp            id                         value
2018-12-31 23:00:00  5c8fea84763aae175afda38b   98.587768
2018-12-31 23:10:00  5c8fea84763aae175afda38b  107.232742
2018-12-31 23:20:00  5c8fea84763aae175afda38b  104.224153
2018-12-31 23:30:00  5c8fea84763aae175afda38b  104.090750
2018-12-31 23:40:00  5c8fea84763aae175afda38b   99.357023


I need to obtain a new DataFrame with the daily min, max, and mean values. Getting this data is no problem; I do it this way:

df = df.resample('D').agg(['min', 'max', 'mean'], columns=['value'])
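To see the symptom in isolation, here is a minimal sketch that uses the five rows printed above as stand-in data (the full data set is not shown). It assumes the call above effectively aggregates only the value column, so it is written here as df["value"].resample(...); that equivalence is an assumption, not something the question states.

```python
import pandas as pd

# Stand-in data: the five rows from the head() printout in the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-12-31 23:00:00", "2018-12-31 23:10:00", "2018-12-31 23:20:00",
        "2018-12-31 23:30:00", "2018-12-31 23:40:00",
    ]),
    "id": ["5c8fea84763aae175afda38b"] * 5,
    "value": [98.587768, 107.232742, 104.224153, 104.090750, 99.357023],
})
df = df.set_index("timestamp")

# Only the numeric column is selected, so the daily result holds
# min/max/mean and nothing else: 'id' was never part of the aggregation.
daily = df["value"].resample("D").agg(["min", "max", "mean"])
print(daily)
print("id" in daily.columns)
```

The id column is not dropped by a bug; it is simply never asked for, so it has nowhere to go in the result.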


The problem is that I lose the id column, and I need it in order to store the new data in a database.

This is the output I get by printing the head of the new DataFrame:

timestamp   min        max         mean
2018-12-31  98.587768  107.641060  103.522250
2019-01-01  88.396180  109.506622  100.135128
2019-01-02  85.857570  112.420754   99.839120
2019-01-03  87.565014  113.419561   99.734654
2019-01-04  88.902704  112.186989   99.764259


As you can see, the id field has been lost.

Answer 1

cs9*_*s95 5

Pass a dictionary to agg to aggregate multiple columns. For 'id', aggregate by taking the first value.

Here's an example:

df.resample('D').agg({'id': 'first', 'value': ['mean', 'max']})

                                  id       value            
                               first        mean         max
timestamp                                                   
2018-12-31  5c8fea84763aae175afda38b  102.698487  107.232742
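A self-contained version of the dictionary approach, this time requesting all three statistics from the question. The data is again only the five sample rows from the question; note that the result carries two column levels, such as ('id', 'first') and ('value', 'min').

```python
import pandas as pd

# Stand-in data: the five sample rows from the question, at 10-minute spacing.
idx = pd.date_range("2018-12-31 23:00", periods=5, freq="10min", name="timestamp")
df = pd.DataFrame(
    {"id": "5c8fea84763aae175afda38b",
     "value": [98.587768, 107.232742, 104.224153, 104.090750, 99.357023]},
    index=idx,
)

# One dictionary entry per column: keep the first id seen in each day,
# and compute min/max/mean for the numeric column.
daily = df.resample("D").agg({"id": "first", "value": ["min", "max", "mean"]})

print(daily)
print(daily.columns.tolist())
```

Taking 'first' is only safe if every row in a given day has the same id; with several ids per day it silently keeps just one of them.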


If you wish, you can rename the output columns by passing lists of (name, function) tuples:

df.resample('D').agg({
    'id': [('A', 'first')], 'value': [('B', 'mean'), ('C', 'max')]})

                                  id       value            
                                   A           B           C
timestamp                                                   
2018-12-31  5c8fea84763aae175afda38b  102.698487  107.232742
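If the goal is a flat table to write to a database, a different technique may be handier: pandas "named aggregation" (available on GroupBy.agg since pandas 0.25), here applied through df.groupby(pd.Grouper(freq="D")) as a daily grouping. This is a swapped-in alternative, not the answer's method, and the data is only the question's sample rows. It produces single-level columns directly.

```python
import pandas as pd

# Stand-in data: the five sample rows from the question.
idx = pd.date_range("2018-12-31 23:00", periods=5, freq="10min", name="timestamp")
df = pd.DataFrame(
    {"id": "5c8fea84763aae175afda38b",
     "value": [98.587768, 107.232742, 104.224153, 104.090750, 99.357023]},
    index=idx,
)

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
daily = (
    df.groupby(pd.Grouper(freq="D"))
      .agg(id=("id", "first"),
           min=("value", "min"),
           max=("value", "max"),
           mean=("value", "mean"))
      .reset_index()  # move the daily timestamp back into a regular column
)
print(daily)
```

After reset_index() every field, including the day, is an ordinary column, which is usually what a database insert (for example via DataFrame.to_sql) expects.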


Also see Multiple aggregations of the same column using pandas GroupBy.agg().
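Aggregating 'id' with 'first' assumes a single id per day. If the table can hold several ids (the question only shows one), a common alternative is to make id part of the grouping key alongside a daily pd.Grouper, so each (id, day) pair gets its own row. The ids "a" and "b" and the values below are hypothetical, chosen only to make the grouping visible.

```python
import pandas as pd

# Hypothetical data: two ids sharing the same day.
idx = pd.DatetimeIndex(
    pd.to_datetime(["2019-01-01 00:00", "2019-01-01 00:10",
                    "2019-01-01 00:00", "2019-01-01 00:10"]),
    name="timestamp",
)
df = pd.DataFrame(
    {"id": ["a", "a", "b", "b"], "value": [1.0, 3.0, 10.0, 20.0]},
    index=idx,
)

# Group by the id column and by calendar day of the DatetimeIndex,
# then compute the daily statistics of 'value' per id.
daily = (
    df.groupby(["id", pd.Grouper(freq="D")])["value"]
      .agg(["min", "max", "mean"])
      .reset_index()  # id and the day become ordinary columns
)
print(daily)
```

Because id is a grouping key here, it can never be lost, and no per-day choice of a "first" id is needed.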

Archived:	6 years, 7 months ago
Views:	62
Last activity:	6 years, 7 months ago