jea*_*elj 4 python loops categories pandas
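I'm trying to create a new dataframe based on the existing dataframe below. My goal is to calculate the average change in clicks and categorize the campaigns accordingly.
Existing dataframe df: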
campaign | date | clicks
A 2015-10-11 255
A 2015-10-12 367
A 2015-10-13 489
B 2015-10-11 500
B 2015-10-15 122
C 2015-10-11 33
Target dataframe df_categorized:
campaign | avg_change | category
A 0.3858 increasing
B -0.756 decreasing
C 0 no change
I tried this code, but I get the error message TypeError: 'long' object does not support item assignment
#standard packages
import pandas as pd
import numpy as np
#upload data into df
df = pd.read_csv('C:\Users\xxx\Documents\\ad_table.csv')
df.head()
campaign | date | clicks
A 2015-10-11 255
A 2015-10-12 367
A 2015-10-13 489
B 2015-10-11 500
B 2015-10-15 122
C 2015-10-11 33
#create empty dataframe
columns = ['group','avg_change', 'category']
df_categorized = pd.DataFrame(columns=columns)
df_categorized['avg change'] = df.clicks.apply(lambda df: df.pct_change().abs().mean())
#create column
df_categorized['category'] = 0
# going up
df_categorized['category'][df_categorized['avg change'] > 0] = "increasing"
# going down
df_categorized['category'][df_categorized['avg change'] < 0] = "decreasing"
#no change
df_categorized['category'][df_categorized['avg change'] == 0] = "no change"
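You can groupby on 'campaign', then apply a lambda that calculates pct_change and returns the mean. A campaign with a single row, such as C, has no change to average and yields NaN, which fillna(0) turns into 0. You can then reset_index and add the category column using np.where: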
In [239]:
gp = df.groupby('campaign')['clicks'].apply(lambda x: x.pct_change().mean()).reset_index(name='avg_change').fillna(0)
gp['category'] = np.where(gp['avg_change'] < 0, 'decreasing', np.where(gp['avg_change'] > 0, 'increasing', 'no change'))
gp
Out[239]:
campaign avg_change category
0 A 0.38582 increasing
1 B -0.75600 decreasing
2 C 0.00000 no change
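For readers who want to check the numbers, here is a minimal, self-contained sketch of the same calculation. It builds the sample data inline instead of reading the CSV, and the `step_change` column name is an illustrative choice, not something from the original post. It keeps the intermediate row-to-row changes so the averages can be verified by hand.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question, built inline so no CSV is needed.
df = pd.DataFrame({
    "campaign": ["A", "A", "A", "B", "B", "C"],
    "date": pd.to_datetime(["2015-10-11", "2015-10-12", "2015-10-13",
                            "2015-10-11", "2015-10-15", "2015-10-11"]),
    "clicks": [255, 367, 489, 500, 122, 33],
})

# Row-to-row relative change, restarted for every campaign.
# The first row of each campaign has no predecessor, so it is NaN.
# "step_change" is a hypothetical column name used only for this sketch.
df["step_change"] = df.groupby("campaign")["clicks"].pct_change()

# Average the changes per campaign. mean() skips NaN, and a campaign with
# only NaN values (C) gives NaN, which fillna(0) replaces with 0.
gp = (
    df.groupby("campaign")["step_change"]
    .mean()
    .fillna(0)
    .reset_index(name="avg_change")
)

# Same nested np.where as in the answer above.
gp["category"] = np.where(
    gp["avg_change"] < 0,
    "decreasing",
    np.where(gp["avg_change"] > 0, "increasing", "no change"),
)

print(df)
print(gp)
```

For campaign A the two steps are (367 - 255) / 255 and (489 - 367) / 367, and their mean is the 0.38582 shown in the output above.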
This:
df_categorized['avg change'] = df.clicks.apply(lambda df: df.pct_change().abs().mean())
will not work: you are calling apply on a column, so the lambda receives each row element, which in this case is an int, and you get the error:
AttributeError: 'int' object has no attribute 'pct_change'
Even without this error, it would not give you the pct_change for each campaign.
Also, don't use chained indexing on the df like this:
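A small sketch of both points, using the question's sample values built inline (the variable names `has_method`, `naive` and `grouped` are illustrative only): apply on a column hands the function one element at a time, and pct_change on the ungrouped column runs across campaign boundaries.

```python
import pandas as pd

df = pd.DataFrame({
    "campaign": ["A", "A", "A", "B", "B", "C"],
    "clicks": [255, 367, 489, 500, 122, 33],
})

# Series.apply calls the function once per element, so each argument is a
# single number; none of them has a pct_change method.
has_method = df["clicks"].apply(lambda value: hasattr(value, "pct_change"))
print(has_method.any())

# Calling pct_change on the whole column runs, but it crosses campaign
# boundaries: row 3 compares B's first value (500) with A's last (489).
naive = df["clicks"].pct_change()

# Grouping first restarts the calculation for every campaign, so the first
# row of each campaign is NaN instead.
grouped = df.groupby("campaign")["clicks"].pct_change()

print(pd.DataFrame({"campaign": df["campaign"], "naive": naive, "grouped": grouped}))
```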
df_categorized['category'][df_categorized['avg change'] > 0] = "increasing"
It should be:
df_categorized.loc[df_categorized['avg change'] > 0, 'category'] = "increasing"
See the docs.
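As a rough sketch, the same .loc pattern covers all three categories from the question. The frame below is built by hand from the target values, and it uses the column name `avg_change` (with an underscore, as in the target dataframe) rather than the `'avg change'` used in the question's code; both are assumptions made for illustration.

```python
import pandas as pd

# Hand-built stand-in for the per-campaign averages.
df_categorized = pd.DataFrame({
    "campaign": ["A", "B", "C"],
    "avg_change": [0.38582, -0.756, 0.0],
})

# Start with the default label, then overwrite the rows that match each
# condition. Each .loc call selects rows and column in a single step, so
# the assignment is applied to df_categorized itself.
df_categorized["category"] = "no change"
df_categorized.loc[df_categorized["avg_change"] > 0, "category"] = "increasing"
df_categorized.loc[df_categorized["avg_change"] < 0, "category"] = "decreasing"

print(df_categorized)
```

Setting the default first means the "no change" case needs no equality test against 0.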