che*_*ens 16 python pivot-table data-analysis dataframe pandas
I have the following dataframe (the real dataframe is much bigger than this one):
sale_user_id sale_product_id count
1 1 1
1 8 1
1 52 1
1 312 5
1 315 1
Then I reshape it with the following code, so that the values of sale_product_id become column headers:
reshaped_df = id_product_count.pivot(index='sale_user_id', columns='sale_product_id', values='count')
The resulting dataframe is:
sale_product_id -1057 1 2 3 4 5 6 8 9 10 ... 98 980 981 982 983 984 985 986 987 99
sale_user_id
1 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
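As you can see, we have what looks like a multi-level index. What I need is sale_user_id in the first column, without the multi-level index.
I tried the following approach: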
reshaped_df.reset_index()
The result looks like this. I still have the sale_product_id column, but I no longer need it:
sale_product_id sale_user_id -1057 1 2 3 4 5 6 8 9 ... 98 980 981 982 983 984 985 986 987 99
0 1 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 4 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN
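I could subset this dataframe to get rid of sale_product_id, but I don't think that would be efficient. I am looking for an efficient way to get rid of the multi-level index while reshaping the original dataframe.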
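For reference, here is a minimal reproducible version of the setup above, built only from the five sample rows (so every cell is filled; the real data would also contain NaN). It assumes a recent pandas release, where pivot is called with keyword arguments. Checking the axes suggests the pivoted frame is not a MultiIndex at all: both axes are flat Index objects, and the two-line header comes from the names attached to them.

```python
import pandas as pd

# The five sample rows from the question (the real frame is much larger)
id_product_count = pd.DataFrame({
    'sale_user_id': [1, 1, 1, 1, 1],
    'sale_product_id': [1, 8, 52, 312, 315],
    'count': [1, 1, 1, 5, 1],
})

reshaped_df = id_product_count.pivot(index='sale_user_id',
                                     columns='sale_product_id',
                                     values='count')

# Neither axis is a MultiIndex; each is a flat Index that carries a name
print(isinstance(reshaped_df.index, pd.MultiIndex))    # False
print(isinstance(reshaped_df.columns, pd.MultiIndex))  # False
print(reshaped_df.index.name)    # sale_user_id
print(reshaped_df.columns.name)  # sale_product_id
```

So the leftover "sale_product_id" above the index after reset_index() is just the name of the columns axis, not an extra index level.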
jez*_*ael 17
You only need to remove the index name, using rename_axis (new in pandas 0.18.0):
print (reshaped_df)
sale_product_id 1 8 52 312 315
sale_user_id
1 1 1 1 5 1
print (reshaped_df.index.name)
sale_user_id
print (reshaped_df.rename_axis(None))
sale_product_id 1 8 52 312 315
1 1 1 1 5 1
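A small sketch on the same five sample rows (recent pandas assumed) of what rename_axis(None) does here: it clears only the index name, leaves the columns name in place, and returns a new DataFrame instead of modifying reshaped_df.

```python
import pandas as pd

id_product_count = pd.DataFrame({
    'sale_user_id': [1, 1, 1, 1, 1],
    'sale_product_id': [1, 8, 52, 312, 315],
    'count': [1, 1, 1, 5, 1],
})
reshaped_df = id_product_count.pivot(index='sale_user_id',
                                     columns='sale_product_id',
                                     values='count')

# rename_axis(None) targets the index, because axis=0 is the default
no_index_name = reshaped_df.rename_axis(None)

print(no_index_name.index.name)    # None
print(no_index_name.columns.name)  # sale_product_id (columns name untouched)
print(reshaped_df.index.name)      # sale_user_id (the original is unchanged)
```

Because a new frame is returned, the result has to be assigned back (for example reshaped_df = reshaped_df.rename_axis(None)) if you want to keep it.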
Another solution, which also works in pandas versions below 0.18.0:
reshaped_df.index.name = None
print (reshaped_df)
sale_product_id 1 8 52 312 315
1 1 1 1 5 1
If the columns name also needs to be removed:
print (reshaped_df.columns.name)
sale_product_id
print (reshaped_df.rename_axis(None).rename_axis(None, axis=1))
1 8 52 312 315
1 1 1 1 5 1
Another solution:
reshaped_df.columns.name = None
reshaped_df.index.name = None
print (reshaped_df)
1 8 52 312 315
1 1 1 1 5 1
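Unlike rename_axis, which returns a new DataFrame, these attribute assignments modify reshaped_df in place, so there is no result to reassign. A sketch on the five sample rows (recent pandas assumed):

```python
import pandas as pd

id_product_count = pd.DataFrame({
    'sale_user_id': [1, 1, 1, 1, 1],
    'sale_product_id': [1, 8, 52, 312, 315],
    'count': [1, 1, 1, 5, 1],
})
reshaped_df = id_product_count.pivot(index='sale_user_id',
                                     columns='sale_product_id',
                                     values='count')

# Setting .name mutates the axes of reshaped_df itself; nothing is returned
reshaped_df.columns.name = None
reshaped_df.index.name = None

print(reshaped_df.index.name, reshaped_df.columns.name)  # None None
print(reshaped_df.loc[1, 312])  # 5 (only the labels changed, not the data)
```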
Edit, based on the comment:
You need reset_index with the parameter drop=True:
reshaped_df = reshaped_df.reset_index(drop=True)
print (reshaped_df)
sale_product_id 1 8 52 312 315
0 1 1 1 5 1
# if you need to reset the index and remove the columns name
reshaped_df = reshaped_df.reset_index(drop=True).rename_axis(None, axis=1)
print (reshaped_df)
1 8 52 312 315
0 1 1 1 5 1
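Note that drop=True discards the old index instead of moving it into a column, so the sale_user_id values are lost. A sketch comparing both calls on the five sample rows (recent pandas assumed); the names dropped and kept are only for illustration:

```python
import pandas as pd

id_product_count = pd.DataFrame({
    'sale_user_id': [1, 1, 1, 1, 1],
    'sale_product_id': [1, 8, 52, 312, 315],
    'count': [1, 1, 1, 5, 1],
})
reshaped_df = id_product_count.pivot(index='sale_user_id',
                                     columns='sale_product_id',
                                     values='count')

dropped = reshaped_df.reset_index(drop=True)  # user ids are thrown away
kept = reshaped_df.reset_index()              # user ids become the first column

print(list(dropped.columns))
print(list(kept.columns))
```

Use drop=True only when the user ids are really not needed any more.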
If you only need to remove the columns name:
reshaped_df = reshaped_df.rename_axis(None, axis=1)
print (reshaped_df)
1 8 52 312 315
sale_user_id
1 1 1 1 5 1
EDIT1:
So if you need to turn the index into a new column and remove the columns name:
reshaped_df = reshaped_df.rename_axis(None, axis=1).reset_index()
print (reshaped_df)
sale_user_id 1 8 52 312 315
0 1 1 1 1 5 1
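Putting the pieces together, a minimal end-to-end sketch on the five sample rows (recent pandas assumed): pivot, clear the columns name, then move sale_user_id into a regular column. The variable name flat is only for illustration.

```python
import pandas as pd

id_product_count = pd.DataFrame({
    'sale_user_id': [1, 1, 1, 1, 1],
    'sale_product_id': [1, 8, 52, 312, 315],
    'count': [1, 1, 1, 5, 1],
})

flat = (id_product_count
        .pivot(index='sale_user_id', columns='sale_product_id', values='count')
        .rename_axis(None, axis=1)  # drop the 'sale_product_id' columns name
        .reset_index())             # move sale_user_id into a regular column

print(flat)
```

Clearing the columns name before reset_index() means no sale_product_id label is carried over to the new frame.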
Chr*_*ger 10
Make a dataframe:
import random
import pandas as pd

d = {'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'] * 2,
     'Year': [2005] * 5 + [2006] * 5,
     'Value': random.sample(range(1, 20), 10)}  # random values, so output differs per run
df = pd.DataFrame(data=d)
df:
Country Year Value
1 Afghanistan 2005 6
2 Albania 2005 13
3 Algeria 2005 10
4 Andorra 2005 11
5 Angola 2005 5
6 Afghanistan 2006 3
7 Albania 2006 2
8 Algeria 2006 7
9 Andorra 2006 3
10 Angola 2006 6
Pivot:
table = df.pivot(index='Country',columns='Year',values='Value')
table:
Year Country 2005 2006
0 Afghanistan 16 9
1 Albania 17 19
2 Algeria 11 7
3 Andorra 5 12
4 Angola 6 18
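I want to get rid of the "Year" axis label and turn "Country" back into a regular column with a plain default index (reset_index() without drop=True, so Country is kept):
clean_tbl = table.rename_axis(None, axis=1).reset_index()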
clean_tbl:
Country 2005 2006
0 Afghanistan 16 9
1 Albania 17 19
2 Algeria 11 7
3 Andorra 5 12
4 Angola 6 18
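Because random.sample produces different values on every run, here is a deterministic sketch of the same steps, with the values hard-coded from the clean table above (recent pandas assumed). It uses reset_index() without drop=True, since drop=True would discard the Country index rather than turn it into a column.

```python
import pandas as pd

countries = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola']
df = pd.DataFrame({
    'Country': countries * 2,
    'Year': [2005] * 5 + [2006] * 5,
    # values copied from the clean table: 2005 column first, then 2006
    'Value': [16, 17, 11, 5, 6, 9, 19, 7, 12, 18],
})

table = df.pivot(index='Country', columns='Year', values='Value')

# Remove the 'Year' columns name, then move Country out of the index
clean_tbl = table.rename_axis(None, axis=1).reset_index()
print(clean_tbl)
```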
Done!