pat*_*998 3 python charts dataframe plotly plotly-python
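I'm using Python and plotly to build a bar chart of the mean rating for certain categories in my dataset. The chart is almost what I want, but I'd like to give each individual bar its own color, and I can't find a clear explanation online of how to do that.
Dataset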
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
'overall_rating': {0: 5, 1: 4, 2: 5, 3: 5, 4: 4},
'user_name': {0: 'member1365952',
1: 'member465943',
2: 'member665924',
3: 'member865886',
4: 'member1065873'},
'date': {0: Timestamp('2022-02-03 00:00:00'),
1: Timestamp('2022-02-03 00:00:00'),
2: Timestamp('2022-02-02 00:00:00'),
3: Timestamp('2022-02-01 00:00:00'),
4: Timestamp('2022-02-01 00:00:00')},
'comments': {0: 'Great campus. Library is always helpful. Sport course has been brill despite Civid challenges.',
1: 'Average facilities and student Union. Great careers support.',
2: 'Brilliant university, very social place with great unions.',
3: 'Overall it was very good and the tables and chairs for discussion sessions worked very well',
4: 'Uni is nice and most of the staff are amazing. Facilities (particularly the library) could be better'},
'campus_facilities_rating': {0: 5, 1: 3, 2: 5, 3: 4, 4: 4},
'clubs_societies_rating': {0: 5, 1: 3, 2: 4, 3: 4, 4: 4},
'students_union_rating': {0: 4, 1: 3, 2: 5, 3: 5, 4: 5},
'careers_service_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3},
'wifi_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3}})
Code used
import plotly.express as px

# Plot the mean rating for each category
fig = px.bar(df, y=[df.campus_facilities_rating.mean(), df.clubs_societies_rating.mean(),
df.students_union_rating.mean(), df.careers_service_rating.mean(), df.wifi_rating.mean()],
x=['Campus Facilities', 'Clubs & Societies', 'Students Union', 'Careers & Services', 'Wifi'],
labels={
"y": "Mean Rating (1-5)",
"x": "Category"},
title="Mean Rating For Different Student Categories")
fig.show()
Updated attempt
# Plot to find mean rating for different categories
fig = px.bar(df, y=[df.campus_facilities_rating.mean(), df.clubs_societies_rating.mean(),
df.students_union_rating.mean(), df.careers_service_rating.mean(), df.wifi_rating.mean()],
x=['Campus Facilities', 'Clubs & Societies', 'Students Union', 'Careers & Services', 'Wifi'],
labels={
"y": "Mean Rating (1-5)",
"x": "Category"},
title="Mean Rating For Different Student Categories At The University of Lincoln",
color_discrete_map = {
'Campus Facilities' : 'red',
'Clubs & Societies' : 'blue',
'Students Union' : 'pink',
'Careers & Services' : 'grey',
'Wifi' : 'orange'})
fig.update_layout(barmode = 'group')
fig.show()
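The output just shows all the bars in blue.
In general, you can use color_discrete_map in px.bar() to specify the color of each bar, provided you have defined a category through the color argument, such as color="medal", together with a mapping like this: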
color_discrete_map={'gold':'yellow', 'silver':'grey', 'bronze':'brown'}
import plotly.express as px
long_df = px.data.medals_long()
fig = px.bar(long_df, x="nation", y="count", color="medal", title="color_discrete_map={'gold':'yellow', 'silver':'grey', 'bronze':'brown'}",
color_discrete_map={'gold':'yellow', 'silver':'grey', 'bronze':'brown'})
fig.update_layout(barmode = 'group')
fig.show()
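For your particular dataset and structure, you can't directly apply color='category', because the different categories are spread across several columns: campus_facilities_rating, clubs_societies_rating, students_union_rating, careers_service_rating and wifi_rating.
There is a way to achieve your goal with go.Figure() and fig.add_traces(), but since you seem most interested in px.bar(), we'll stick with plotly.express. In short, go.Figure() needs no particular data wrangling to get what you want, but setting up the figure is a bit messier. With plotly.express and px.bar it is the other way around. Once we've made a few changes to your dataset, the following snippet is all you need to build the figure: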
fig = px.bar(dfg, x = 'category', y = 'value',
color = 'category',
category_orders = {'category':['Campus Facilities','Clubs & Societies','Students Union','Careers & Services','Wifi']},
color_discrete_map = {'Campus Facilities' : 'red',
'Clubs & Societies' : 'blue',
'Students Union' : 'pink',
'Careers & Services' : 'grey',
'Wifi' : 'orange'})
from pandas import Timestamp
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
'overall_rating': {0: 5, 1: 4, 2: 5, 3: 5, 4: 4},
'user_name': {0: 'member1365952',
1: 'member465943',
2: 'member665924',
3: 'member865886',
4: 'member1065873'},
'date': {0: Timestamp('2022-02-03 00:00:00'),
1: Timestamp('2022-02-03 00:00:00'),
2: Timestamp('2022-02-02 00:00:00'),
3: Timestamp('2022-02-01 00:00:00'),
4: Timestamp('2022-02-01 00:00:00')},
'comments': {0: 'Great campus. Library is always helpful. Sport course has been brill despite Civid challenges.',
1: 'Average facilities and student Union. Great careers support.',
2: 'Brilliant university, very social place with great unions.',
3: 'Overall it was very good and the tables and chairs for discussion sessions worked very well',
4: 'Uni is nice and most of the staff are amazing. Facilities (particularly the library) could be better'},
'campus_facilities_rating': {0: 5, 1: 3, 2: 5, 3: 4, 4: 4},
'clubs_societies_rating': {0: 5, 1: 3, 2: 4, 3: 4, 4: 4},
'students_union_rating': {0: 4, 1: 3, 2: 5, 3: 5, 4: 5},
'careers_service_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3},
'wifi_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3}})
df.columns = ['id', 'overall_rating', 'user_name', 'date', 'comments', 'Campus Facilities',
'Clubs & Societies','Students Union','Careers & Services','Wifi']
dfm = pd.melt(df, id_vars=['id', 'overall_rating', 'user_name', 'date', 'comments'],
value_vars=list(df.columns[5:]),
var_name ='category')
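# Average only the 'value' column; calling .mean() on the whole frame fails on the text columns in recent pandas
dfg = dfm.groupby('category')['value'].mean().reset_index()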
fig = px.bar(dfg, x = 'category', y = 'value', color = 'category',
category_orders = {'category':['Campus Facilities','Clubs & Societies','Students Union','Careers & Services','Wifi']},
color_discrete_map = {
'Campus Facilities' : 'red',
'Clubs & Societies' : 'blue',
'Students Union' : 'pink',
'Careers & Services' : 'grey',
'Wifi' : 'orange'})
fig.update_yaxes(title = 'Mean rating (1-5)')
fig.show()
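Why dfm and dfg?
px.bar(color='variable') assigns colors to the unique values of a series or pandas column named 'variable'. But the categories we care about in your dataframe are spread across several columns. So what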
dfm = pd.melt(df, id_vars=['id', 'overall_rating', 'user_name', 'date', 'comments'],
value_vars=list(df.columns[5:]),
var_name ='category')
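does is take the five rating columns (renamed above to Campus Facilities, Clubs & Societies, Students Union, Careers & Services and Wifi)
and stack them into a single column named category (the name comes from var_name; the pandas default would be variable), with the ratings alongside in a value column.
But this is still the raw data, and what you're after is not that but the mean of each group within that column. That is what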
dfm.groupby('category')['value'].mean().reset_index()
gives us one row per category, with its mean rating in the value column.
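To make the two reshaping steps concrete, here is a minimal pandas-only sketch. It assumes only the five rating columns from the question (the id, user name, date and comment columns are left out, so no id_vars are needed) and keeps the original column names rather than the renamed ones.

```python
import pandas as pd

# Only the rating columns from the question's dataset are reproduced here.
df = pd.DataFrame({
    "campus_facilities_rating": [5, 3, 5, 4, 4],
    "clubs_societies_rating": [5, 3, 4, 4, 4],
    "students_union_rating": [4, 3, 5, 5, 5],
    "careers_service_rating": [5, 5, 5, 5, 3],
    "wifi_rating": [5, 5, 5, 5, 3],
})

# Wide -> long: every rating becomes its own row, tagged with the column it came from.
dfm = pd.melt(df, var_name="category", value_name="value")
print(dfm.shape)  # 5 respondents x 5 categories

# Long -> summary: one row per category holding the mean rating.
dfg = dfm.groupby("category", as_index=False)["value"].mean()
print(dfg)
```

For the five sample rows, the means work out to 4.2 for campus facilities, 4.0 for clubs and societies, 4.4 for the students' union, and 4.6 for both careers service and wifi. Note that groupby sorts the categories alphabetically, which is why the plotting code passes category_orders.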
Take a look at pd.melt() and df.groupby() for further details.