Wou*_*ckx 5 python line histogram vega-lite altair
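I am trying to reproduce this chart as closely as possible using Altair: https://fivethirtyeight.com/wp-content/uploads/2014/04/hickey-bechdel-11.png?w=575
I am stuck on the black line that divides pass from fail. It is similar to this Altair example: https://altair-viz.github.io/gallery/step_chart.html. However, in the 538 viz the value for the final date has to extend across the full width of the last element. In the step chart example and in my solution, the line stops as soon as it reaches the last date element.
I looked through Altair's GitHub and Google group and found nothing similar to this issue.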
import altair as alt
import pandas as pd

movies = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious', 'men', 'notalk', 'nowomen']

base = alt.Chart(movies).encode(
    alt.X("year:N", bin=alt.BinParams(step=5, extent=[1970, 2015]),
          axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8),
          title=None),
    alt.Y("count()", stack='normalize', title=None,
          axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1]))
).properties(width=400)

main = base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
).mark_bar(
    stroke='white'  # add horizontal lines
).encode(
    alt.Color("clean_test:N", scale=alt.Scale(
        domain=domain,
        range=['dodgerblue', 'skyblue', 'pink', 'coral', 'red'])),
    order=alt.Order('cleanrank:O', sort='ascending')
)

extra = base.transform_calculate(
    cleanpass='datum.clean_test == "ok" ? "PASS" : datum.clean_test == "dubious" ? "PASS" : "FAIL"'
).mark_line(
    interpolate='step-after'
).encode(
    alt.Color("cleanpass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']))
)

alt.layer(main, extra).configure_scale(
    bandPaddingInner=0.01  # smaller vertical lines
).resolve_scale(color='independent')
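A rather hacky way to make the step chart cover everything from the start of the first bin to the end of the last bin is to control the bin positions manually (using the rank of the ordered bins).
That way we can add two lines: one at the original positions with 'step-after', and another shifted by one bin with 'step-before'. From here, the tick labels would still need to be replaced and centered with the appropriate bin labels, e.g. the levels from pd.cut...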
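To see why the two lines together span every bin, here is a minimal stdlib-only sketch of the horizontal segments each interpolation mode draws. It is a simplified model for illustration, not Altair's rendering code: the helper names `step_after_segments` and `step_before_segments` are hypothetical, the `shares` values are made up, and the vertical risers are ignored.

```python
def step_after_segments(points):
    """'step-after': each y value is held from its own x until the next x."""
    return [(x0, x1, y0) for (x0, y0), (x1, _) in zip(points, points[1:])]


def step_before_segments(points):
    """'step-before': each y value is drawn from the previous x up to its own x."""
    return [(x0, x1, y1) for (x0, _), (x1, y1) in zip(points, points[1:])]


# Made-up values for three bins with ranks 0, 1, 2.
shares = [0.3, 0.5, 0.4]
unshifted = list(enumerate(shares))                # points at x = 0, 1, 2
shifted = [(rank + 1, y) for rank, y in unshifted]  # points at x = 1, 2, 3

after = step_after_segments(unshifted)   # covers x in [0, 2]; last bin is missing
before = step_before_segments(shifted)   # covers x in [1, 3]; first bin is missing

# Where they overlap the segments are identical, so the union is one clean line
# running from the start of the first bin to the end of the last bin.
covered = sorted(set(after) | set(before))
print(covered)
```

Each line alone drops one bin at an end of the chart, which is why the answer layers both.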
import altair as alt
import pandas as pd
movies=pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious','men', 'notalk', 'nowomen']
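# Bin the years into 5-year intervals; pd.cut uses right-closed bins such as (1970, 1975]
movies['year_bin'] = pd.cut(movies['year'], range(1970, 2016, 5))
# The integer category code of each bin is used as its x position
movies['year_rank'] = movies['year_bin'].cat.codes
# Rows that fall outside every bin get the code -1, so drop them
movies = movies[movies['year_rank']>=0]
df_plot = movies[['year_rank', 'clean_test']].copy()
# Right edge of each bar, one rank unit to the right of its left edge
df_plot['year_rank_end'] = df_plot['year_rank'] + 1
df_plot['clean_pass'] = df_plot['clean_test'].apply(lambda x: 'PASS' if x in ['ok', 'dubious'] else 'FAIL')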
base = alt.Chart(df_plot).encode(
    x=alt.X('year_rank',
            axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8),
            title=None),
    x2='year_rank_end',
    y=alt.Y('count()', title=None, stack='normalize',
            axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1]))
).properties(width=400)

main = base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
).mark_bar(
    stroke='white'  # add horizontal lines
).encode(
    alt.Color("clean_test:N", scale=alt.Scale(
        domain=domain,
        range=['dodgerblue', 'skyblue', 'pink', 'coral', 'red'])),
    order=alt.Order('cleanrank:O', sort='ascending')
)

extra = base.transform_calculate(
).mark_line(
    interpolate='step-after'
).encode(
    alt.Color("clean_pass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']))
)

extra2 = base.transform_calculate(
    # shift data by one bin, so that step-before matches the unshifted step-after
    year_rank='datum.year_rank +1'
).mark_line(
    interpolate='step-before'
).encode(
    alt.Color("clean_pass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']), legend=None)
)

alt.layer(main, extra, extra2).configure_scale(
    bandPaddingInner=0.01  # smaller vertical lines
).resolve_scale(color='independent')
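The remaining step, replacing the numeric rank ticks with centered bin labels, could be approached along these lines. This is only a sketch and not part of the original answer: the helpers `bin_labels` and `label_expr` are hypothetical names, the label format imitates how pd.cut prints its intervals, and the usage shown in the trailing comments assumes an Altair version whose `alt.Axis` accepts `values` and `labelExpr`.

```python
def bin_labels(start, stop, step):
    """Return one label per bin, in rank order, formatted like pd.cut intervals."""
    edges = list(range(start, stop + 1, step))
    return [f"({lo}, {hi}]" for lo, hi in zip(edges[:-1], edges[1:])]


def label_expr(labels):
    """Build a Vega expression that maps a tick at rank + 0.5 to its bin label."""
    items = ", ".join(f"'{label}'" for label in labels)
    # floor(datum.value) turns the bin center (rank + 0.5) back into the rank.
    return f"[{items}][floor(datum.value)]"


labels = bin_labels(1970, 2015, 5)                       # same edges as range(1970, 2016, 5)
centers = [rank + 0.5 for rank in range(len(labels))]    # tick positions at bin centers
expr = label_expr(labels)

# Assumed usage in the chart above (not executed here, requires altair):
# x=alt.X('year_rank',
#         axis=alt.Axis(values=centers, labelExpr=expr, labelAngle=0,
#                       labelLimit=50, labelFontSize=8),
#         title=None)
```

Placing the ticks at rank + 0.5 centers each label under its bar, since every bar spans from `year_rank` to `year_rank_end`.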