sb3*_*134 13 python data-visualization matplotlib bokeh
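I am trying to plot a topic visualization from a model. I would like to do something like the Bokeh covariance implementation.
My data is: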
data 1: index, topics.
data 2: index, topics, weights(use it for color).
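A topic is just a set of words.
How do I feed this data to a Bokeh plot? From the example, the data handling is not intuitive.
With matplotlib, it looks like this.
Clearly, this does not help to see which topic corresponds to each circle. Here is my matplotlib code: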
import numpy as np
import matplotlib.pyplot as plt

x = []
y = []
area = []
for row in joined:
    x.append(row['index'])
    y.append(row['index'])
    #weight.append(row['score'])
    area.append(np.pi * (15 * row['score'])**2)
scale_values = 1000
plt.scatter(x, y, s=scale_values*np.array(area), alpha=0.5)
plt.show()
Any ideas or suggestions?
big*_*dot 15
UPDATE: The answer below is still correct in all major points, but as of Bokeh 0.7 the API has changed slightly to be more explicit. In general, things like:
rect(...)
should be replaced with
p = figure(...)
p.rect(...)
Here are the relevant lines from the Les Mis example, simplified to your case. Let's take a look:
# A "ColumnDataSource" is like a dict: it maps names to columns of data.
# These names are not special; we can call the columns whatever we like.
source = ColumnDataSource(
    data=dict(
        x = [row['name'] for row in joined],
        y = [row['name'] for row in joined],
        color = list_of_colors_one_for_each_row,
    )
)
# We need a list of the categorical coordinates
names = list(set(row['name'] for row in joined))
# rect takes center coords (x,y) and width and height. We will draw
# one rectangle for each row.
rect('x', 'y',        # use the 'x' and 'y' fields from the data source
     0.9, 0.9,        # use 0.9 for both width and height of each rectangle
     color = 'color', # use the 'color' field to set the color
     source = source, # use the data source we created above
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)
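For readers on a newer Bokeh, the snippet above might translate to the explicit API roughly as follows. This is only a sketch: the `joined` rows, the `words` column, and the colors are made up for illustration, and passing `tooltips` to `figure` assumes a reasonably recent Bokeh release.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical stand-in for the asker's `joined` rows.
joined = [
    {"name": "topic_a", "words": "model, data, plot"},
    {"name": "topic_b", "words": "axis, color, glyph"},
    {"name": "topic_c", "words": "word, topic, weight"},
]

names = [row["name"] for row in joined]

source = ColumnDataSource(data=dict(
    x=names,
    y=names,
    words=[row["words"] for row in joined],
    color=["#08519c", "#6baed6", "#c6dbef"],  # one made-up color per row
))

# In the explicit API the categorical ranges go to figure(),
# not to the glyph method. The tooltips show the topic on hover.
p = figure(x_range=names, y_range=names,
           tooltips=[("topic", "@x"), ("words", "@words")])

# One rectangle per row, reading every field from the data source.
r = p.rect(x="x", y="y", width=0.9, height=0.9, color="color", source=source)
```

Calling `show(p)` (imported from `bokeh.plotting`) afterwards should open the plot in a browser. The hover tooltip is one way to see which words belong to each rectangle.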
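A few notes:
For numerical data, x_range and y_range are usually supplied automatically. We have to give them explicitly here because we are using categorical coordinates.
You can order the list of names for x_range and y_range however you like; this is the order in which they are displayed on the plot axes.
I am assuming you want to use categorical coordinates. :) That is what the Les Mis example does. If you want numerical coordinates, see the bottom of this answer.
For more information, see the Bokeh tutorial at http://bokeh.pydata.org/tutorial/index.html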
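Since `list(set(...))` returns the names in arbitrary order, it may be worth building the range list deliberately. A minimal sketch, assuming each row carries a `score` weight (the rows here are made up), that puts the heaviest topic first:

```python
# Hypothetical rows standing in for `joined`; 'score' is the topic weight.
joined = [
    {"name": "topic_b", "score": 0.5},
    {"name": "topic_a", "score": 0.9},
    {"name": "topic_c", "score": 0.1},
    {"name": "topic_a", "score": 0.9},  # duplicate row
]

# Keep the highest score seen for each name (this also removes duplicates).
best = {}
for row in joined:
    best[row["name"]] = max(row["score"], best.get(row["name"], row["score"]))

# Heaviest topic first; this list would be passed as x_range and y_range.
names = sorted(best, key=best.get, reverse=True)
print(names)  # ['topic_a', 'topic_b', 'topic_c']
```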
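Also, the Les Mis example is a bit more complicated (it has a hover tool), which is why we create the ColumnDataSource by hand. If you only need a simple plot, you can skip creating the data source yourself and pass the data directly to rect: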
names = list(set(row['name'] for row in joined))
rect(names,    # x (categorical) coordinate for each rectangle
     names,    # y (categorical) coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
     x_range = names,     # sequence of categorical coords for x-axis
     y_range = names,     # sequence of categorical coords for y-axis
)
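The question mentions using the weights for color, so `some_colors` could be derived from them. One possible sketch, in plain Python, that linearly blends between two blues; the helper name `weight_to_hex`, the endpoint colors, and the sample weights are all assumptions made for illustration:

```python
def weight_to_hex(weight, lo=0.0, hi=1.0):
    """Map a weight in [lo, hi] to a hex color between light and dark blue."""
    # Clamp the weight into [lo, hi], then normalize to [0, 1].
    t = (min(max(weight, lo), hi) - lo) / (hi - lo)
    light = (198, 219, 239)  # "#c6dbef", used for the lowest weight
    dark = (8, 81, 156)      # "#08519c", used for the highest weight
    # Interpolate each RGB channel separately.
    r, g, b = (round(a + t * (c - a)) for a, c in zip(light, dark))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

weights = [0.0, 0.5, 1.0]  # made-up topic weights
some_colors = [weight_to_hex(w) for w in weights]
```

The resulting list has one hex string per weight and can be passed as the `color` argument.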
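One more note: this only draws rectangles on the diagonal, where the x-coordinate and y-coordinate are the same. From your description, that seems to be what you want. For completeness, though, it is possible to plot rectangles with different x- and y-coordinates. The Les Mis example does this.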
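To illustrate the off-diagonal case, here is a small sketch that draws one rectangle for every (x, y) pair of names, using the explicit API from the update note. The topic names and the coloring rule are invented for the example.

```python
from itertools import product

from bokeh.plotting import figure

names = ["topic_a", "topic_b", "topic_c"]  # hypothetical topic names

# One (x, y) pair per cell of the 3 x 3 grid, not just the diagonal.
pairs = list(product(names, names))
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

# Made-up coloring: highlight the diagonal, grey out the other cells.
colors = ["#08519c" if x == y else "#dddddd" for x, y in pairs]

p = figure(x_range=names, y_range=names)
p.rect(xs, ys, 0.9, 0.9, color=colors)
```

In a real plot the color of each cell would presumably come from some pairwise quantity, as it does in the Les Mis example.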
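Finally, maybe you don't actually want categorical axes? If you just want to use the numeric index for the coordinates, it's even simpler: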
inds = [row['index'] for row in joined]
rect(inds,     # x-coordinate for each rectangle
     inds,     # y-coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
)
EDIT: Here is a complete runnable example using numerical coordinates:
from bokeh.plotting import *
output_file("foo.html")
inds = [2, 5, 6, 8, 9]
colors = ["red", "orange", "blue", "green", "#4488aa"]
rect(inds, inds, 1.0, 1.0, color=colors)
show()
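On Bokeh 0.7 or later, the numerical example would presumably follow the explicit pattern from the update note at the top of this answer. A sketch of that form, which I have not checked against every Bokeh version:

```python
from bokeh.plotting import figure

inds = [2, 5, 6, 8, 9]
colors = ["red", "orange", "blue", "green", "#4488aa"]

# Numeric ranges are chosen automatically, so figure() needs no range arguments.
p = figure()

# Rectangles centered at (ind, ind), each 1.0 wide and 1.0 high.
r = p.rect(inds, inds, 1.0, 1.0, color=colors)
```

To get the HTML file as before, call `output_file("foo.html")` and then `show(p)`, both importable from `bokeh.plotting`.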
And here is one that uses the same values as categorical coordinates:
from bokeh.plotting import *
output_file("foo.html")
inds = [str(x) for x in [2, 5, 6, 8, 9]]
colors = ["red", "orange", "blue", "green", "#4488aa"]
rect(inds, inds, 1.0, 1.0, color=colors, x_range=inds, y_range=inds)
show()