Plotting topics with Bokeh or matplotlib

sb3*_*134 13 python data-visualization matplotlib bokeh

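I am trying to plot a visualization of the topics from a model. I would like to do something like the Bokeh covariance implementation.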

My data is:

data 1: index, topics
data 2: index, topics, weights (used for color)

A topic is just a set of words.

How do I feed this data to a Bokeh plot to draw the above? From the examples, the data handling is not intuitive.

With matplotlib, it looks like this.
Obviously, this is not visually helpful for seeing which topic corresponds to each circle. Here is my matplotlib code:

import numpy as np
import matplotlib.pyplot as plt

x = []
y = []
area = []

# `joined` is the sequence of rows described above (index, topics, score)
for row in joined:
    x.append(row['index'])
    y.append(row['index'])
    #weight.append(row['score'])
    area.append(np.pi * (15 * row['score'])**2)
scale_values = 1000
plt.scatter(x, y, s=scale_values*np.array(area), alpha=0.5)
plt.show()

Any ideas or suggestions?

big*_*dot 15

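Update: the answer below is still correct in all its main points, but as of Bokeh 0.7 the API has changed slightly to be more explicit. In general, something like: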

rect(...)

should be replaced with

p = figure(...)
p.rect(...)
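For readers on a recent Bokeh release, the explicit style described above might look like the following sketch. It assumes Bokeh 1.0 or later is installed; the `rows` list, its `name` and `color` keys, and the helper `make_topic_plot` are illustrative stand-ins, not part of the original example.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

# Hypothetical stand-in for the question's `joined` rows.
rows = [
    {"name": "topic_a", "color": "red"},
    {"name": "topic_b", "color": "orange"},
    {"name": "topic_c", "color": "#4488aa"},
]


def make_topic_plot(rows):
    """Draw one rectangle per row on the diagonal of a categorical grid."""
    names = [row["name"] for row in rows]  # assumed unique
    source = ColumnDataSource(
        data=dict(
            x=names,
            y=names,
            color=[row["color"] for row in rows],
        )
    )
    # Categorical ranges are now given to figure(), not to the glyph call.
    p = figure(x_range=names, y_range=names, title="Topics")
    p.rect(x="x", y="y", width=0.9, height=0.9, color="color", source=source)
    return p


p = make_topic_plot(rows)
# show(p)  # uncomment to open the plot in a browser
```

The main difference from the older calls is that the ranges belong to the figure and the glyph method is called on it.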

Here are the relevant lines from the Les Mis example, simplified for your case. Let's take a look:

# A "ColumnDataSource" is like a dict: it maps names to columns of data.
# These names are not special; we can call the columns whatever we like.
source = ColumnDataSource(
    data=dict(
        x = [row['name'] for row in joined],
        y = [row['name'] for row in joined],
        color = list_of_colors_one_for_each_row, 
    )
)

# We need a list of the categorical coordinates
names = list(set(row['name'] for row in joined))

# rect takes center coords (x,y) and width and height. We will draw 
# one rectangle for each row.
rect('x', 'y',        # use the 'x' and 'y' fields from the data source
     0.9, 0.9,        # use 0.9 for both width and height of each rectangle 
     color = 'color', # use the 'color' field to set the color
     source = source, # use the data source we created above
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)
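The `list_of_colors_one_for_each_row` placeholder above has to come from somewhere. Since the question wants the weights to drive the color, one possible way to build it is a linear blend between two colors. This is only a sketch: the function name `weight_to_hex`, the two endpoint colors, and the assumption that weights lie in the range 0 to 1 are all mine, not from the original answer.

```python
def weight_to_hex(weight, lo=0.0, hi=1.0):
    """Map a weight in [lo, hi] to a hex color between light grey and dark blue.

    Values outside the range are clamped to the nearest end.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(max((weight - lo) / (hi - lo), 0.0), 1.0)
    light = (230, 230, 230)  # color used for the lowest weight (assumed)
    dark = (8, 48, 107)      # color used for the highest weight (assumed)
    r, g, b = (round(l + t * (d - l)) for l, d in zip(light, dark))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


# Hypothetical scores, one per row of `joined`.
scores = [0.0, 0.35, 1.0]
list_of_colors_one_for_each_row = [weight_to_hex(s) for s in scores]
```

Bokeh accepts hex strings like these wherever it accepts a named color, so the resulting list can go straight into the `color` column.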

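A few notes:

  • For numeric data, x_range and y_range are usually supplied automatically. We have to give them explicitly here because we are using categorical coordinates.

  • You can put the list of names for x_range and y_range in whatever order you like; that is the order in which they appear on the plot axes.

  • I am assuming you want categorical coordinates. :) That is what the Les Mis example uses. If you want numeric coordinates, see the bottom of this answer.

  • For more information, see the Bokeh tutorial at http://bokeh.pydata.org/tutorial/index.html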
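On the point about ordering: `list(set(...))` returns the names in an arbitrary order, so if the axis order matters it helps to build the list deliberately. A minimal sketch, assuming each row has a `name` and a `score` key (the sample rows and the helper `ordered_names` are hypothetical):

```python
# Hypothetical rows; a name may appear more than once.
rows = [
    {"name": "sports", "score": 0.2},
    {"name": "politics", "score": 0.7},
    {"name": "music", "score": 0.4},
    {"name": "politics", "score": 0.7},
]


def ordered_names(rows, key="score"):
    """Return unique names, highest score first, ties broken alphabetically."""
    best = {}
    for row in rows:
        name = row["name"]
        best[name] = max(best.get(name, float("-inf")), row[key])
    return sorted(best, key=lambda n: (-best[n], n))


names = ordered_names(rows)
```

The resulting `names` list can then be used for both x_range and y_range so the two axes stay in the same order.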

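Also, the Les Mis example is somewhat more complicated (it has a hover tool), which is why we create the ColumnDataSource by hand. If you only need a simple plot, you can skip creating the data source yourself and pass the data directly to rect: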

names = list(set(row['name'] for row in joined))

rect(names,    # x (categorical) coordinate for each rectangle
     names,    # y (categorical) coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)

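One more note: this only draws rectangles on the diagonal, where the x and y coordinates are the same. From your description, that seems to be what you want. For completeness, though, it is possible to draw rectangles with different x and y coordinates. The Les Mis example does exactly that.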
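To make the off-diagonal case concrete, here is a sketch of how the x, y, and per-cell value columns could be built for every pair of topics. It is not taken from the Les Mis example: the `topics` dictionary is made-up data, and using Jaccard overlap between the word sets as the cell value is purely my assumption for illustration.

```python
from itertools import product

# Hypothetical topics: each topic is just a set of words.
topics = {
    "t0": {"cat", "dog", "pet"},
    "t1": {"dog", "wolf"},
    "t2": {"car", "road"},
}


def jaccard(a, b):
    """Share of words two topics have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


names = sorted(topics)

# One entry per (x, y) cell of the grid, including the diagonal.
x, y, overlap = [], [], []
for a, b in product(names, repeat=2):
    x.append(a)
    y.append(b)
    overlap.append(jaccard(topics[a], topics[b]))
```

These three lists have one entry per rectangle, so they can serve as columns of a data source, with `overlap` mapped to color or alpha.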

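Finally, perhaps you do not actually want categorical axes? If you just want to use the numeric index as the coordinate, it is even simpler: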

inds = [row['index'] for row in joined]

rect(inds,    # x-coordinate for each rectangle
     inds,    # y-coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
)

Edit: here is a complete runnable example using numeric coordinates:

from bokeh.plotting import * 

output_file("foo.html")

inds = [2, 5, 6, 8, 9]
colors = ["red", "orange", "blue", "green", "#4488aa"]

rect(inds, inds, 1.0, 1.0, color=colors)

show()

And here is one that uses the same values as categorical coordinates:

from bokeh.plotting import * 

output_file("foo.html")

inds = [str(x) for x in [2, 5, 6, 8, 9]]
colors = ["red", "orange", "blue", "green", "#4488aa"]

rect(inds, inds, 1.0, 1.0, color=colors, x_range=inds, y_range=inds)

show()

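  • Hi @sb32134, I noticed a few errors in your code that prevent Bokeh from rendering exactly what you want. Your `colors` list is generated correctly, but there is a difference between the `inds` indices you are building and the categories you want displayed on the plot. I have set up an IPython notebook that hopefully clarifies the issue: https://www.wakari.io/sharing/bundle/kpsfire/Categorical Hope that helps! (2 upvotes)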