296*_*502 66 python matplotlib
我想制作一个散点图,其中每个点都由附近点的空间密度着色.
我遇到了一个非常相似的问题,它使用R显示了这个例子:
使用matplotlib在python中实现类似功能的最佳方法是什么?
Joe*_*ton 130
除了hist2d或hexbin作为@askewchan建议,您可以使用在问题接受答案您链接到使用相同的方法.
如果你想这样做:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=100, edgecolor='')
plt.show()
Run Code Online (Sandbox Code Playgroud)

如果您希望以密度的顺序绘制点,以便最密集的点始终位于顶部(类似于链接的示例),只需按z值对它们进行排序即可.我也会在这里使用较小的标记尺寸,因为它看起来好一点:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50, edgecolor='')
plt.show()
Run Code Online (Sandbox Code Playgroud)

np8*_*np8 32
该接受的答案,使用gaussian_kde()将花费大量的时间。在我的机器上,10 万行大约需要11 分钟。在这里,我将添加两种替代方法(mpl-scatter-density和datashader)并将给定的答案与相同的数据集进行比较。
在下面,我使用了一个 10 万行的测试数据集:
import matplotlib.pyplot as plt
import numpy as np
# Fake data for testing
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)
Run Code Online (Sandbox Code Playgroud)
下面是不同方法的比较。
1: mpl-scatter-density安装
pip install mpl-scatter-density
Run Code Online (Sandbox Code Playgroud)
示例代码
pip install mpl-scatter-density
Run Code Online (Sandbox Code Playgroud)
2: datashaderpip install "git+https://github.com/nvictus/datashader.git@mpl"
Run Code Online (Sandbox Code Playgroud)
代码(这里是 dsshow 的来源):
import mpl_scatter_density # adds projection='scatter_density'
from matplotlib.colors import LinearSegmentedColormap
# "Viridis-like" colormap with white background
white_viridis = LinearSegmentedColormap.from_list('white_viridis', [
(0, '#ffffff'),
(1e-20, '#440053'),
(0.2, '#404388'),
(0.4, '#2a788e'),
(0.6, '#21a784'),
(0.8, '#78d151'),
(1, '#fde624'),
], N=256)
def using_mpl_scatter_density(fig, x, y):
ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
density = ax.scatter_density(x, y, cmap=white_viridis)
fig.colorbar(density, label='Number of points per pixel')
fig = plt.figure()
using_mpl_scatter_density(fig, x, y)
plt.show()
Run Code Online (Sandbox Code Playgroud)
放大的图像看起来很棒!
3: scatter_with_gaussian_kdepip install "git+https://github.com/nvictus/datashader.git@mpl"
Run Code Online (Sandbox Code Playgroud)
4: using_hist2dfrom functools import partial
import datashader as ds
from datashader.mpl_ext import dsshow
import pandas as pd
dyn = partial(ds.tf.dynspread, max_px=40, threshold=0.5)
def using_datashader(ax, x, y):
df = pd.DataFrame(dict(x=x, y=y))
da1 = dsshow(df, ds.Point('x', 'y'), spread_fn=dyn, aspect='auto', ax=ax)
plt.colorbar(da1)
fig, ax = plt.subplots()
using_datashader(ax, x, y)
plt.show()
Run Code Online (Sandbox Code Playgroud)


5: density_scatterask*_*han 29
你可以做一个直方图:
import numpy as np
import matplotlib.pyplot as plt
# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)
plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()
Run Code Online (Sandbox Code Playgroud)

Gui*_*ume 15
另外,如果点数使KDE计算太慢,则可以在np.histogram2d中插入颜色:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interpn
def density_scatter( x , y, ax = None, sort = True, bins = 20, **kwargs ) :
"""
Scatter plot colored by 2d histogram
"""
if ax is None :
fig , ax = plt.subplots()
data , x_e, y_e = np.histogram2d( x, y, bins = bins)
z = interpn( ( 0.5*(x_e[1:] + x_e[:-1]) , 0.5*(y_e[1:]+y_e[:-1]) ) , data , np.vstack([x,y]).T , method = "splinef2d", bounds_error = False )
# Sort the points by density, so that the densest points are plotted last
if sort :
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
ax.scatter( x, y, c=z, **kwargs )
return ax
if "__main__" == __name__ :
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)
density_scatter( x, y, bins = [30,30] )
Run Code Online (Sandbox Code Playgroud)