Saa*_*dal 3 matplotlib python-3.x seaborn
我有一组 x 和 y 坐标。我想绘制它们并找出哪个区域被密集绘制。我想找到该区域中心的坐标。
我写了一个代码来绘制 xy 平面中的点。
import numpy as np
import matplotlib.pyplot as plt
x = np.array([111,152,153,14,155,156,153,154,153,152,154,151,200,201,200])
y = np.array([112,151,153,12,153,154,155,153,152,153,152,153,202,202,204])
plt.scatter(x,y)
plt.show()
Run Code Online (Sandbox Code Playgroud)
但是,我无法找出密集绘制区域的中心。有人可以帮我吗?
首先,您需要区分(并确定)密度和聚类。你的问题没有明确定义你想要什么,所以我假设你想评估找到最大密度的点之间的位置。老实说,我正在做一些令人耳目一新的任务,因此在这里选择了密度选项;)
由于有多种方法可以评估/估计/计算点云密度,因此单独的术语密度未详细说明。我建议使用 KDE(核密度估计器),因为它很容易满足一个人的需要,并允许交换核(平方、线性、高斯、余弦和任何其他适用的)。
注意:完整代码如下;我现在将用更小的片段逐步完成它。
从您提供的点开始,以下代码段在您的笛卡尔坐标中计算适当的均匀间隔网格:
RESOLUTION = 50
LOCALITY = 2.0
dx = max(pts_x) - min(pts_x)
dy = max(pts_y) - min(pts_y)
delta = min(dx, dy) / RESOLUTION
nx = int(dx / delta)
ny = int(dy / delta)
radius = (1 / LOCALITY) * min(dx, dy)
grid_x = np.linspace(min(pts_x), max(pts_x), num=nx)
grid_y = np.linspace(min(pts_y), max(pts_y), num=ny)
x, y = np.meshgrid(grid_x, grid_y)
Run Code Online (Sandbox Code Playgroud)
plt.scatter(grid_x, grid_y)如果您愿意,您可以通过绘图轻松检查它。所有这些初步计算只是确保手头有所有必需的值:我希望能够为 KDE 内核传播指定某种分辨率和位置设置;此外,我们需要网格生成的最大点距 inx和ydirection,计算步长delta,计算 x 和 y 方向的网格单元数,nx以及ny,以及radius根据我们的局部设置计算内核。
为了使用内核估计点密度,我们确实需要下一个片段中概述的两个函数
def gauss(x1, x2, y1, y2):
"""
Apply a Gaussian kernel estimation (2-sigma) to distance between points.
Effectively, this applies a Gaussian kernel with a fixed radius to one
of the points and evaluates it at the value of the euclidean distance
between the two points (x1, y1) and (x2, y2).
The Gaussian is transformed to roughly (!) yield 1.0 for distance 0 and
have the 2-sigma located at radius distance.
"""
return (
(1.0 / (2.0 * math.pi))
* math.exp(
-1 * (3.0 * math.sqrt((x1 - x2)**2 + (y1 - y2)**2) / radius))**2
/ 0.4)
def _kde(x, y):
"""
Estimate the kernel density at a given position.
Simply sums up all the Gaussian kernel values towards all points
(pts_x, pts_y) from position (x, y).
"""
return sum([
gauss(x, px, y, py)
# math.sqrt((x - px)**2 + (y - py)**2)
for px, py in zip(pts_x, pts_y)
])
Run Code Online (Sandbox Code Playgroud)
第一个,gauss一次执行两个任务:它需要由 定义的两个点x1, x2, y1, y2,计算它们的欧几里德距离并使用该距离来评估高斯核的函数值。1.0当距离为0(或非常小)时,高斯核被转换为近似屈服,并将其 2-sigma 固定为先前计算的radius。该特性由上述locality设置控制。
幸运的是,numpy提供了一些简洁的辅助函数来将任意 Python 函数应用于向量和矩阵,因此计算非常简单:
kde = np.vectorize(_kde) # Let numpy care for applying our kde to a vector
z = kde(x, y)
xi, yi = np.where(z == np.amax(z))
max_x = grid_x[xi][0]
max_y = grid_y[yi][0]
print(f"{max_x:.4f}, {max_y:.4f}")
Run Code Online (Sandbox Code Playgroud)
在您的情况下(考虑到高斯内核设置和我的网格假设),最大密度为
155.2041, 154.0800
Run Code Online (Sandbox Code Playgroud)
您的点云(蓝色十字)与识别的最大值(红色十字)显示在第一张图片中。第二张图显示了高斯 KDE 使用第一个代码片段中的设置计算的估计密度。
import math
import matplotlib.pyplot as plt
import numpy as np
pts_x = np.array([
111, 152, 153, 14, 155, 156, 153, 154, 153, 152, 154, 151, 200, 201, 200])
pts_y = np.array([
112, 151, 153, 12, 153, 154, 155, 153, 152, 153, 152, 153, 202, 202, 204])
RESOLUTION = 50
LOCALITY = 2.0
dx = max(pts_x) - min(pts_x)
dy = max(pts_y) - min(pts_y)
delta = min(dx, dy) / RESOLUTION
nx = int(dx / delta)
ny = int(dy / delta)
radius = (1 / LOCALITY) * min(dx, dy)
grid_x = np.linspace(min(pts_x), max(pts_x), num=nx)
grid_y = np.linspace(min(pts_y), max(pts_y), num=ny)
x, y = np.meshgrid(grid_x, grid_y)
def gauss(x1, x2, y1, y2):
"""
Apply a Gaussian kernel estimation (2-sigma) to distance between points.
Effectively, this applies a Gaussian kernel with a fixed radius to one
of the points and evaluates it at the value of the euclidean distance
between the two points (x1, y1) and (x2, y2).
The Gaussian is transformed to roughly (!) yield 1.0 for distance 0 and
have the 2-sigma located at radius distance.
"""
return (
(1.0 / (2.0 * math.pi))
* math.exp(
-1 * (3.0 * math.sqrt((x1 - x2)**2 + (y1 - y2)**2) / radius))**2
/ 0.4)
def _kde(x, y):
"""
Estimate the kernel density at a given position.
Simply sums up all the Gaussian kernel values towards all points
(pts_x, pts_y) from position (x, y).
"""
return sum([
gauss(x, px, y, py)
# math.sqrt((x - px)**2 + (y - py)**2)
for px, py in zip(pts_x, pts_y)
])
kde = np.vectorize(_kde) # Let numpy care for applying our kde to a vector
z = kde(x, y)
xi, yi = np.where(z == np.amax(z))
max_x = grid_x[xi][0]
max_y = grid_y[yi][0]
print(f"{max_x:.4f}, {max_y:.4f}")
fig, ax = plt.subplots()
ax.pcolormesh(x, y, z, cmap='inferno', vmin=np.min(z), vmax=np.max(z))
fig.set_size_inches(4, 4)
fig.savefig('density.png', bbox_inches='tight')
fig, ax = plt.subplots()
ax.scatter(pts_x, pts_y, marker='+', color='blue')
ax.scatter(grid_x[xi], grid_y[yi], marker='+', color='red', s=200)
fig.set_size_inches(4, 4)
fig.savefig('marked.png', bbox_inches='tight')
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
898 次 |
| 最近记录: |