Jer*_*emy 14 python numpy dataframe
I am currently reading data into a dataframe that looks like this:
City XCord YCord
Boston 5 2
Phoenix 7 3
New York 8 1
..... . .
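From this data I want to build a Euclidean distance matrix showing the distance between every pair of cities, so that I end up with a result like: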
Boston Phoenix New York
Boston 0 2.236 3.162
Phoenix 2.236 0 2.236
New York 3.162 2.236 0
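My actual dataframe has many more cities and coordinates, so I need some way to iterate over all pairs of cities and build a distance matrix like the one shown above, but I'm not sure how to pair up all the cities and apply the Euclidean distance formula. Any help would be appreciated.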
And*_*rew 20
I think you are looking for distance_matrix from scipy.spatial.
For example:
Create the data:
import pandas as pd
from scipy.spatial import distance_matrix
data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)
Output:
xcord ycord
Boston 5 7
Phoenix 7 3
New York 8 1
Use the distance_matrix function:
pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)
Result:
Boston Phoenix New York
Boston 0.000000 4.472136 6.708204
Phoenix 4.472136 0.000000 2.236068
New York 6.708204 2.236068 0.000000
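The question's dataframe keeps the city names in a regular `City` column rather than in the index. A minimal sketch of how the same call might be applied to that layout, assuming the column names `City`, `XCord` and `YCord` from the question; the variable names `coords` and `dm` are only illustrative:

```python
import pandas as pd
from scipy.spatial import distance_matrix

# Sample rows in the layout shown in the question (City is an ordinary column).
df = pd.DataFrame({
    'City': ['Boston', 'Phoenix', 'New York'],
    'XCord': [5, 7, 8],
    'YCord': [2, 3, 1],
})

# Move the city names into the index so only the coordinates remain as values.
coords = df.set_index('City')[['XCord', 'YCord']]

# Label both axes of the distance matrix with the city names.
dm = pd.DataFrame(
    distance_matrix(coords.values, coords.values),
    index=coords.index,
    columns=coords.index,
)

# A single pair can then be looked up by name.
print(round(dm.loc['Boston', 'Phoenix'], 3))  # 2.236
```

With the question's coordinates this should reproduce the matrix the question asks for.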
If you don't want to use scipy, you can use a list comprehension instead:
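import numpy as np
xy_list = df.values  # assumed: one (x, y) row per city, e.g. taken from the dataframe
dist = lambda p1, p2: np.sqrt(((p1 - p2) ** 2).sum())
dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])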
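The list comprehension above calls a Python function once per pair. The same matrix can probably be computed without a Python-level loop by using NumPy broadcasting. This is a sketch of that alternative technique, not part of the original answer; the array `xy` simply repeats the answer's sample coordinates:

```python
import numpy as np

# Coordinates in city order (Boston, Phoenix, New York), as in the sample data.
xy = np.array([[5, 7], [7, 3], [8, 1]], dtype=float)

# diff[i, j] is the coordinate difference between point i and point j,
# so diff has shape (n, n, 2).
diff = xy[:, None, :] - xy[None, :, :]

# Summing the squared differences over the last axis and taking the
# square root gives the (n, n) Euclidean distance matrix.
dm = np.sqrt((diff ** 2).sum(axis=-1))

print(np.round(dm, 6))
```

The intermediate `diff` array holds n × n × 2 values, so for a very large number of cities this trades memory for speed.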
I'll give a method in pure Python.
Import the sqrt function from the math module:
from math import sqrt
Suppose you have the coordinates in a dictionary named cords, like this:
cords['Boston'] = (5, 2)
Define a function that computes the Euclidean distance between two given 2D points:
def dist(a, b):
    d = [a[0] - b[0], a[1] - b[1]]
    return sqrt(d[0] * d[0] + d[1] * d[1])
Build the result matrix as a dictionary of dictionaries:
D = {}
for city1, cords1 in cords.items():
    D[city1] = {}
    for city2, cords2 in cords.items():
        D[city1][city2] = dist(cords1, cords2)
D is the resulting matrix.
Here is the full source along with the printed result:
from math import sqrt

# City name -> (x, y) coordinates.
cords = {}
cords['Boston'] = (5, 2)
cords['Phoenix'] = (7, 3)
cords['New York'] = (8, 1)

def dist(a, b):
    # Euclidean distance between two 2D points.
    d = [a[0] - b[0], a[1] - b[1]]
    return sqrt(d[0] * d[0] + d[1] * d[1])

# D[city1][city2] holds the distance between the two cities.
D = {}
for city1, cords1 in cords.items():
    D[city1] = {}
    for city2, cords2 in cords.items():
        D[city1][city2] = dist(cords1, cords2)

for city1, v in D.items():
    for city2, d in v.items():
        print(city1, city2, d)
Result (Python 3.7+, where dictionaries keep insertion order):
Boston Boston 0.0
Boston Phoenix 2.23606797749979
Boston New York 3.1622776601683795
Phoenix Boston 2.23606797749979
Phoenix Phoenix 0.0
Phoenix New York 2.23606797749979
New York Boston 3.1622776601683795
New York Phoenix 2.23606797749979
New York New York 0.0
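The nested loop above computes every distance twice, once for each ordering of a pair, and also computes each city's distance to itself. Since the matrix is symmetric with zeros on the diagonal, a variant could visit each unordered pair only once. The following is a sketch of that idea using `itertools.combinations` and `math.hypot` from the standard library; it is an alternative to the answer's loop, not a restatement of it:

```python
from itertools import combinations
from math import hypot

# Same sample coordinates as in the full source above.
cords = {'Boston': (5, 2), 'Phoenix': (7, 3), 'New York': (8, 1)}

# Every city is at distance 0 from itself.
D = {city: {city: 0.0} for city in cords}

# Visit each unordered pair once and fill in both symmetric entries.
for (city1, p1), (city2, p2) in combinations(cords.items(), 2):
    d = hypot(p1[0] - p2[0], p1[1] - p2[1])
    D[city1][city2] = d
    D[city2][city1] = d

print(D['Boston']['New York'])
```

For n cities this evaluates n(n-1)/2 distances instead of n², which roughly halves the work while producing the same dictionary of dictionaries.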