Shu*_*m R 4 python numpy dataframe pandas
I have a dataframe:
routeId latitude_value longitude_value
r1 28.210216 22.813209
r2 28.216103 22.496735
r3 28.161786 22.842318
r4 28.093110 22.807081
r5 28.220370 22.503500
r6 28.220370 22.503500
r7 28.220370 22.503500
From this I want to generate a dataframe df2 like this:
routeId nearest
r1 r3 (for example)
r2 ... similarly for all the routes.
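The logic I am trying to implement is:
for each route, compute the Euclidean distance to every other route, iterating over routeId.
There is a function for computing the Euclidean distance: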
import math
dist = math.hypot(x2 - x1, y2 - y1)
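As a quick check of the formula, plugging in the coordinates of r1 and r3 from the sample data should reproduce the r1/r3 entry (0.056505) of the cdist distance matrix shown in the first answer. Note that this is a straight-line distance in degree units, not a distance in kilometres.

```python
import math

# Coordinates of r1 and r3, copied from the sample dataframe
x1, y1 = 28.210216, 22.813209  # r1 (latitude, longitude)
x2, y2 = 28.161786, 22.842318  # r3 (latitude, longitude)

# Euclidean distance in degree units (no earth-curvature correction)
dist = math.hypot(x2 - x1, y2 - y1)
print(round(dist, 6))  # 0.056505
```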
But I'm confused about how to build a function that I pass a dataframe to, or whether to use .apply():
def get_nearest_route():
    ...
    return df2
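A minimal sketch of the loop-based logic described above, assuming the column names from the question. The helper name get_nearest_route and its df parameter are hypothetical; on ties (r5, r6 and r7 share coordinates) it keeps the first route found. It is O(n²) in plain Python, so it only suits small frames.

```python
import math

import pandas as pd


def get_nearest_route(df):
    """Hypothetical helper: for every route, return the closest *other* route.

    Assumes columns routeId, latitude_value and longitude_value.
    Ties are resolved by keeping the first route encountered.
    """
    rows = list(zip(df['routeId'], df['latitude_value'], df['longitude_value']))
    nearest = []
    for route, x1, y1 in rows:
        best_route, best_dist = None, math.inf
        for other, x2, y2 in rows:
            if other == route:
                continue  # skip the route itself (distance 0)
            dist = math.hypot(x2 - x1, y2 - y1)
            if dist < best_dist:  # strict '<' keeps the first of equal distances
                best_route, best_dist = other, dist
        nearest.append(best_route)
    return pd.DataFrame({'routeId': df['routeId'].tolist(), 'nearest': nearest})


df = pd.DataFrame({
    'routeId': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'latitude_value': [28.210216, 28.216103, 28.161786, 28.093110,
                       28.220370, 28.220370, 28.220370],
    'longitude_value': [22.813209, 22.496735, 22.842318, 22.807081,
                        22.503500, 22.503500, 22.503500],
})
df2 = get_nearest_route(df)
print(df2)
```

Because the routes at identical coordinates are not skipped, r5 gets r6 (distance 0) as its nearest route rather than r2.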
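We can use scipy.spatial.distance.cdist (or nested for loops) to build the full distance matrix, then replace each row's minimum with the route name to find the closest route: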
import numpy as np
import pandas as pd
import scipy.spatial.distance

mat = scipy.spatial.distance.cdist(df[['latitude_value', 'longitude_value']],
                                   df[['latitude_value', 'longitude_value']], metric='euclidean')
# If you don't want scipy, you can use plain Python, like:
# import math
# mat = []
# for i, j in zip(df['latitude_value'], df['longitude_value']):
#     k = []
#     for l, m in zip(df['latitude_value'], df['longitude_value']):
#         k.append(math.hypot(i - l, j - m))
#     mat.append(k)
# mat = np.array(mat)
new_df = pd.DataFrame(mat, index=df['routeId'], columns=df['routeId'])
Output (new_df):
routeId r1 r2 r3 r4 r5 r6 r7
routeId
r1 0.000000 0.316529 0.056505 0.117266 0.309875 0.309875 0.309875
r2 0.316529 0.000000 0.349826 0.333829 0.007998 0.007998 0.007998
r3 0.056505 0.349826 0.000000 0.077188 0.343845 0.343845 0.343845
r4 0.117266 0.333829 0.077188 0.000000 0.329176 0.329176 0.329176
r5 0.309875 0.007998 0.343845 0.329176 0.000000 0.000000 0.000000
r6 0.309875 0.007998 0.343845 0.329176 0.000000 0.000000 0.000000
r7 0.309875 0.007998 0.343845 0.329176 0.000000 0.000000 0.000000
# Replace each row's minimum non-zero distance with the column name, and every other cell with False.
# new_df[new_df != 0].min() gives each column's smallest non-zero distance; since the matrix is
# symmetric, that equals each row's minimum, so .eq(..., axis=0) marks the nearest route(s) per row.
closest = np.where(new_df.eq(new_df[new_df != 0].min(), axis=0), new_df.columns, False)
# Drop the False entries and keep the column names as a list.
df['close'] = [i[i.astype(bool)].tolist() for i in closest]
Output:
routeId latitude_value longitude_value close
0 r1 28.210216 22.813209 [r3]
1 r2 28.216103 22.496735 [r5, r6, r7]
2 r3 28.161786 22.842318 [r1]
3 r4 28.093110 22.807081 [r3]
4 r5 28.220370 22.503500 [r2]
5 r6 28.220370 22.503500 [r2]
6 r7 28.220370 22.503500 [r2]
If you don't want to ignore zero distances (i.e. routes at identical coordinates, such as r5, r6 and r7, should count as each other's nearest), then:
# Copy the distance values into a NumPy array (a copy, so new_df itself is left unchanged)
arr = new_df.to_numpy(copy=True)
# We don't want a point's nearest neighbour to be itself, so set the diagonal to NaN
arr[np.diag_indices_from(arr)] = np.nan
# Replace each row's NaN-ignoring minimum with the column name, and everything else with False
new_close = np.where(arr == np.nanmin(arr, axis=1)[:, None], new_df.columns, False)
# Collect the column names, dropping the False entries.
df['close'] = [i[i.astype(bool)].tolist() for i in new_close]
Output:
routeId latitude_value longitude_value close
0 r1 28.210216 22.813209 [r3]
1 r2 28.216103 22.496735 [r5, r6, r7]
2 r3 28.161786 22.842318 [r1]
3 r4 28.093110 22.807081 [r3]
4 r5 28.220370 22.503500 [r6, r7]
5 r6 28.220370 22.503500 [r5, r7]
6 r7 28.220370 22.503500 [r5, r6]
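If you only need the single-column df2 from the question (one nearest route per row) rather than a list of ties, a minimal sketch, assuming the same new_df built with cdist, is to mask the diagonal and use pandas' idxmin. This swaps the np.where step for idxmin, which returns only the first label when several routes are equally close.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

df = pd.DataFrame({
    'routeId': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'latitude_value': [28.210216, 28.216103, 28.161786, 28.093110,
                       28.220370, 28.220370, 28.220370],
    'longitude_value': [22.813209, 22.496735, 22.842318, 22.807081,
                        22.503500, 22.503500, 22.503500],
})
coords = df[['latitude_value', 'longitude_value']].to_numpy()
new_df = pd.DataFrame(cdist(coords, coords), index=df['routeId'], columns=df['routeId'])

# Set each route's distance to itself to NaN so it can never be the minimum
masked = new_df.mask(np.eye(len(new_df), dtype=bool))
# idxmin(axis=1) returns, per row, the column label of the smallest distance
# (NaN is skipped; on ties the first matching label wins)
df2 = masked.idxmin(axis=1).rename('nearest').reset_index()
print(df2)
```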
I'd suggest using the pdist function from scipy.spatial.distance:
import scipy.spatial.distance

matrix = scipy.spatial.distance.pdist(df[['latitude_value', 'longitude_value']], metric='euclidean')
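This returns a condensed distance matrix: a 1-D array of shape (n*(n-1)/2,) that holds every pairwise distance exactly once.
You can then use squareform to get the square (n, n) pairwise distance matrix: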
matrix = scipy.spatial.distance.squareform(matrix)
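Then, for each row matrix[i], find the index j of the minimum value, ignoring the diagonal entry matrix[i][i] (always 0, the point's distance to itself). Then you know that the nearest point to the i-th point is the j-th point.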
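A minimal sketch of that row-minimum step, assuming the column names from the question. Filling the diagonal with infinity (instead of 0) keeps each point from matching itself, and argmin returns the first index on ties.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame({
    'routeId': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'latitude_value': [28.210216, 28.216103, 28.161786, 28.093110,
                       28.220370, 28.220370, 28.220370],
    'longitude_value': [22.813209, 22.496735, 22.842318, 22.807081,
                        22.503500, 22.503500, 22.503500],
})
coords = df[['latitude_value', 'longitude_value']].to_numpy()

condensed = pdist(coords, metric='euclidean')  # shape (n*(n-1)//2,), here (21,)
matrix = squareform(condensed)                 # shape (n, n), zeros on the diagonal

# Exclude each point's zero distance to itself before taking the row minimum
np.fill_diagonal(matrix, np.inf)
nearest_idx = matrix.argmin(axis=1)  # nearest_idx[i] = j, the closest point to point i

df2 = pd.DataFrame({'routeId': df['routeId'],
                    'nearest': df['routeId'].to_numpy()[nearest_idx]})
print(df2)
```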
Views: 2486