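I have locations in a data frame, under the column names lat and lon. I want to show how far each one is from the nearest train station; the stations are held in a separate data frame.
For example, I have a lat/lon pair (-37.814563, 144.970267) and a list of other geospatial points, shown below. I want to find the nearest point, then the distance between the two, and store it as an extra column in the suburbs data frame.
Here is a sample of the train data set: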
<bound method NDFrame.to_clipboard of STOP_ID STOP_NAME LATITUDE \
0 19970 Royal Park Railway Station (Parkville) -37.781193
1 19971 Flemington Bridge Railway Station (North Melbo... -37.788140
2 19972 Macaulay Railway Station (North Melbourne) -37.794267
3 19973 North Melbourne Railway Station (West Melbourne) -37.807419
4 19974 Clifton Hill Railway Station (Clifton Hill) -37.788657
LONGITUDE TICKETZONE ROUTEUSSP \
0 144.952301 1 Upfield
1 144.939323 1 Upfield
2 144.936166 1 Upfield
3 144.942570 1 Flemington,Sunbury,Upfield,Werribee,Williamsto...
4 144.995417 1 Mernda,Hurstbridge
geometry
0 POINT (144.95230 -37.78119)
1 POINT (144.93932 -37.78814)
2 POINT (144.93617 -37.79427)
3 POINT (144.94257 -37.80742)
4 POINT (144.99542 -37.78866) >
Here is a sample of the suburbs data:
<bound method NDFrame.to_clipboard of postcode suburb state lat lon
4901 3000 MELBOURNE VIC -37.814563 144.970267
4902 3002 EAST MELBOURNE VIC -37.816640 144.987811
4903 3003 WEST MELBOURNE VIC -37.806255 144.941123
4904 3005 WORLD TRADE CENTRE VIC -37.822262 144.954856
4905 3006 SOUTHBANK VIC -37.823258 144.965926>
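What I want is a new column in the suburbs list showing the distance from each lat/lon to the closest train station.
Two solutions have been suggested: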
from sklearn.neighbors import NearestNeighbors
from haversine import haversine
NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(trains_shape[['LATITUDE', 'LONGITUDE']])
indices = NN.kneighbors(df_complete[['lat', 'lon']])[1]
indices = [index[0] for index in indices]
distances = NN.kneighbors(df_complete[['lat', 'lon']])[0]
df_complete['closest_station'] = trains_shape.iloc[indices]['STOP_NAME'].reset_index(drop=True)
df_complete['closest_station_distances'] = distances
print(df_complete)
Here is the output:
<bound method NDFrame.to_clipboard of postcode suburb state lat lon Venues Cluster \
1 3040 aberfeldie VIC -37.756690 144.896259 4.0
2 3042 airport west VIC -37.711698 144.887037 1.0
4 3206 albert park VIC -37.840705 144.955710 0.0
5 3020 albion VIC -37.775954 144.819395 2.0
6 3078 alphington VIC -37.780767 145.031160 4.0
#1 #2 #3 \
1 Café Electronics Store Grocery Store
2 Fast Food Restaurant Café Supermarket
4 Café Pub Coffee Shop
5 Café Fast Food Restaurant Grocery Store
6 Café Park Bar
#4 ... #6 \
1 Coffee Shop ... Bakery
2 Grocery Store ... Italian Restaurant
4 Breakfast Spot ... Burger Joint
5 Vietnamese Restaurant ... Pub
6 Pizza Place ... Vegetarian / Vegan Restaurant
#7 #8 #9 \
1 Shopping Mall Japanese Restaurant Indian Restaurant
2 Portuguese Restaurant Electronics Store Middle Eastern Restaurant
4 Bar Bakery Gastropub
5 Chinese Restaurant Gym Bakery
6 Italian Restaurant Gastropub Bakery
#10 Ancestry Cluster ClosestStopId \
1 Greek Restaurant 8.0 20037
2 Convenience Store 5.0 20032
4 Beach 6.0 22180
5 Convenience Store 5.0 20004
6 Coffee Shop 5.0 19931
ClosestStopName \
1 Essendon Railway Station (Essendon)
2 Glenroy Railway Station (Glenroy)
4 Southern Cross Railway Station (Melbourne City)
5 Albion Railway Station (Sunshine North)
6 Alphington Railway Station (Alphington)
closest_station closest_station_distances
1 Glenroy Railway Station (Glenroy) 0.019918
2 Southern Cross Railway Station (Melbourne City) 0.031020
4 Alphington Railway Station (Alphington) 0.023165
5 Altona Railway Station (Altona) 0.005559
6 Newport Railway Station (Newport) 0.002375
And here is the second function:
def ClosestStop(r):
# Cartesian distance: square root of (x2-x1)^2 + (y2-y1)^2
distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
# Stop with minimum Distance from the Suburb
closestStationId = distances[distances == distances.min()].index.to_list()[0]
return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]
df_complete[['ClosestStopId', 'ClosestStopName']] = df_complete.apply(ClosestStop, axis=1)
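Strangely, the two give different answers, which makes me think there is a problem with this code. The km values also seem wrong.
I am not sure how to fix this at all; any guidance would be appreciated, thanks!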
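A few key concepts:
Build a Cartesian product of the two data frames by merging on a constant column (foo=1), so that every suburb is paired with every station.
Calculate the distance for each pair, then use sort_values() to put the smallest distance first.
Finally, use groupby() and agg() to take the first values, which belong to the shortest distance.
There are two data frames to work with:
dfdist contains all the combinations and distances.
dfnearest contains the result.
import pandas as pd
dfstat = pd.DataFrame({'STOP_ID': ['19970', '19971', '19972', '19973', '19974'],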
'STOP_NAME': ['Royal Park Railway Station (Parkville)',
'Flemington Bridge Railway Station (North Melbo...',
'Macaulay Railway Station (North Melbourne)',
'North Melbourne Railway Station (West Melbourne)',
'Clifton Hill Railway Station (Clifton Hill)'],
'LATITUDE': ['-37.781193',
'-37.788140',
'-37.794267',
'-37.807419',
'-37.788657'],
'LONGITUDE': ['144.952301',
'144.939323',
'144.936166',
'144.942570',
'144.995417'],
'TICKETZONE': ['1', '1', '1', '1', '1'],
'ROUTEUSSP': ['Upfield',
'Upfield',
'Upfield',
'Flemington,Sunbury,Upfield,Werribee,Williamsto...',
'Mernda,Hurstbridge'],
'geometry': ['POINT (144.95230 -37.78119)',
'POINT (144.93932 -37.78814)',
'POINT (144.93617 -37.79427)',
'POINT (144.94257 -37.80742)',
'POINT (144.99542 -37.78866)']})
dfsub = pd.DataFrame({'id': ['4901', '4902', '4903', '4904', '4905'],
'postcode': ['3000', '3002', '3003', '3005', '3006'],
'suburb': ['MELBOURNE',
'EAST MELBOURNE',
'WEST MELBOURNE',
'WORLD TRADE CENTRE',
'SOUTHBANK'],
'state': ['VIC', 'VIC', 'VIC', 'VIC', 'VIC'],
'lat': ['-37.814563', '-37.816640', '-37.806255', '-37.822262', '-37.823258'],
'lon': ['144.970267', '144.987811', '144.941123', '144.954856', '144.965926']})
import geopy.distance
# cartesian product so we get all combinations
dfdist = (dfsub.assign(foo=1).merge(dfstat.assign(foo=1), on="foo")
# calc distance in km between each suburb and each train station
.assign(km=lambda dfa: dfa.apply(lambda r:
geopy.distance.geodesic(
(r["LATITUDE"],r["LONGITUDE"]),
(r["lat"],r["lon"])).km, axis=1))
# reduce number of columns to make it more digestible
.loc[:,["postcode","suburb","STOP_ID","STOP_NAME","km"]]
# sort so shortest distance station from a suburb is first
.sort_values(["postcode","suburb","km"])
# good practice
.reset_index(drop=True)
)
# finally pick out stations nearest to suburb
# this can easily be joined back to source data frames as postcode and STOP_ID have been maintained
dfnearest = dfdist.groupby(["postcode","suburb"])\
.agg({"STOP_ID":"first","STOP_NAME":"first","km":"first"}).reset_index()
print(dfnearest.to_string(index=False))
dfnearest
Output:
postcode suburb STOP_ID STOP_NAME km
3000 MELBOURNE 19973 North Melbourne Railway Station (West Melbourne) 2.564586
3002 EAST MELBOURNE 19974 Clifton Hill Railway Station (Clifton Hill) 3.177320
3003 WEST MELBOURNE 19973 North Melbourne Railway Station (West Melbourne) 0.181463
3005 WORLD TRADE CENTRE 19973 North Melbourne Railway Station (West Melbourne) 1.970909
3006 SOUTHBANK 19973 North Melbourne Railway Station (West Melbourne) 2.705553
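On recent pandas versions the helper column foo=1 is not needed, because merge accepts how="cross" (available from pandas 1.2). The sketch below shows that variant on a cut-down copy of the sample data, picking the nearest station with idxmin instead of sort plus first. To keep it free of extra dependencies it approximates the distance with an equirectangular projection rather than geopy's geodesic; the constant KM_PER_DEG and the names suburbs, stations, pairs and nearest are assumptions made for this illustration only.

```python
import math
import pandas as pd

suburbs = pd.DataFrame({"suburb": ["MELBOURNE", "WEST MELBOURNE"],
                        "lat": [-37.814563, -37.806255],
                        "lon": [144.970267, 144.941123]})
stations = pd.DataFrame({"STOP_ID": ["19970", "19973", "19974"],
                         "LATITUDE": [-37.781193, -37.807419, -37.788657],
                         "LONGITUDE": [144.952301, 144.942570, 144.995417]})

# how="cross" pairs every suburb with every station (2 x 3 = 6 rows).
pairs = suburbs.merge(stations, how="cross")

# Equirectangular approximation in km; an assumption that is adequate at city scale.
KM_PER_DEG = 111.195  # assumed: 2 * pi * 6371.0088 km / 360
mean_lat = ((pairs["lat"] + pairs["LATITUDE"]) / 2).map(math.radians)
dx = (pairs["lon"] - pairs["LONGITUDE"]) * mean_lat.map(math.cos) * KM_PER_DEG
dy = (pairs["lat"] - pairs["LATITUDE"]) * KM_PER_DEG
pairs["km"] = (dx ** 2 + dy ** 2) ** 0.5

# idxmin returns, per suburb, the row label of the smallest distance.
rows = pairs.groupby("suburb")["km"].idxmin()
nearest = pairs.loc[rows, ["suburb", "STOP_ID", "km"]].reset_index(drop=True)
print(nearest)
```

For both suburbs this should select STOP_ID 19973, in line with the table above; the km values can differ slightly from the geodesic ones because of the approximation.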
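# pick nearer places first: bucket by lat/lon rounded to 1 decimal, then take all combinations within a bucket
# the coordinates are stored as strings above, so cast to float before rounding
# note: a station just across a bucket boundary is not considered, so this is an approximation
dfdist = (dfsub.assign(foo=1, latr=dfsub["lat"].astype(float).round(1), lonr=dfsub["lon"].astype(float).round(1))
.merge(dfstat.assign(foo=1, latr=dfstat["LATITUDE"].astype(float).round(1), lonr=dfstat["LONGITUDE"].astype(float).round(1)),
on=["foo","latr","lonr"])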
# calc distance in km between each suburb and each train station
.assign(km=lambda dfa: dfa.apply(lambda r:
geopy.distance.geodesic(
(r["LATITUDE"],r["LONGITUDE"]),
(r["lat"],r["lon"])).km, axis=1))
# reduce number of columns to make it more digestible
.loc[:,["postcode","suburb","STOP_ID","STOP_NAME","km"]]
# sort so shortest distance station from a suburb is first
.sort_values(["postcode","suburb","km"])
# good practice
.reset_index(drop=True)
)
You can use sklearn.neighbors.NearestNeighbors with the haversine distance.
import pandas as pd
from sklearn.neighbors import NearestNeighbors
dfstat = pd.DataFrame({'STOP_ID': ['19970', '19971', '19972', '19973', '19974'],
'STOP_NAME': ['Royal Park Railway Station (Parkville)', 'Flemington Bridge Railway Station (North Melbo...', 'Macaulay Railway Station (North Melbourne)', 'North Melbourne Railway Station (West Melbourne)', 'Clifton Hill Railway Station (Clifton Hill)'],
'LATITUDE': ['-37.781193', '-37.788140', '-37.794267', '-37.807419', '-37.788657'],
'LONGITUDE': ['144.952301', '144.939323', '144.936166', '144.942570', '144.995417'],
'TICKETZONE': ['1', '1', '1', '1', '1'],
'ROUTEUSSP': ['Upfield', 'Upfield', 'Upfield', 'Flemington,Sunbury,Upfield,Werribee,Williamsto...', 'Mernda,Hurstbridge'],
'geometry': ['POINT (144.95230 -37.78119)', 'POINT (144.93932 -37.78814)', 'POINT (144.93617 -37.79427)', 'POINT (144.94257 -37.80742)', 'POINT (144.99542 -37.78866)']})
dfsub = pd.DataFrame({'id': ['4901', '4902', '4903', '4904', '4905'],
'postcode': ['3000', '3002', '3003', '3005', '3006'],
'suburb': ['MELBOURNE', 'EAST MELBOURNE', 'WEST MELBOURNE', 'WORLD TRADE CENTRE', 'SOUTHBANK'],
'state': ['VIC', 'VIC', 'VIC', 'VIC', 'VIC'],
'lat': ['-37.814563', '-37.816640', '-37.806255', '-37.822262', '-37.823258'],
'lon': ['144.970267', '144.987811', '144.941123', '144.954856', '144.965926']})
Let's first find the point in the data frame that is closest to some arbitrary point, for example -37.814563, 144.970267.
NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(dfstat[['LATITUDE', 'LONGITUDE']])
NN.kneighbors([[-37.814563, 144.970267]])
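The output is (array([[2.55952637]]), array([[3]])):
the distance to, and the index of, the nearest point in the data frame. Note that sklearn's built-in haversine metric expects [latitude, longitude] in radians and returns the distance in radians on the unit sphere, so degree inputs should be converted first. If you want the distance in kilometers, you can use the haversine package instead.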
from haversine import haversine
NN = NearestNeighbors(n_neighbors=1, metric=haversine)
NN.fit(dfstat[['LATITUDE', 'LONGITUDE']])
NN.kneighbors([[-37.814563, 144.970267]])
The output is (array([[2.55952637]]), array([[3]])),
with the distance in kilometers.
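As a sanity check on the units, the haversine formula can be written out with the standard library alone. The sketch below first computes the central angle in radians (the quantity a unit-sphere haversine metric reports) and then scales it by an assumed mean Earth radius of 6371.0088 km. The function name central_angle and the constant EARTH_RADIUS_KM are hypothetical names chosen for this illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius

def central_angle(lat1, lon1, lat2, lon2):
    """Haversine central angle in radians; all inputs are in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlmb = radians(lon2 - lon1)
    a = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * asin(sqrt(a))

# Melbourne (postcode 3000) to North Melbourne Railway Station, from the sample data.
angle = central_angle(-37.814563, 144.970267, -37.807419, 144.942570)
km = angle * EARTH_RADIUS_KM
print(f"{angle:.6f} rad, {km:.2f} km")
```

The kilometer value should come out at roughly 2.56, close to the distance reported above, while the raw angle is a very small number; a result of that magnitude usually means the distance is still in radians and has not yet been multiplied by the radius.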
Now you can apply this to every point in the data frame and use the indices to look up the nearest station.
# query once; kneighbors returns (distances, indices), each of shape (n_rows, 1)
distances, indices = NN.kneighbors(dfsub[['lat', 'lon']])
indices = indices[:, 0]
# assign by position (to_numpy) so the result does not depend on dfsub's index labels
dfsub['closest_station'] = dfstat.iloc[indices]['STOP_NAME'].to_numpy()
dfsub['closest_station_distances'] = distances[:, 0]
print(dfsub)
id postcode suburb state lat lon closest_station closest_station_distances
0 4901 3000 MELBOURNE VIC -37.814563 144.970267 North Melbourne Railway Station (West Melbourne) 2.559526
1 4902 3002 EAST MELBOURNE VIC -37.816640 144.987811 Clifton Hill Railway Station (Clifton Hill) 3.182521
2 4903 3003 WEST MELBOURNE VIC -37.806255 144.941123 North Melbourne Railway Station (West Melbourne) 0.181419
3 4904 3005 WORLD TRADE CENTRE VIC -37.822262 144.954856 North Melbourne Railway Station (West Melbourne) 1.972010
4 4905 3006 SOUTHBANK VIC -37.823258 144.965926 North Melbourne Railway Station (West Melbourne) 2.703926
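A likely explanation for the mismatch in the question's output is index alignment. In the question, closest_station is assigned from a Series whose index was reset to 0, 1, 2, ..., while df_complete has index labels 1, 2, 4, 5, 6; pandas aligns such an assignment on labels, not on row order. That would shift the names between rows, which fits Glenroy showing up against aberfeldie. The sketch below reproduces the effect on made-up miniature data; the frame, the shortened station names and the column names are illustrative only.

```python
import pandas as pd

# A filtered frame with gaps in its index, like df_complete in the question.
suburbs = pd.DataFrame({"suburb": ["aberfeldie", "airport west", "albert park"]},
                       index=[1, 2, 4])

# Names as they would come back from the neighbour search, in row order (index 0, 1, 2).
found = pd.Series(["Essendon", "Glenroy", "Southern Cross"])

# Assigning a Series aligns on index labels: label 1 receives found[1],
# label 2 receives found[2], and label 4 has no match and becomes NaN.
suburbs["by_label"] = found

# Assigning a plain array goes by position, so row order is preserved.
suburbs["by_position"] = found.to_numpy()
print(suburbs)
```

Assigning by position, or resetting the index of the target frame before the assignment, avoids the shift.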
小智 5
Try this:
import pandas as pd
def ClosestStop(r):
# Cartesian distance: square root of (x2-x1)^2 + (y2-y1)^2
distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
# Stop with minimum Distance from the Suburb
closestStationId = distances[distances == distances.min()].index.to_list()[0]
return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]
StationDf = pd.read_excel("StationData.xlsx")
SuburbDf = pd.read_excel("SuburbData.xlsx")
SuburbDf[['ClosestStopId', 'ClosestStopName']] = SuburbDf.apply(ClosestStop, axis=1)
print(SuburbDf)