I have a CSV file containing the distances between centroids in a GIS model, in the following format:
InputID,TargetID,Distance
1,2,3050.01327866
1,7,3334.99565217
1,5,3390.99115304
1,3,3613.77046864
1,4,4182.29900892
...
...
3330,3322,955927.582933
It is sorted on origin (InputID) and then on the nearest destination (TargetID).
For a specific modelling tool I need this data in a CSV file formatted as follows (the numbers are the centroid numbers):
distance1->1, distance1->2, distance1->3,.....distance1->3330
distance2->1, distance2->2,.....
.....
distance3330->1,distance3330->2....distance3330->3330
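So there is no InputID or TargetID, only the distances, with origins on the rows and destinations on the columns (example for the first 5 origins/destinations):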
0,3050.01327866,3613.77046864,4182.29900892,3390.99115304
3050.01327866,0,1326.94611797,1175.10254872,1814.45584129
3613.77046864,1326.94611797,0,1832.209595,3132.78725738
4182.29900892,1175.10254872,1832.209595,0,1935.55056767
3390.99115304,1814.45584129,3132.78725738,1935.55056767,0
Run Code Online (Sandbox Code Playgroud)
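To make the target layout concrete, here is a minimal sketch of the same long-to-wide reshaping on a three-zone sample, using pandas `pivot`. The distances are taken from the example rows above; the names `sample`, `long_df` and `matrix` are only illustrative, and the sketch assumes every off-diagonal pair appears exactly once in the input.

```python
import io

import pandas as pd

# Three-zone excerpt in the same long format as the input file.
sample = """InputID,TargetID,Distance
1,2,3050.01327866
1,3,3613.77046864
2,1,3050.01327866
2,3,1326.94611797
3,2,1326.94611797
3,1,3613.77046864
"""

long_df = pd.read_csv(io.StringIO(sample))

# Origins become rows, destinations become columns.
# The diagonal (origin == destination) is absent from the input,
# so pivot leaves it as NaN; fill it with 0.
matrix = long_df.pivot(index="InputID", columns="TargetID", values="Distance").fillna(0)

# Write without the ID header row or the ID index column.
print(matrix.to_csv(header=False, index=False))
```

Whether `pivot` is fast enough for the full 3330x3330 case is something I have not measured here.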
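I have built the following code, and it works. But running it takes days to produce the 3330x3330 file. Since I am a beginner with Python, I have the feeling I am overlooking something...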
import pandas as pd
import numpy as np

file = pd.read_csv('c:\\users\\Niels\\Dropbox\\Python\\centroid_distances.csv')
# Sort by origin, then by destination
df = file.sort_values(by=['InputID', 'TargetID'], ascending=[True, True])
number_of_zones = 3330
text_file = open("c:\\users\\Niels\\Dropbox\\Python\\Output.csv", "w")
# Zones are numbered 1..number_of_zones inclusive
for origin in range(1, number_of_zones + 1):
    output_string = ''
    print(origin)
    for destination in range(1, number_of_zones + 1):
        if origin == destination:
            distance = 0
        else:
            distance_row = df[(df['InputID'] == origin) & (df['TargetID'] == destination)]
            # I guess this is the time-consuming part
            distance = distance_row.iloc[0]['Distance']
        output_string = output_string + str(distance) + ','
    text_file.write(output_string[:-1] + '\n') #strip last …
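The boolean mask in the inner loop scans the whole DataFrame once for every origin/destination pair, which is why the run time grows so badly. One possible way to avoid the per-pair lookup is to fill a NumPy array in a single indexed assignment. This is only a sketch: the helper name `long_to_matrix` and the `demo` data are hypothetical, and it assumes the IDs are the consecutive integers 1..number_of_zones.

```python
import io

import numpy as np
import pandas as pd


def long_to_matrix(df, number_of_zones):
    """Return an n x n array where element [i-1, j-1] is the distance i -> j.

    Assumes InputID and TargetID are integers in 1..number_of_zones.
    Pairs missing from df (such as the diagonal) stay 0.
    """
    matrix = np.zeros((number_of_zones, number_of_zones))
    rows = df["InputID"].to_numpy() - 1   # shift to 0-based row index
    cols = df["TargetID"].to_numpy() - 1  # shift to 0-based column index
    matrix[rows, cols] = df["Distance"].to_numpy()
    return matrix


# Small demonstration with distances from the example above.
demo = pd.DataFrame({
    "InputID": [1, 1, 2, 2, 3, 3],
    "TargetID": [2, 3, 3, 1, 1, 2],
    "Distance": [3050.01327866, 3613.77046864, 1326.94611797,
                 3050.01327866, 3613.77046864, 1326.94611797],
})
demo_matrix = long_to_matrix(demo, 3)

# Write comma-separated rows; a file path could be passed instead of the buffer.
buffer = io.StringIO()
np.savetxt(buffer, demo_matrix, delimiter=",", fmt="%.8f")
print(buffer.getvalue())
```

For the real file, the same function would be called with the DataFrame returned by `pd.read_csv` and `number_of_zones=3330`; the input does not need to be sorted first.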