dav*_*veb 1 python csv grouping distinct-values pandas
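I want to remove duplicate records from a CSV file using Python pandas. The CSV file contains records with three attributes: scale, minzoom, and maxzoom. I want a resulting DataFrame that contains only minzoom and maxzoom, with every remaining record unique.
I.e.,
Input CSV file (lookup_scales.csv)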
Scale, minzoom, maxzoom
2000, 0, 15
3000, 0, 15
10000, 8, 15
20000, 8, 15
200000, 15, 18
250000, 15, 18
Required distinct_lookup_scales.csv (without the Scale column)
minzoom, maxzoom
0,15
8,15
15,18
My code so far is:
lookup_scales_df = pd.read_csv('C:/Marine/lookup/lookup_scales.csv', names = ['minzoom','maxzoom'])
lookup_scales_df = lookup_scales_df.set_index([2, 3])
file_name = "C:/Marine/lookup/distinct_lookup_scales.csv"
lookup_scales_df.groupby('minzoom', 'maxzoom').to_csv(file_name, sep=',')
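You don't need NumPy or anything else. You can de-duplicate in a single line while importing the CSV file with pandas:
import pandas as pd
# skipinitialspace=True strips the space after each comma (header included) so usecols can match the names;
# reset_index(drop=True) renumbers the rows without adding an extra "index" column.
df = pd.read_csv('lookup_scales.csv', skipinitialspace=True, usecols=['minzoom', 'maxzoom']).drop_duplicates(keep='first').reset_index(drop=True)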
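To try the de-duplication without touching the file system, here is a minimal sketch that feeds the question's sample rows through `io.StringIO`. The helper name `distinct_zoom_levels` and the `CSV_TEXT` constant are made up for illustration; `skipinitialspace=True` is an assumption based on the sample file having a space after each comma.

```python
import io

import pandas as pd

# Sample data copied from the question; note the space after each comma.
CSV_TEXT = """Scale, minzoom, maxzoom
2000, 0, 15
3000, 0, 15
10000, 8, 15
20000, 8, 15
200000, 15, 18
250000, 15, 18
"""


def distinct_zoom_levels(source):
    """Return the unique (minzoom, maxzoom) pairs from a CSV path or buffer."""
    df = pd.read_csv(
        source,
        skipinitialspace=True,           # strip the blank after each comma, header included
        usecols=["minzoom", "maxzoom"],  # leave the Scale column out at read time
    )
    # Keep the first occurrence of each pair and renumber the rows from 0.
    return df.drop_duplicates().reset_index(drop=True)


distinct_df = distinct_zoom_levels(io.StringIO(CSV_TEXT))
print(distinct_df)
```

The six input rows should collapse to three pairs: (0, 15), (8, 15), and (15, 18).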
Then write it to a CSV file:
df.to_csv(file_name, index=False) # No need to set sep here: to_csv writes comma-delimited output by default.
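To see what `index=False` changes, one option is to call `to_csv` without a path, in which case pandas returns the CSV text as a string instead of writing a file. The small frame below is a hypothetical stand-in for the de-duplicated result.

```python
import pandas as pd

# Hypothetical frame standing in for the de-duplicated result.
distinct_df = pd.DataFrame({"minzoom": [0, 8, 15], "maxzoom": [15, 15, 18]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = distinct_df.to_csv(index=False)
print(csv_text)

# For comparison: the default (index=True) adds an unnamed leading column
# holding the row labels.
csv_with_index = distinct_df.to_csv()
print(csv_with_index)
```

The first string contains only the minzoom and maxzoom columns, which matches the required output file; the second starts each line with the row label.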
So the whole code is:
import pandas as pd
# skipinitialspace=True handles the space after each comma; drop=True avoids writing an extra "index" column.
df = pd.read_csv('lookup_scales.csv', skipinitialspace=True, usecols=['minzoom', 'maxzoom']).drop_duplicates(keep='first').reset_index(drop=True)
file_name = "C:/Marine/lookup/distinct_lookup_scales.csv"
df.to_csv(file_name, index=False) # No need to set sep here: to_csv writes comma-delimited output by default.