the*_*ldo 2 python dataframe pandas
I have the following CSV file, which I am reading into a pandas DataFrame:
Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15`, a43, "55, 10", 330
I extract the data as follows:
import pandas as pd
import matplotlib.pyplot as plt
fname = "data.csv"
data = pd.read_csv(fname,sep=",", header=None, skiprows=1)
data.columns = ["Timestamp", "UTC", "Callsign", "Position", "Speed", "Direction"]
t = data["Timestamp"]
utc = data["UTC"]
acid = data["Callsign"]
pos = data["Position"]
spd = data["Speed"]
However, for the Position column this gives 2 values per row in that column. I would like the first value of each position in one list and the second value in another list, like this:
latitude = [52,53,54,55]
longitude = [13,12,11,10]
How can I select this using a pandas DataFrame?
If you need 2 new columns, use Series.str.strip with Series.str.split, then convert to floats:
data[['lat','lon']] = (data["Position"].str.strip('"')
                                       .str.split(r',\s+', expand=True)
                                       .astype(float))
print(data)
   Timestamp  UTC Callsign  Position  Speed   lat   lon
0          1  12z      q20  "52, 13"    320  52.0  13.0
1          2  13z      a32  "53, 12"    321  53.0  12.0
2          3  14z      q32  "54, 11"    321  54.0  11.0
3          4  15`      a43  "55, 10"    330  55.0  10.0
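A related option, sketched here as an assumption rather than as part of the answer above: because every comma in the file is followed by a space, passing skipinitialspace=True to read_csv should let the parser treat "52, 13" as a single quoted field, so no quote stripping is needed afterwards. The name csv_text is a hypothetical in-memory stand-in for data.csv, and the column names are taken from the file's own header.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of data.csv, so the sketch runs on its own.
csv_text = '''Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15`, a43, "55, 10", 330
'''

# skipinitialspace=True drops the blank after each delimiter, so the
# opening quote is seen at the start of the field and "52, 13" stays whole.
data = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Split "52, 13" on the comma plus any following whitespace, then convert.
data[["lat", "lon"]] = data["loc"].str.split(r",\s*", expand=True).astype(float)

print(data[["loc", "lat", "lon"]])
```

With this approach the header row is used directly, so the manual data.columns assignment and skiprows=1 are not required.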
If you need 2 lists:
lat, lon = (data["Position"].str.strip('"')
                            .str.split(r',\s+', expand=True)
                            .astype(float)
                            .to_numpy()
                            .T.tolist())
print(lat, lon)
[52.0, 53.0, 54.0, 55.0] [13.0, 12.0, 11.0, 10.0]
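The question asks for integer lists ([52, 53, 54, 55]) rather than floats. One possible variation, given here only as a sketch, uses Series.str.extract with two capture groups and converts to int. The position Series below is a hypothetical stand-in for data["Position"], and the pattern assumes every value is a pair of non-negative whole numbers.

```python
import pandas as pd

# Hypothetical stand-in for data["Position"], keeping the literal quotes.
position = pd.Series(['"52, 13"', '"53, 12"', '"54, 11"', '"55, 10"'],
                     name="Position")

# Two capture groups yield a two-column DataFrame (columns 0 and 1);
# the surrounding quotes are simply not matched, so no strip is needed.
parts = position.str.extract(r"(\d+),\s*(\d+)").astype(int)

latitude = parts[0].tolist()
longitude = parts[1].tolist()
print(latitude, longitude)
```

If the coordinates can be negative or fractional, the pattern and the int conversion would need to change accordingly.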