My task is to read data from Excel into a DataFrame. The data is a bit messy, and to clean it up I did the following:
import re
import pandas as pd
df_1 = pd.read_excel(offers[0])
df_1 = df_1.rename(columns={'???????????? [???? ?????: 29.05.2019 ?????: 10:29:42 ]':'good_name',
'????????':'barcode',
'???? ??. ???.':'price',
'???????': 'balance'
})
df_1 = df_1[new_columns]
# I don't know why, but the code doesn't work unless NaN is first replaced with some other character
df_1.barcode = df_1.barcode.fillna('_')
# remove all non-numeric characters
df_1.barcode = df_1.barcode.apply(lambda row: re.sub('[^0-9]', '', row))
# convert str to numeric
df_1.barcode = pd.to_numeric(df_1.barcode, downcast='integer').fillna(0)
df_1.head()
It returns the barcode column with dtype float64 (why is that?):
0 0.000000e+00
1 7.613037e+12
2 7.613037e+12
3 7.613034e+12
4 7.613035e+12
Name: barcode, …
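A minimal sketch of what is probably going on, using a made-up three-row sample rather than the real spreadsheet (the sample values and the names `barcode`, `cleaned`, `as_float` and `as_int` are assumptions for illustration). After the regex step the `'_'` placeholder becomes an empty string, `pd.to_numeric` parses an empty string as NaN, and a column that contains NaN cannot be downcast to an integer type, so it stays float64 even after `fillna(0)`:

```python
import re

import pandas as pd

# Hypothetical sample standing in for the real barcode column;
# None plays the role of an empty Excel cell.
barcode = pd.Series(["7613037000016", "76-13037 000023", None], name="barcode")

# Same cleaning steps as above: placeholder for NaN, then strip non-digits.
# The "_" placeholder ends up as an empty string.
cleaned = barcode.fillna("_").apply(lambda value: re.sub(r"[^0-9]", "", value))

# The empty string is parsed as NaN, so the integer downcast cannot be
# applied and the result stays float64.
as_float = pd.to_numeric(cleaned, downcast="integer")
print(as_float.dtype)  # float64

# One possible fix: fill the NaN first, then cast explicitly.
as_int = pd.to_numeric(cleaned).fillna(0).astype("int64")
print(as_int.dtype)  # int64
```

Note that this route still passes through float64, so it is only exact for values below 2**53. That is enough for 13-digit barcodes, but if leading zeros matter it may be safer to keep the column as strings.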