I have a dataset like this:
ISIN,"MIC","Datum","Open","Hoog","Laag","Close","Number of Shares","Number of Trades","Turnover","Valuta"
NL0011821202,"Euronext Amsterdam Brussels","04/09/2017","14.82","14.95","14.785","14.855","7482805","6970","111345512.83","EUR"
NL0011821202,"Euronext Amsterdam Brussels","05/09/2017","14.91","14.92","14.585","14.655","15240971","12549","224265257.14","EUR"
NL0011821202,"Euronext Amsterdam Brussels","07/09/2017","14.69","14.74","14.535","14.595","15544695","15817","227478163.74","EUR"
But I can't read the file correctly with pd.read_csv('filename.csv'). I have tried various combinations, for example:
sep='"',
delimiter=","
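But no luck at all! I want the first row to be used as the column names, with the quote characters and commas removed.
How can I solve this efficiently?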
问题是有时会出现 double ",解决方案是"在前后更改零个或多个匹配项的分隔符,:
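import pandas as pd

# sep is a regex: a comma with any number of " on either side;
# regex separators require engine='python'
df = pd.read_csv('ING_DAILY - ING_DAILY.csv', sep='["]*,["]*', engine='python')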
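To see what that separator actually does, here is a minimal sketch that applies the same pattern with Python's re.split to the header and the first data row from the question. It assumes (this is not stated in the answer) that the python engine splits each line with the compiled pattern roughly like this and does no quote handling of its own when sep is a regex.

```python
import re

# The same pattern passed as sep= above: a comma, optionally
# surrounded by any number of double-quote characters.
SEP = r'["]*,["]*'

header = 'ISIN,"MIC","Datum","Open","Hoog","Laag","Close","Number of Shares","Number of Trades","Turnover","Valuta"'
row = 'NL0011821202,"Euronext Amsterdam Brussels","04/09/2017","14.82","14.95","14.785","14.855","7482805","6970","111345512.83","EUR"'

header_fields = re.split(SEP, header)
row_fields = re.split(SEP, row)

print(header_fields[:3])  # ['ISIN', 'MIC', 'Datum']
print(header_fields[-1])  # Valuta"  (the quote after the last field is not next to a comma)
print(row_fields[-1])     # EUR"
```

Quotes that touch a comma are consumed by the separator, but a quote at the very start or end of a line is not, which is why a cleanup of the column names and of the first and last columns is still needed.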
Then the remaining " characters need to be stripped from the column names and from the first and last columns:
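# Strip leftover quotes from the header names
df.columns = df.columns.str.strip('"')
# Strip leftover quotes from the first and last columns (ISIN and Valuta)
df.iloc[:, [0,-1]] = df.iloc[:, [0,-1]].apply(lambda x: x.str.strip('"'))
print(df.head(3))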
           ISIN                          MIC       Datum   Open    Hoog  \
0  NL0011821202  Euronext Amsterdam Brussels  04/09/2017  14.82  14.950
1  NL0011821202  Euronext Amsterdam Brussels  05/09/2017  14.91  14.920
2  NL0011821202  Euronext Amsterdam Brussels  06/09/2017  14.69  14.725

     Laag   Close  Number of Shares  Number of Trades      Turnover  Valuta
0  14.785  14.855           7482805              6970  1.113455e+08     EUR
1  14.585  14.655          15240971             12549  2.242653e+08     EUR
2  14.570  14.615          14851426             15303  2.175316e+08     EUR
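For a version you can run without the original file, here is a self-contained sketch of the same recipe. It assumes the three sample rows from the question stand in for 'ING_DAILY - ING_DAILY.csv'; io.StringIO replaces the file path, so nothing needs to exist on disk.

```python
import io

import pandas as pd

# Three rows copied from the question; io.StringIO stands in for the CSV file.
raw = '''ISIN,"MIC","Datum","Open","Hoog","Laag","Close","Number of Shares","Number of Trades","Turnover","Valuta"
NL0011821202,"Euronext Amsterdam Brussels","04/09/2017","14.82","14.95","14.785","14.855","7482805","6970","111345512.83","EUR"
NL0011821202,"Euronext Amsterdam Brussels","05/09/2017","14.91","14.92","14.585","14.655","15240971","12549","224265257.14","EUR"
NL0011821202,"Euronext Amsterdam Brussels","07/09/2017","14.69","14.74","14.535","14.595","15544695","15817","227478163.74","EUR"
'''

# Split on a comma plus any surrounding quotes; a regex sep needs engine='python'.
df = pd.read_csv(io.StringIO(raw), sep='["]*,["]*', engine='python')

# The pattern only consumes quotes next to commas, so the last header name and
# the last field of every row keep a trailing quote. Stripping column 0 is a
# no-op for this sample, because ISIN is unquoted here.
df.columns = df.columns.str.strip('"')
df.iloc[:, [0, -1]] = df.iloc[:, [0, -1]].apply(lambda x: x.str.strip('"'))

print(df[["ISIN", "Datum", "Close", "Valuta"]])
print(df.dtypes)
```

Because the quotes are removed by the separator rather than by a CSV quote parser, pandas sees plain tokens such as 14.82 and infers numeric dtypes for the price, volume and turnover columns on its own.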