I have the following file.txt (abridged):
SICcode Catcode Category SICname MultSIC
0111 A1500 Wheat, corn, soybeans and cash grain Wheat X
0112 A1600 Other commodities (incl rice, peanuts) Rice X
0115 A1500 Wheat, corn, soybeans and cash grain Corn X
0116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
0119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC X
0131 A1100 Cotton Cotton X
0132 A1300 Tobacco & Tobacco products Tobacco X
I'm having trouble reading it into a pandas DataFrame. I tried pd.read_csv with engine='python', sep='Tab', but it returned the whole file in a single column:
?SICcode Catcode Category SICname MultSIC
0 0111 A1500 Wheat, corn, soybeans...
1 0112 A1600 Other commodities (in...
2 0115 A1500 Wheat, corn, soybeans...
3 0116 A1500 Wheat, corn, soybeans...
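I then tried loading it into Gnumeric with "tab" as the delimiter, but that also read the file as a single column. Does anyone have any ideas?
If df = pd.read_csv('file.txt', sep='\t') returns a DataFrame with a single column, then file.txt evidently does not use tabs as separators. Your data may simply be separated by spaces. In that case, you could try: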
import pandas as pd
df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python')
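This uses the regular expression pattern \s{2,} as the separator. The regex matches 2 or more whitespace characters, so single spaces inside a field (such as "Cash grains, NEC") do not split it.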
In [8]: df
Out[8]:
SICcode Catcode Category SICname \
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat
1 112 A1600 Other commodities (incl rice, peanuts) Rice
2 115 A1500 Wheat, corn, soybeans and cash grain Corn
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC
5 131 A1100 Cotton Cotton
6 132 A1300 Tobacco & Tobacco products Tobacco
MultSIC
0 X
1 X
2 X
3 X
4 X
5 X
6 X
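Note that the output above drops the leading zeros of SICcode (0111 becomes 111) because pandas infers an integer column. A minimal sketch of one way to keep them, assuming the fields really are separated by two or more spaces; the inline sample string stands in for file.txt and is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for file.txt: fields separated by two spaces.
sample = (
    "SICcode  Catcode  Category  SICname  MultSIC\n"
    "0111  A1500  Wheat, corn, soybeans and cash grain  Wheat  X\n"
    "0112  A1600  Other commodities (incl rice, peanuts)  Rice  X\n"
    "0131  A1100  Cotton  Cotton  X\n"
)

# A regex separator needs the Python parsing engine.
# dtype=str on SICcode keeps the codes as text, so "0111" stays "0111".
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s{2,}",
    engine="python",
    dtype={"SICcode": str},
)

print(df)
```

With a real file, pass the path ('file.txt') in place of the io.StringIO object.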
If that does not work, please post the output of print(repr(open('file.txt', 'rb').read(100))). This will show us an unambiguous representation of the first 100 bytes of file.txt.
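As a rough sketch of what to look for in those leading bytes, the helper below counts tabs and runs of spaces and checks for a byte order mark. The function name describe_head and the sample bytes are hypothetical. The stray "?" before SICcode in the question's output might be a byte order mark, but that is only a guess.

```python
import codecs
import re


def describe_head(raw: bytes) -> dict:
    """Summarise hints about the encoding and delimiter in the leading bytes."""
    return {
        "utf8_bom": raw.startswith(codecs.BOM_UTF8),
        "utf16_bom": raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
        "tabs": raw.count(b"\t"),
        # Number of runs of two or more consecutive spaces.
        "space_runs": len(re.findall(rb" {2,}", raw)),
    }


# Hypothetical leading bytes; for a real file use:
#     with open('file.txt', 'rb') as f:
#         head = f.read(100)
head = codecs.BOM_UTF8 + b"SICcode  Catcode  Category\n0111  A1500  Wheat\n"

info = describe_head(head)
print(info)
```

Zero tabs together with several space runs would suggest a whitespace-separated file, while a UTF-16 byte order mark would suggest passing an explicit encoding to pd.read_csv.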