Reading a txt file with string and numeric columns

ele*_*aby 0 python string format numpy genfromtxt

I have a tab-delimited file (city-data.txt):

Alabama Montgomery  32.361538   -86.279118
Alaska  Juneau  58.301935   -134.41974

Is it possible to read the first two columns as strings and the last two columns as floats?

My output should look like this:

[(Alabama,Montgomery,32.36,-86.28),
 (Alaska,Juneau,58.30,-134.42)]

I tried:

import numpy as np

mylist2 = np.genfromtxt(r'city-data.txt', delimiter='\t', dtype=("<S15", "<S15", float, float)).tolist()

This gives me the first two columns as bytes:

[(b'Alabama', b'Montgomery', 32.361538, -86.279118),
 (b'Alaska', b'Juneau', 58.301935, -134.41974)]

I also tried:

with open('city-data.txt') as f:
    mylist = [tuple(i.strip().split('\t')) for i in f]

This gives me all columns as strings:

[('Alabama', 'Montgomery', '32.361538', '-86.279118'),
 ('Alaska', 'Juneau', '58.301935', '-134.41974')]

I can't figure out how to get the result I need...

pau*_*ult 5

You can use pandas read_csv to read the file contents into a DataFrame, then convert the rows into the list you described with df.values.tolist().

Example:

import pandas as pd

filename = "city-data.txt"  # path to the tab-delimited file from the question
df = pd.read_csv(filename, sep="\t", header=None)

print(df.values.tolist())
#[['Alabama', 'Montgomery', 32.361538, -86.27911800000001],
# ['Alaska', 'Juneau', 58.301935, -134.41974]]

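If you need them as tuples, just use map(). On Python 3, map() returns a lazy iterator, so wrap it in list() before printing:

print(list(map(tuple, df.values.tolist())))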
#[('Alabama', 'Montgomery', 32.361538, -86.27911800000001),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]
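
The question also asks for the coordinates rounded to two decimals. A minimal sketch of one way to do that in pandas follows; the column names (state, city, lat, lon) are assumptions introduced for illustration, and the sample rows are inlined through io.StringIO so the snippet runs on its own. In practice you would pass the path to city-data.txt instead.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

# The column names are assumed; the file itself has no header row.
df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["state", "city", "lat", "lon"],
)

# Round only the numeric columns to two decimals.
df[["lat", "lon"]] = df[["lat", "lon"]].round(2)

# index=False drops the row index; name=None yields plain tuples.
rows = list(df.itertuples(index=False, name=None))
print(rows)
```

Using itertuples(index=False, name=None) produces the tuples directly, so no separate map(tuple, ...) step is needed.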

Edit

If you want to stick with numpy, this slight modification of your existing code should work. Change the dtype of the text fields to "O":

mylist2 = np.genfromtxt(filename, delimiter='\t', dtype=("O", "O", float, float)).tolist()
#[('Alabama', 'Montgomery', 32.361538, -86.279118),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]
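
Another option that may be worth trying is to request unicode string columns and pass an explicit encoding, so the text fields are decoded to str rather than left as bytes. This is a sketch under assumptions: it relies on the encoding parameter of genfromtxt (available in NumPy 1.14 and later), the field names are made up for illustration, and the sample rows are written to a temporary file only so the snippet is self-contained.

```python
import os
import tempfile

import numpy as np

# Sample rows from the question, written to a temporary file for the demo.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # "U15" is a unicode string of up to 15 characters; field names are assumed.
    data = np.genfromtxt(
        path,
        delimiter="\t",
        dtype=[("state", "U15"), ("city", "U15"), ("lat", float), ("lon", float)],
        encoding="utf-8",
    )
    # A structured array converts to a list of tuples.
    mylist2 = data.tolist()
finally:
    os.remove(path)

print(mylist2)
```

Note that "U15" truncates values longer than 15 characters, so widen it if your city or state names are longer.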
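
For completeness, the plain-Python attempt from the question can also be finished without any third-party library by converting the last two fields while building each tuple. The sketch below uses the standard csv module and rounds to two decimals as in the desired output; io.StringIO stands in for open('city-data.txt', newline='') so the snippet runs on its own, and it assumes every line has exactly four fields.

```python
import csv
import io

# Sample rows from the question; replace io.StringIO(sample) with
# open("city-data.txt", newline="") to read the real file.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

with io.StringIO(sample) as f:
    reader = csv.reader(f, delimiter="\t")
    # Keep the first two fields as str, convert the last two to rounded floats.
    mylist = [
        (state, city, round(float(lat), 2), round(float(lon), 2))
        for state, city, lat, lon in reader
    ]

print(mylist)
```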