Posts by Abh*_*bhi

Unable to read a tab-delimited file into a numpy 2-D array

I am new to numpy, and I am trying to read a tab-delimited (\t) text file into a numpy array matrix with the following code:

train_data = np.genfromtxt('training.txt', dtype=None, delimiter='\t')


File contents:

38   Private    215646   HS-grad    9    Divorced    Handlers-cleaners   Not-in-family   White   Male   0   0   40   United-States   <=50K
53   Private    234721   11th   7    Married-civ-spouse  Handlers-cleaners   Husband     Black   Male   0   0   40   United-States   <=50K
30   State-gov  141297   Bachelors  13   Married-civ-spouse  Prof-specialty  Husband     Asian-Pac-Islander  Male   0   0   40   India   >50K


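I expected a 2-D array matrix of shape (3, 15).

With the code above, however, I only get a one-dimensional array of shape (3,).

I am not sure why the 15 fields of each row are not each assigned to a column.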
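One plausible reading of the (3,) shape, offered as a sketch rather than a confirmed diagnosis: with dtype=None, genfromtxt guesses a type per column, and when the columns mix integers and strings it returns a 1-D structured array with one record per row and fields named f0 to f14. The snippet below rebuilds the three rows from the question in memory (the ROWS and SAMPLE names are illustrative) and assumes a recent NumPy on Python 3, where genfromtxt accepts an encoding argument.

```python
import io
import numpy as np

# The three rows from the question, rebuilt with explicit tab separators.
ROWS = [
    ["38", "Private", "215646", "HS-grad", "9", "Divorced",
     "Handlers-cleaners", "Not-in-family", "White", "Male",
     "0", "0", "40", "United-States", "<=50K"],
    ["53", "Private", "234721", "11th", "7", "Married-civ-spouse",
     "Handlers-cleaners", "Husband", "Black", "Male",
     "0", "0", "40", "United-States", "<=50K"],
    ["30", "State-gov", "141297", "Bachelors", "13", "Married-civ-spouse",
     "Prof-specialty", "Husband", "Asian-Pac-Islander", "Male",
     "0", "0", "40", "India", ">50K"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"

# dtype=None: one guessed type per column, so the result is a 1-D
# structured array (one record per row), not a 2-D matrix.
records = np.genfromtxt(io.StringIO(SAMPLE), dtype=None,
                        delimiter="\t", encoding="utf-8")
print(records.shape)             # (3,)
print(len(records.dtype.names))  # 15
print(records["f1"])             # the workclass column, selected by field name

# A single dtype for every field yields an ordinary 2-D array of strings.
table = np.genfromtxt(io.StringIO(SAMPLE), dtype=str,
                      delimiter="\t", encoding="utf-8")
print(table.shape)               # (3, 15)
```

The structured array keeps the numeric columns numeric, while the string table has the (3, 15) shape but needs explicit conversion (for example table[:, 0].astype(int)) before any arithmetic.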

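I also tried numpy's loadtxt(), but it cannot handle the type conversion for my data: even though I passed dtype=None, it tried to convert the strings to the default float type and failed.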

Code I tried:

train_data = np.loadtxt('try.txt', dtype=None, delimiter='\t')

Error:
ValueError: could not convert string to float: State-gov
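As a hedged illustration of the error above: np.dtype(None) resolves to float64, so passing dtype=None to loadtxt does not request type inference. Asking for dtype=str reads every field as text instead. The sample below uses only the first four fields of the question's rows, and the variable names are illustrative.

```python
import io
import numpy as np

# First four fields of the three rows shown in the question.
SAMPLE = (
    "38\tPrivate\t215646\tHS-grad\n"
    "53\tPrivate\t234721\t11th\n"
    "30\tState-gov\t141297\tBachelors\n"
)

# None is shorthand for the default float type.
print(np.dtype(None))  # float64

# So dtype=None makes loadtxt parse every field as a float, which fails
# on the first non-numeric field.
failed = False
try:
    np.loadtxt(io.StringIO(SAMPLE), dtype=None, delimiter="\t")
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Reading everything as text works; numeric columns are converted afterwards.
text = np.loadtxt(io.StringIO(SAMPLE), dtype=str, delimiter="\t")
print(text.shape)  # (3, 4)
ages = text[:, 0].astype(int)
print(ages)
```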


Any pointers?

Thanks

numpy python-2.7 genfromtxt

Abh*_*bhi

lucky-day

6
Score

1
Answers

10k
Views

Handling too many categorical features with scikit-learn

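I am quite new to scikit-learn, and I am trying to use the package to make predictions on income data. This may be a duplicate question, since I have seen another post on the topic, but I am looking for a simple example to understand what scikit-learn estimators expect as input.

My data has the following structure, where many of the features are categorical (e.g. workclass, education):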

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: …
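As a rough sketch of what an estimator typically expects: scikit-learn estimators generally take a numeric 2-D array of shape (n_samples, n_features), so categorical columns such as workclass and education are usually expanded into indicator (one-hot) columns first. scikit-learn ships sklearn.preprocessing.OneHotEncoder for this; the version below does the same thing by hand in NumPy so that each step is visible. The one_hot helper is hypothetical, and the three samples reuse the age, workclass and education values from the rows shown in the earlier post.

```python
import numpy as np

# Three samples: age (continuous), workclass and education (categorical).
ages = np.array([38.0, 53.0, 30.0])
workclass = np.array(["Private", "Private", "State-gov"])
education = np.array(["HS-grad", "11th", "Bachelors"])


def one_hot(values):
    """Return (sorted categories, 0/1 indicator matrix) for 1-D labels."""
    categories, codes = np.unique(values, return_inverse=True)
    indicators = np.zeros((values.size, categories.size))
    # Set a single 1.0 per row, in the column of that row's category.
    indicators[np.arange(values.size), codes] = 1.0
    return categories, indicators


wc_categories, wc_onehot = one_hot(workclass)
ed_categories, ed_onehot = one_hot(education)

# Stack the continuous column and the indicator columns side by side.
X = np.column_stack([ages, wc_onehot, ed_onehot])
print(wc_categories)  # ['Private' 'State-gov']
print(X.shape)        # (3, 6)
```

With many categories per feature the indicator matrix becomes wide and mostly zeros, which is why sparse output is a common choice for this kind of encoding.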


python numpy scikit-learn

Abh*_*bhi

2013 10-09

3
Score

1
Answers

1414
Views