Randomly shuffle a data file and split it into training and test sets

Moh*_*OUI 5 python numpy pandas

I'm trying to shuffle a data file and split it into a training set and a test set using pandas and numpy, so I did the following:

import pandas as pd
import numpy as np 

data_path = "/path_to_data_file/"

train = pd.read_csv(data_path+"product.txt", header=0, delimiter="|")
ts =  train.shape 
#print "data dimension", ts
#print "product attributes \n", train.columns.values 


#shuffle data set, and split to train and test set. 
df = pd.DataFrame(train)
new_train = df.reindex(np.random.permutation(df.index))

indice_90_percent = int((ts[0]/100.0)* 90)

print "90% indice", indice_90_percent

#write train products to csv 
#new_train.to_csv(sep="|")

with open('train_products.txt', 'w') as f:
    for i in new_train[:indice_90_percent]:
        f.write(i+'\n')


with open('test_products.txt', 'w') as f:
    for i in new_train[indice_90_percent:]:
        f.write(i+'\n')

However, instead of train and test files containing the data rows, I get two files that contain only the column names. What am I missing?
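
The symptom can be reproduced in isolation. The sketch below assumes a small hypothetical two-column frame (the names `id` and `name` are made up, not taken from product.txt) and shows that a `for` loop over a DataFrame, or over a row slice of it, walks the column labels rather than the rows:

```python
import pandas as pd

# Hypothetical stand-in for the product data; column names are assumptions.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Iterating a DataFrame (even a row slice) yields its column labels.
labels = [col for col in df[:2]]

# itertuples() is one way to iterate over the rows themselves.
rows = [tuple(row) for row in df[:2].itertuples(index=False)]

print(labels)
print(rows)
```

This is probably why both output files end up with one column name per line.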

Pad*_*ham 4

You can write the rows with to_csv; pass header=False if you don't want the column names.

new_train[indice_90_percent:].to_csv('test_products.txt',header=False)
new_train[:indice_90_percent].to_csv('train_products.txt',header=False)
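
For completeness, here is a minimal end-to-end sketch of the shuffle and 90/10 split. It is not the asker's exact setup: the inline sample data, the `shuffle_split` helper, and the fixed seed are assumptions for illustration. It also passes `sep="|"` and `index=False`, on the assumption that the output should keep the input's pipe delimiter and omit the shuffled index column.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited data standing in for product.txt.
raw = "id|name\n" + "\n".join(f"{i}|product_{i}" for i in range(10))
df = pd.read_csv(io.StringIO(raw), sep="|", header=0)


def shuffle_split(frame, train_fraction=0.9, seed=0):
    """Shuffle the rows, then cut them into train and test parts."""
    # sample(frac=1) returns every row in random order; the seed is
    # fixed only so the sketch is reproducible.
    shuffled = frame.sample(frac=1, random_state=seed)
    cut = int(len(shuffled) * train_fraction)
    return shuffled.iloc[:cut], shuffled.iloc[cut:]


train, test = shuffle_split(df)

# Write the data rows only: no header line and no index column.
train.to_csv("train_products.txt", sep="|", header=False, index=False)
test.to_csv("test_products.txt", sep="|", header=False, index=False)
```

Drop `header=False` if the column names should appear as the first line of each file.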