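I have a CSV file with the following structure:
CAT1,CAT2,TITLE,URL,CONTENT
The CAT1, CAT2, TITLE, and CONTENT columns are all in Chinese.
I want to train LinearSVC or MultinomialNB with TITLE as X and CAT1, CAT2 as the labels; both give this error. Below is my code:
PS: I wrote the code below by following the scikit-learn text_analytics example.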
import numpy as np
import csv
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
label_list = []
def label_map_target(label):
    '''Map a Chinese label name to an integer index.'''
    try:
        idx = label_list.index(label)
    except ValueError:
        idx = len(label_list)
        label_list.append(label)
    return idx
c1_list = []
c2_list = []
title_list = []
with open(csv_file, 'r') as f:
    # row_from_csv is used to shorten this example
    for row in …
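
For reference, the label-to-integer mapping that label_map_target performs can be sketched on its own. This is only a minimal illustration, assuming Python 3. The helper name build_label_index and the sample categories are hypothetical and are not taken from the real CSV. It uses a dict instead of list.index, so each lookup does not rescan the whole list.

```python
def build_label_index(labels):
    """Assign each distinct label the next free integer, in first-seen order."""
    index = {}
    for label in labels:
        # setdefault only inserts when the label is new, so existing ids never change
        index.setdefault(label, len(index))
    return index


# Hypothetical sample categories standing in for the CAT1 column
sample_cat1 = ["体育", "财经", "体育", "科技", "财经"]

label_index = build_label_index(sample_cat1)
targets = [label_index[label] for label in sample_cat1]
print(targets)
```

The resulting targets list has one integer per row and could be passed as y to a classifier's fit method, in the same way as the indices returned by label_map_target.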