Raf*_*nez 6 python scikit-learn
I want to convert the following phrases into sklearn vectors:
Article 1. It is not good to eat pizza after midnight
Article 2. I wouldn't survive a day withouth stackexchange
Article 3. All of these are just random phrases
Article 4. To prove if my experiment works.
Article 5. The red dog jumps over the lazy fox
I have the following code:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=1)

n = 0
while n < 5:
    n = n + 1
    a = ('Article %(number)s' % {'number': n})
    print(a)
    with open("LISR2.txt") as openfile:
        for line in openfile:
            if a in line:
                X = line
                print(vectorizer.fit_transform(X))
This gives me the following error:
ValueError: Iterable over raw text documents expected, string object received.
Why does this happen? I know this should work, because if I enter the phrases directly:
X=("It is not good to eat pizza","I wouldn't survive a day", "All of these")
print(vectorizer.fit_transform(X))
it gives me the vector I want:
(0, 8) 1
(0, 2) 1
(0, 11) 1
(0, 3) 1
(0, 6) 1
(0, 4) 1
(0, 5) 1
(1, 1) 1
(1, 9) 1
(1, 12) 1
(2, 10) 1
(2, 7) 1
(2, 0) 1
小智 7
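This error occurs when you pass raw data, that is, a bare string, directly to the extraction function. Instead, wrap the string in a list, Y = [X], and pass Y as the argument; it will then work correctly. I ran into the same problem.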
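A minimal sketch of that fix, assuming the five phrases from the question stand in for the contents of LISR2.txt (the `lines` list is a placeholder so the snippet runs without the file). It first reproduces the error with a bare string, then wraps the string in a list. Fitting once on all lines is shown as an optional variation, not something the question asked for.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Stand-in for the lines of LISR2.txt (assumed content, copied from the question).
lines = [
    "Article 1. It is not good to eat pizza after midnight",
    "Article 2. I wouldn't survive a day withouth stackexchange",
    "Article 3. All of these are just random phrases",
    "Article 4. To prove if my experiment works.",
    "Article 5. The red dog jumps over the lazy fox",
]

vectorizer = CountVectorizer(min_df=1)

# Passing a bare string reproduces the error from the question.
error_message = None
try:
    vectorizer.fit_transform(lines[0])
except ValueError as err:
    error_message = str(err)
print(error_message)

# Wrapping the string in a list makes it an iterable of one document.
single = vectorizer.fit_transform([lines[0]])
print(single.shape)

# Optional variation: fit once on all documents so they share one vocabulary.
matrix = vectorizer.fit_transform(lines)
print(matrix.shape)
```

Note that calling `fit_transform` inside the loop refits the vocabulary on every line, so the column indices are not comparable between lines. Fitting once on the full list avoids that.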