shi*_*mar 2 python machine-learning python-3.x scikit-learn
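I am trying to apply CountVectorizer to tag data, but every time it throws an AttributeError. I have tried everything and still cannot understand why this error occurs. Here is my code: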
vectorizer = CountVectorizer(tokenizer = lambda x: x.split(" "))
tag_dtm = vectorizer.fit_transform(tag_data['Tags'])
This is the error I get:
AttributeError
Traceback (most recent call last)
<ipython-input-53-7a05ab3b6655> in <module>()
7 # and learns the vocabulary; second, it transforms our training data
8 # into feature vectors. The input to fit_transform should be a list of strings.
----> 9 tag_dtm = vectorizer.fit_transform(tag_data['Tags'])
3 frames
/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/text.py in _preprocess(doc, accent_function, lower)
66 """
67 if lower:
---> 68 doc = doc.lower()
69 if accent_function is not None:
70 doc = accent_function(doc)
AttributeError: 'NoneType' object has no attribute 'lower'
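The Tags column contains missing values, and CountVectorizer calls .lower() on every document, so a None entry raises this error. You can fix the code with the following syntax, which uses a list comprehension to filter out any null values.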
# pd.notna skips both None and NaN (assumes pandas is imported as pd)
tag_dtm = vectorizer.fit_transform([str(val) for val in tag_data['Tags'] if pd.notna(val)])
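For reference, here is a minimal self-contained sketch of the same idea using pandas' own `dropna()` instead of a list comprehension. The small `tag_data` frame below is a hypothetical stand-in, since the real data is not shown in the question; the vectorizer setup is copied from the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for tag_data; the real DataFrame is not shown in the question.
tag_data = pd.DataFrame(
    {"Tags": ["python pandas", None, "python scikit-learn", float("nan")]}
)

# Drop missing entries (both None and NaN) and make sure every value is a str.
clean_tags = tag_data["Tags"].dropna().astype(str)

# Same vectorizer as in the question: split each document on single spaces.
vectorizer = CountVectorizer(tokenizer=lambda x: x.split(" "))
tag_dtm = vectorizer.fit_transform(clean_tags)

print(tag_dtm.shape)
print(sorted(vectorizer.vocabulary_))
```

Note that dropping rows means tag_dtm has fewer rows than tag_data; if you need to map rows back to the original frame, `clean_tags.index` holds the labels of the rows that were kept.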
Let me know if this works for you!