Yao*_*ian 2 python split tf-idf scikit-learn
I am using scikit-learn's tf-idf, which by default splits text on whitespace:
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?'
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse document-term count matrix
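For reference, the default tokenization can be inspected directly. The following is a minimal sketch, not part of the original question: `build_analyzer()` returns the callable that `CountVectorizer` applies to each document. With default settings it lowercases the text and keeps runs of two or more word characters, so punctuation is dropped and the split is not strictly on whitespace. The variable names `analyze` and `tokens` are only illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

# build_analyzer() returns the preprocessing + tokenizing callable
# that CountVectorizer applies to every document.
analyze = CountVectorizer().build_analyzer()

tokens = analyze('This is the first document.')
print(tokens)  # ['this', 'is', 'the', 'first', 'document']
```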
However, I want to use input in this form, with `####` as the delimiter:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = [
    'This####is####the####first####document.',
    'This####is####the####second####second####document.'
]
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
X = vectorizer.fit_transform(corpus)
tfidf = transformer.fit_transform(X)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
word = vectorizer.get_feature_names_out()
weight = tfidf.toarray()
How can I do this?
Use a custom tokenizer:
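from sklearn.feature_extraction.text import CountVectorizer

def four_pounds_tokenizer(s):
    # Split each document on the literal '####' delimiter
    return s.split('####')

vectorizer = CountVectorizer(tokenizer=four_pounds_tokenizer)
X = vectorizer.fit_transform(corpus)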