小编kil*_*zio的帖子

scikit-学习TfidfVectorizer忽略某些单词

我正在对来自葡萄牙历史的维基百科页面上的句子尝试TfidfVectorizer。但是我注意到该TfidfVec.fit_transform方法忽略了某些单词。这是我尝试过的句子：

sentence = "The oldest human fossil is the skull discovered in the Cave of Aroeira in Almonda."

TfidfVec = TfidfVectorizer()
tfidf = TfidfVec.fit_transform([sentence])

cols = [words[idx] for idx in tfidf.indices]
matrix = tfidf.todense()
pd.DataFrame(matrix,columns = cols,index=["Tf-Idf"])

Run Code Online (Sandbox Code Playgroud)

数据帧的输出：