bad*_*0re 5 python text-processing information-retrieval data-mining tf-idf
I found the following code on the internet for calculating TF-IDF:
https://github.com/timtrueman/tf-idf/blob/master/tf-idf.py
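The usual definition is tf-idf = tf × idf. Here tf is the share of a document's tokens that equal the word, and idf = log(N / df), where N is the number of documents and df is the number of documents that contain the word. The sketch below assumes the script follows that definition. The names `tf`, `idf`, `numDocsContaining` and `tfidf` come from the question, but the function bodies and the three-document corpus are assumptions and may differ from the linked file.

```python
import math

# Assumed bodies: a sketch of plain tf-idf, not copied from the linked tf-idf.py.

def tf(word, document):
    # Term frequency: fraction of the document's whitespace-split tokens equal to `word`.
    tokens = document.split()
    return tokens.count(word) / len(tokens)

def numDocsContaining(word, documentList):
    # Document frequency: number of documents in which `word` occurs at least once.
    return sum(1 for doc in documentList if word in doc.split())

def idf(word, documentList):
    # Unsmoothed idf: log(N / df). Raises ZeroDivisionError when df == 0.
    return math.log(len(documentList) / numDocsContaining(word, documentList))

def tfidf(word, document, documentList):
    return tf(word, document) * idf(word, documentList)

docs = ["the cat sat", "the dog sat", "the cat ran"]  # made-up corpus
print(tfidf("cat", docs[0], docs))  # (1/3) * log(3/2)
```

If a word occurs in no document, `numDocsContaining` returns 0 and this unsmoothed `idf` raises `ZeroDivisionError`.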
I added "1+" to the function def idf(word,documentList) so that I don't get a division-by-zero error:
return math.log(len(documentList) / (1 + float(numDocsContaining(word,documentList))))
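With "1+" in the denominator, a word that appears in no document gets log(N) instead of a division by zero. The shift also lowers every other value. A word found in N - 1 documents gets log(N / N) = 0, and a word found in all N documents gets log(N / (N + 1)), which is negative. The minimal demonstration below uses a made-up three-document corpus. `idf_smoothed` is a hypothetical name for the modified function.

```python
import math

def numDocsContaining(word, documentList):
    # Number of documents whose whitespace-split tokens include `word`.
    return sum(1 for doc in documentList if word in doc.split())

def idf_smoothed(word, documentList):
    # Hypothetical name for the modified idf from the question: log(N / (1 + df)).
    return math.log(len(documentList) / (1 + float(numDocsContaining(word, documentList))))

docs = ["the cat sat", "the dog sat", "the cat ran"]  # made-up corpus, N = 3
for w in ["bird", "dog", "cat", "the"]:
    # df is 0, 1, 2, 3 respectively, so idf is log 3, log 1.5, log 1 = 0, log 0.75 < 0
    print(w, idf_smoothed(w, docs))
```

Smoothing the numerator as well avoids the negative values. For example, scikit-learn's `TfidfVectorizer` with `smooth_idf=True` uses ln((1 + N) / (1 + df)) + 1.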
But I'm confused about two things:
Code:
documentNumber = 0
for word in documentList[documentNumber].split(None):
    words[word] = tfidf(word,documentList[documentNumber],documentList)
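Because this loop fixes `documentNumber = 0`, it scores only the words of `documentList[0]`, although each idf is still computed over the whole list. To score every document, repeat the loop once per document. The sketch below assumes `tf` and the smoothed `idf` behave as defined in it. Those bodies, the corpus, and the `scores` list (one `words` dict per document) are illustrative assumptions, not the linked script.

```python
import math

# Assumed helper bodies, not copied from the linked script.
def tf(word, document):
    tokens = document.split()
    return tokens.count(word) / len(tokens)

def idf(word, documentList):
    # Smoothed idf with the "1+" from the question: log(N / (1 + df)).
    df = sum(1 for doc in documentList if word in doc.split())
    return math.log(len(documentList) / (1 + df))

def tfidf(word, document, documentList):
    return tf(word, document) * idf(word, documentList)

documentList = ["the cat sat", "the dog sat", "the cat ran"]  # made-up corpus

# One dict of word -> score per document, instead of only documentList[0].
scores = []
for document in documentList:
    words = {}
    for word in document.split(None):
        words[word] = tfidf(word, document, documentList)
    scores.append(words)

print(scores[1])  # scores for "the dog sat"
```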
Should TF-IDF be calculated on the first document only?
Fre*_*Foo 11