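When extracting features from text, how can I check whether a vectorizer (e.g. TfidfVectorizer or CountVectorizer) has already been fitted on the training data?
In particular, I want the code to determine automatically whether the vectorizer has already been fitted.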

    from sklearn.feature_extraction.text import TfidfVectorizer

    vectorizer = TfidfVectorizer()

    def vectorize_data(texts):
        # if vectorizer has not been already fit
        vectorizer.fit_transform(texts)
        # else
        vectorizer.transform(texts)
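
To make the intent concrete, here is a minimal sketch of one way the check might be done. It assumes scikit-learn's `check_is_fitted` utility and relies on the `vocabulary_` attribute that both CountVectorizer and TfidfVectorizer set during fitting. The helper name `is_fitted` and passing the vectorizer in as an argument are illustrative choices, not part of the original snippet.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils.validation import check_is_fitted


def is_fitted(vectorizer):
    """Return True if the vectorizer has learned a vocabulary (hypothetical helper)."""
    try:
        # vocabulary_ is set by fit() / fit_transform() on both
        # CountVectorizer and TfidfVectorizer.
        check_is_fitted(vectorizer, "vocabulary_")
    except NotFittedError:
        return False
    return True


def vectorize_data(vectorizer, texts):
    """Fit on the first call, then reuse the learned vocabulary afterwards."""
    if is_fitted(vectorizer):
        return vectorizer.transform(texts)
    return vectorizer.fit_transform(texts)


vectorizer = TfidfVectorizer()
train = vectorize_data(vectorizer, ["the cat sat", "the dog barked"])  # fits, then transforms
test = vectorize_data(vectorizer, ["the cat barked"])                  # transforms only
```

A plain `hasattr(vectorizer, "vocabulary_")` test would probably behave the same way here; `check_is_fitted` is used in the sketch only because it is the validation helper scikit-learn itself provides.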