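I'm trying to use TF-IDF in Python (scikit-learn's TfidfVectorizer) to transform a text corpus. However, when I call fit_transform on it, I get ValueError: empty vocabulary; perhaps the documents only contain stop words.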
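One way to check the corpus before calling fit_transform is to apply the vectorizer's default tokenization by hand. This is a rough sketch that assumes TfidfVectorizer is left at its defaults (lowercasing on, token_pattern r"(?u)\b\w\w+\b", no stop-word list), so it approximates what the analyzer extracts before any stop-word filtering. The helper names and the sample corpus are hypothetical. If every document comes back with no tokens, the vocabulary is empty and fit_transform raises this ValueError.

```python
import re

# Default token_pattern of scikit-learn's CountVectorizer/TfidfVectorizer:
# tokens of two or more "word" characters; one-character tokens are dropped.
DEFAULT_TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokens_per_document(corpus):
    """Tokens the default analyzer would extract from each document
    (lowercased first, as the default preprocessor does)."""
    return [DEFAULT_TOKEN_PATTERN.findall(doc.lower()) for doc in corpus]


def empty_documents(corpus):
    """Indices of documents that would contribute no terms at all."""
    return [i for i, toks in enumerate(tokens_per_document(corpus)) if not toks]


# Hypothetical corpus for illustration; substitute the real smallcorp.
smallcorp = ["", "a b c", "!!!", "Hello world"]
print(tokens_per_document(smallcorp))  # [[], [], [], ['hello', 'world']]
print(empty_documents(smallcorp))      # [0, 1, 2]
```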
In [69]: TfidfVectorizer().fit_transform(smallcorp)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-69-ac16344f3129> in <module>()
----> 1 TfidfVectorizer().fit_transform(smallcorp)
/Users/maxsong/anaconda/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
1217 vectors : array, [n_samples, n_features]
1218 """
-> 1219 X = super(TfidfVectorizer, self).fit_transform(raw_documents)
1220 self._tfidf.fit(X)
1221 # X is already a transformed view of raw_documents so
/Users/maxsong/anaconda/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
778 max_features = self.max_features
779
--> 780 vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
781 X = X.tocsc()
782
/Users/maxsong/anaconda/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
725 vocabulary = …

Problem: I am reading a series of heterogeneous input files. I wrote a reader class for each of them that reads the file in __init__(self, file_name) and raises an exception if the input is malformed.
The code looks like this:
clients = Clients ('Clients.csv' )
simulation = Simulation ('Simulation.csv' )
indicators = Indicators ('Indicators.csv' )
legalEntity = LegalEntity ('LegalEntity.csv' )
defaultPortfolio = DefaultPortfolio ('DefaultPortfolio.csv' )
excludedProductTypes = ExcludedProductTypes('ExcludedProductTypes.csv')
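To read every file and fail only at the end, one option is to drive the readers from a table of (name, class, file name) entries and collect the exceptions in a single loop. This is a minimal sketch: the `Reader` base class, its "bad"-in-the-file-name check, and `read_all` are hypothetical stand-ins, and the real reader classes are assumed to raise from `__init__` on malformed input.

```python
# Hypothetical stand-ins for the real reader classes: each parses its file in
# __init__ and raises ValueError on malformed input.
class Reader(object):
    def __init__(self, file_name):
        self.file_name = file_name
        if "bad" in file_name:  # placeholder for real format validation
            raise ValueError("malformed input in %s" % file_name)


class Clients(Reader):
    pass


class Simulation(Reader):
    pass


class Indicators(Reader):
    pass


def read_all(specs):
    """Build every reader, collecting failures instead of stopping at the first."""
    loaded, errors = {}, []
    for name, cls, file_name in specs:
        try:
            loaded[name] = cls(file_name)
        except Exception as e:  # same breadth as the original code
            errors.append((file_name, e))
    if errors:  # die only after every file has been attempted
        details = "; ".join("%s: %s" % (f, e) for f, e in errors)
        raise RuntimeError("%d file(s) malformed: %s" % (len(errors), details))
    return loaded


specs = [
    ("clients", Clients, "Clients.csv"),
    ("simulation", Simulation, "Simulation_bad.csv"),
    ("indicators", Indicators, "Indicators_bad.csv"),
]
try:
    readers = read_all(specs)
except RuntimeError as exc:
    print(exc)  # reports both malformed files, not just the first
```

The readers end up in a dict (readers["clients"] and so on) rather than separate variables. Catching Exception mirrors the original code; narrowing it to the readers' own exception type would avoid masking unrelated bugs.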
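The problem is that I don't want to die on the first malformed file; instead I want to read all of them and then die if at least one of them is malformed. The only way I could find looks horrible: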
errors = []
try:
    clients = Clients('Clients.csv')
except Exception as e:
    errors.append(e)
try:
    simulation = Simulation('Simulation.csv')
except Exception as e:
    errors.append(e)
try:
    indicators = Indicators('Indicators.csv')
except Exception as e:
    errors.append(e)
try:
    legalEntity = LegalEntity('LegalEntity.csv')
except Exception as e:
    errors.append(e)
try:
    defaultPortfolio …

If I have a try/except block in my Python code and the first line of the try block raises an exception, does execution jump straight to the except clause, or does it finish the try block first?
try:
    int(string)
    string = "This was a mistake, can't int string"
except:
    pass
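Here it checks whether it can do int(string); it can't, so does it immediately move to the except clause, or does it do the string assignment first?
When I run it, it seems to stop immediately, but I'd like to know whether that is guaranteed to happen or whether something else is going on.
Thanks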
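A small sketch that records which statements actually run; the `trace` helper and the sample inputs are made up for illustration. Per the Python language reference's description of the try statement, once a statement in the try suite raises, the rest of that suite is skipped and control passes to the first matching except clause, so the assignment after int(string) never executes.

```python
def trace(text):
    """Record which statements in the try/except actually run."""
    executed = []
    try:
        executed.append("try: start")
        int(text)  # raises ValueError when text is not an integer literal
        executed.append("try: after int()")  # skipped once int() raises
    except ValueError:
        executed.append("except")
    return executed


print(trace("42"))   # ['try: start', 'try: after int()']
print(trace("abc"))  # ['try: start', 'except']

# The original snippet with a hypothetical input: the assignment never runs.
string = "abc"
try:
    int(string)
    string = "This was a mistake, can't int string"
except ValueError:
    pass
print(string)  # abc
```

Catching ValueError instead of using a bare except keeps unrelated errors (for example a NameError if string were undefined) from being silently swallowed.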