Related questions (0)

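Python TfidfVectorizer throws "empty vocabulary; perhaps the documents only contain stop words"

I'm trying to use Python's Tfidf to transform a corpus of text. However, when I call fit_transform on it, I get a ValueError: empty vocabulary; perhaps the documents only contain stop words.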

In [69]: TfidfVectorizer().fit_transform(smallcorp)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-69-ac16344f3129> in <module>()
----> 1 TfidfVectorizer().fit_transform(smallcorp)

/Users/maxsong/anaconda/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
   1217         vectors : array, [n_samples, n_features]
   1218         """
-> 1219         X = super(TfidfVectorizer, self).fit_transform(raw_documents)
   1220         self._tfidf.fit(X)
   1221         # X is already a transformed view of raw_documents so

/Users/maxsong/anaconda/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
    778         max_features = self.max_features
    779 
--> 780         vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
    781         X = X.tocsc()
    782 

/Users/maxsong/anaconda/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
    725             vocabulary = …
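A minimal, standard-library-only sketch for diagnosing this error, assuming the vectorizer is left at its defaults (lowercasing on, token pattern `(?u)\b\w\w+\b`, i.e. tokens of two or more word characters). The helper name `docs_without_tokens` and the sample corpus are hypothetical; the function only approximates the default tokenization and is not scikit-learn's own code. If every document in the corpus ends up in the returned list, the vocabulary would be empty.

```python
import re

# Assumption: the default token pattern of scikit-learn's text vectorizers,
# which keeps only runs of two or more word characters.
DEFAULT_TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def docs_without_tokens(corpus, stop_words=frozenset()):
    """Return the indices of documents that yield no usable tokens."""
    empty = []
    for i, doc in enumerate(corpus):
        tokens = [
            t for t in DEFAULT_TOKEN_PATTERN.findall(doc.lower())
            if t not in stop_words
        ]
        if not tokens:
            empty.append(i)
    return empty


# Hypothetical corpus: the first two documents contain only
# single-character tokens, so they contribute nothing to the vocabulary.
sample_corpus = ["a b c", "1 2 3", "real words here"]
print(docs_without_tokens(sample_corpus))
```

Single-character tokens and stop-word-only documents are two plausible causes; another is that the corpus is not a sequence of strings at all, which this sketch does not check.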

python tf-idf pandas scikit-learn

12 votes · 1 answer · 20k views

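A Pythonic way to "resume next" on exceptions?

Problem: I am reading a series of heterogeneous input files. I wrote a reader class for each of them; it reads the file in __init__(self, file_name) and raises an exception when the input is malformed.

The code looks like this: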

clients              = Clients             ('Clients.csv'             )
simulation           = Simulation          ('Simulation.csv'          )
indicators           = Indicators          ('Indicators.csv'          )
legalEntity          = LegalEntity         ('LegalEntity.csv'         )
defaultPortfolio     = DefaultPortfolio    ('DefaultPortfolio.csv'    )
excludedProductTypes = ExcludedProductTypes('ExcludedProductTypes.csv')

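The problem is that I don't want to die on the first malformed file. I want to read all of them and then die if at least one was malformed. The only way I could find looks horrible: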

errors = []

try:
    clients              = Clients             ('Clients.csv'             )
except Exception as e:
    errors.append(e)
try:
    simulation           = Simulation          ('Simulation.csv'          )
except Exception as e:
    errors.append(e)
try:
    indicators           = Indicators          ('Indicators.csv'          )
except Exception as e:
    errors.append(e)
try:
    legalEntity          = LegalEntity         ('LegalEntity.csv'         )
except Exception as e:
    errors.append(e)
try:
    defaultPortfolio …
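One possible way to avoid repeating the try/except block is to loop over (name, reader class, file name) triples and collect the errors as you go. This is only a sketch: `MalformedInput`, `GoodReader`, `BadReader`, and `load_all` are hypothetical stand-ins for the real reader classes, which are not shown in the question.

```python
class MalformedInput(Exception):
    """Hypothetical error a reader raises on bad input."""


class GoodReader:
    """Hypothetical reader whose file parses cleanly."""

    def __init__(self, file_name):
        self.file_name = file_name


class BadReader:
    """Hypothetical reader whose file is malformed."""

    def __init__(self, file_name):
        raise MalformedInput("malformed: " + file_name)


def load_all(specs):
    """Try every reader; return (loaded, errors) instead of stopping early."""
    loaded, errors = {}, []
    for name, reader_cls, file_name in specs:
        try:
            loaded[name] = reader_cls(file_name)
        except Exception as e:
            errors.append(e)
    return loaded, errors


specs = [
    ("clients", GoodReader, "Clients.csv"),
    ("simulation", BadReader, "Simulation.csv"),
    ("indicators", BadReader, "Indicators.csv"),
]
loaded, errors = load_all(specs)
if errors:
    # A real program could report every failure here and then exit.
    print("%d file(s) malformed" % len(errors))
```

Keeping the results in a dictionary trades the individual variable names for a single loop; whether that is acceptable depends on how the readers are used afterwards.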

python exception-handling exception

8 votes · 2 answers · 2764 views

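Does try/except stop immediately in Python when an exception is raised?

If I have a try/except block in my Python code and the first line of the try block raises an exception, does execution jump straight to the except clause, or does it finish the try block first?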

try:
    int(string)
    string = "This was a mistake, can't int string"
except:
    pass

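Does it check whether it can do int(string), find that it can't, and then move to the except clause immediately, or does it perform the string assignment first?

When I run it, it seems to stop immediately, but I want to know whether that is guaranteed to happen or is due to something else.

Thanks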
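A small check of the behaviour described above: when a statement inside `try` raises, the remaining statements in that block are skipped and control passes to the matching `except` clause. The variable values here are made up for illustration, and the bare `except` is narrowed to `ValueError`.

```python
string = "not a number"

try:
    int(string)            # raises ValueError, so the try block ends here
    string = "reassigned"  # never runs
except ValueError:
    pass

# The assignment inside the try block was skipped.
print(string)
```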

python

-1 votes · 1 answer · 246 views