
NumPy and SciPy - difference between .todense() and .toarray()

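I would like to know whether there are any differences (advantages/disadvantages) between calling .toarray() and .todense() on SciPy sparse matrices. For example,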

import numpy as np
from scipy import sparse  # import the submodule explicitly; "import scipy" alone may not load it

sparse_m = sparse.bsr_matrix(np.array([[1,0,0,0,1], [1,0,0,0,1]]))

%timeit sparse_m.toarray()
1000 loops, best of 3: 299 µs per loop

%timeit sparse_m.todense()
1000 loops, best of 3: 305 µs per loop
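The timings are nearly identical, so the more useful thing to compare is what each call returns. A minimal sketch, assuming `bsr_matrix` is one of SciPy's sparse *matrix* classes (in recent SciPy versions the newer `*_array` classes return a plain ndarray from `todense()` instead): `toarray()` returns a `numpy.ndarray`, while `todense()` returns a `numpy.matrix`, which stays 2-D when indexed.

```python
import numpy as np
from scipy import sparse

sparse_m = sparse.bsr_matrix(np.array([[1, 0, 0, 0, 1], [1, 0, 0, 0, 1]]))

arr = sparse_m.toarray()   # plain 2-D numpy.ndarray
mat = sparse_m.todense()   # numpy.matrix (a subclass of ndarray)

print(type(arr).__name__)  # ndarray
print(type(mat).__name__)  # matrix

# Indexing a row: ndarray drops a dimension, np.matrix stays 2-D
print(arr[0].shape)        # (5,)
print(mat[0].shape)        # (1, 5)

# Same values either way once the matrix is viewed as an ndarray
print(np.array_equal(arr, np.asarray(mat)))  # True
```

Because `np.matrix` also treats `*` as a matrix product rather than an element-wise one, `toarray()` (or `np.asarray(m.todense())`) is usually the safer choice when ordinary array semantics are wanted.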

python numpy scipy


Are all features correctly selected and used in the classifier?

I would like to know the following about using a classifier such as this one:

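import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Feat_Selection and DataPrep are the asker's own project modules
random_forest_ngram = Pipeline([
        ('rf_tfidf', Feat_Selection.countV),  # the CountVectorizer step
        ('rf_clf', RandomForestClassifier(n_estimators=300, n_jobs=3))
        ])

random_forest_ngram.fit(DataPrep.train['Text'], DataPrep.train['Label'])
predicted_rf_ngram = random_forest_ngram.predict(DataPrep.test_news['Text'])
np.mean(predicted_rf_ngram == DataPrep.test_news['Label'])  # test accuracy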

Am I also taking the other features in the model into account? I defined X and y as follows:

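import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

# df is the asker's DataFrame (sample shown below)
X = df[['Text', 'is_it_capital?', 'is_it_upper?', 'contains_num?']]
y = df['Label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=40)

df_train = pd.concat([X_train, y_train], axis=1)
df_test = pd.concat([X_test, y_test], axis=1)

countV = CountVectorizer()
# Only the 'Text' column is vectorized here; the three flag columns are not used
train_count = countV.fit_transform(df_train['Text'].values)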

My dataset looks like this:

Text                             is_it_capital?     is_it_upper?      contains_num?   Label
an example of text                      0                  0               0            0
ANOTHER example of text                 1                  1               0            1
What's happening?Let's talk at 5        1                  0               1            1

I would also like to …
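As written, the pipeline only ever receives the `Text` column, so `CountVectorizer` produces word counts and the three flag columns never reach the classifier. A minimal sketch of one common way to feed both, assuming scikit-learn 0.20 or later (for `sklearn.compose.ColumnTransformer`) and a toy DataFrame that simply copies the three sample rows above; `ColumnTransformer` is a swapped-in technique, not something from the original code, and the names `flag_cols`, `features` and `model` are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy data copying the sample table (illustrative, not the real dataset)
df = pd.DataFrame({
    'Text': ["an example of text",
             "ANOTHER example of text",
             "What's happening?Let's talk at 5"],
    'is_it_capital?': [0, 1, 1],
    'is_it_upper?': [0, 1, 0],
    'contains_num?': [0, 0, 1],
    'Label': [0, 1, 1],
})
flag_cols = ['is_it_capital?', 'is_it_upper?', 'contains_num?']

features = ColumnTransformer([
    # A single column name (string, not list) passes CountVectorizer a 1-D column of texts
    ('bow', CountVectorizer(), 'Text'),
    # The numeric flag columns are appended to the word counts unchanged
    ('flags', 'passthrough', flag_cols),
])

model = Pipeline([
    ('features', features),
    ('rf_clf', RandomForestClassifier(n_estimators=300, random_state=0)),
])

X = df[['Text'] + flag_cols]
y = df['Label']
model.fit(X, y)

# Columns the classifier actually sees: vocabulary size + 3 flag columns
n_features = model.named_steps['features'].transform(X).shape[1]
print(n_features)  # 13 (10 distinct words of 2+ characters + 3 flags)
```

Checking the width of the transformed matrix like this is a quick way to confirm which inputs the classifier is trained on.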

python machine-learning feature-selection scikit-learn
