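I searched S/O but couldn't find an answer.
When I try to plot distribution plots with seaborn, I get a FutureWarning. I'd like to know what might be going wrong here.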
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import datasets
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['class'] = iris.target
df['species'] = df['class'].map({idx:s for idx, s in enumerate(iris.target_names)})
fig, ((ax1,ax2),(ax3,ax4))= plt.subplots(2,2, figsize =(13,9))
sns.distplot(a = df.iloc[:,0], ax=ax1)
sns.distplot(a = df.iloc[:,1], ax=ax2)
sns.distplot(a = df.iloc[:,2], ax=ax3)
sns.distplot(a = df.iloc[:,3], ax=ax4)
plt.show()
This is the warning:
C:\ProgramData\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713:
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated;
use `arr[tuple(seq)]` instead …
I tried to write a custom implementation of a basic neural network with two hidden layers on the MNIST dataset using *TensorFlow 2.0 beta*, but I'm not sure what went wrong: my training loss and accuracy seem to be stuck at around 1.5 and 85% respectively. But if I build the same model using Keras, I get a very low training loss and accuracy above 95% with just 8-10 epochs.
I suspect I might not be updating my weights correctly, or something similar. So do …
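I'm trying to build multiclass logistic regression with TensorFlow 2.0. I've written what I believe is correct code, but it doesn't give good results: my accuracy is actually 0.1%, and even the loss doesn't decrease. I hope someone can help me here.
This is the code I've written so far. Please point out what I'm doing wrong and what I need to change to make my model work properly. Thanks!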
from tensorflow.keras.datasets import fashion_mnist
from sklearn.model_selection import train_test_split
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train/255., x_test/255.
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.15)
x_train = tf.reshape(x_train, shape=(-1, 784))
x_test = tf.reshape(x_test, shape=(-1, 784))
weights = tf.Variable(tf.random.normal(shape=(784, 10), dtype=tf.float64))
biases = tf.Variable(tf.random.normal(shape=(10,), dtype=tf.float64))
def logistic_regression(x):
    lr = tf.add(tf.matmul(x, weights), biases)
    return tf.nn.sigmoid(lr)
def cross_entropy(y_true, y_pred):
    y_true = tf.one_hot(y_true, 10)
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
    return tf.reduce_mean(loss)
def accuracy(y_true, y_pred):
    y_true = …
I'm trying to build an LSTM model for text classification and I'm receiving an error. This is the entire code I've tried.
Please let me know the reason behind the error and how to fix it.
input1.shape # text data integer coded
(37788, 130)
input2.shape # multiple category columns(one hot encoded) concatenated together
(37788, 104)
train_data = [input1, input2] # this is the train data.
i1 = Input(shape=(130,), name='input')
embeddings = Embedding(input_dim=20000, output_dim=100, input_length=130)(i1)
lstm = LSTM(100)(embeddings)
flatten …
I've been working with some text data and have some sparse matrices and some dense matrices (NumPy arrays). I'd like to know how to combine them properly.
These are the types and shapes of the arrays:
list1
<109248x9 sparse matrix of type '<class 'numpy.int64'>'
with 152643 stored elements in Compressed Sparse Row format>
list2
<109248x3141 sparse matrix of type '<class 'numpy.int64'>'
with 350145 stored elements in Compressed Sparse Row format>
list3.shape , type(list3)
(109248, 300) , numpy.ndarray
list4.shape , type(list4)
(109248, 51) , numpy.ndarray
I just want to combine them all into a single dense matrix. I tried vstack and hstack but couldn't figure it out. Any help is much appreciated.
Output required: (109248, 3501)
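A minimal sketch, assuming the goal is the column-wise concatenation implied by the required output (9 + 3141 + 300 + 51 = 3501 columns). `hstack` joins matrices side by side while `vstack` stacks rows, so `hstack` is the one needed here. The sketch uses small random stand-ins for list1 to list4 (hypothetical data, same layout), wraps the dense arrays in CSR so `scipy.sparse.hstack` only receives sparse blocks, and densifies once at the end.

```python
import numpy as np
from scipy import sparse

# Hypothetical small stand-ins with the same layout as list1..list4:
# two CSR sparse matrices and two dense ndarrays sharing the row count.
n_rows = 5
list1 = sparse.random(n_rows, 9, density=0.3, format="csr", random_state=0)
list2 = sparse.random(n_rows, 31, density=0.1, format="csr", random_state=1)
list3 = np.random.default_rng(2).normal(size=(n_rows, 30))
list4 = np.random.default_rng(3).normal(size=(n_rows, 5))

# hstack places the blocks side by side (columns add up, rows must match).
# The dense arrays are converted to CSR so every block is sparse.
combined_sparse = sparse.hstack(
    [list1, list2, sparse.csr_matrix(list3), sparse.csr_matrix(list4)],
    format="csr",
)

# Densify only once, at the end.
combined_dense = combined_sparse.toarray()
print(combined_dense.shape)  # (5, 75)
```

For the real shapes the dense result is 109248 × 3501 float64, roughly 3 GB, so keeping `combined_sparse` and densifying only if a downstream step requires it may be preferable.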
I'm building a two-class classification model using KNN.
I tried to compute the AUC score:
from sklearn.metrics import auc
auc(y_test, y_pred)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-183-980dc3c4e3d7> in <module>
----> 1 auc(y_test, y_pred)
~/.local/lib/python3.6/site-packages/sklearn/metrics/ranking.py in auc(x, y, reorder)
117 else:
118 raise ValueError("x is neither increasing nor decreasing "
--> 119 ": {}.".format(x))
120
121 area = direction * np.trapz(y, x)
ValueError: x is neither increasing nor decreasing : [1 1 1 ... 1 1 1].
Then I used roc_auc_score:
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test, y_pred)
0.5118361429056588
Why does auc not work here when roc_auc_score does …
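`sklearn.metrics.auc(x, y)` is a general-purpose function that computes the area under an arbitrary curve with the trapezoidal rule, so `x` must be monotonic, which is what the error complains about; it does not take labels and predictions. `roc_auc_score(y_true, y_score)` is the classification metric. A sketch with made-up labels and scores showing that the two agree once `auc` is given the ROC curve's points from `roc_curve`:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and classifier scores,
# e.g. what knn.predict_proba(X_test)[:, 1] would return.
y_test = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5])

# roc_curve turns labels + scores into the curve's points (fpr is sorted),
# and auc() integrates tpr over fpr with the trapezoidal rule.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
area = auc(fpr, tpr)

# roc_auc_score does both steps in one call.
print(area, roc_auc_score(y_test, y_score))  # both 0.875
```

Passing hard class predictions from `predict` as the score, as in the question, gives an ROC curve with a single interior point; scores from `predict_proba(X_test)[:, 1]` usually give a more informative AUC.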
I'm trying to find the best value of K for KNeighborsClassifier.
This is my code for the iris dataset:
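import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
k_loop = np.arange(1,30)
k_scores = []
for k in k_loop:
    knn = KNeighborsClassifier(n_neighbors=k)
    cross_val = cross_val_score(knn, X, y, cv=10 , scoring='accuracy')
    k_scores.append(cross_val.mean())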
I took the mean of cross_val_score in each iteration of the loop and plotted it.
plt.style.use('fivethirtyeight')
plt.plot(k_loop, k_scores)
plt.show()
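This is the result.
You can see that the accuracy is higher when k is between 14 and 20.
1) How do I choose the best value of k?
2) Are there other ways to compute and find the best value of K?
3) Any other suggestions for improvement are also appreciated. I'm new to ML.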
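One common approach, sketched here with `GridSearchCV`: it runs the same 10-fold cross-validated accuracy over k = 1..29 as the loop above and keeps the k with the highest mean score in `best_params_`. This is a sketch rather than a rule: when several k values score almost the same, picking one from the plateau is a judgment call, and a separate test set that the search never sees gives an unbiased estimate of the chosen model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Same search space and CV setup as the manual loop; GridSearchCV stores
# every mean score in cv_results_ and the winner in best_params_.
param_grid = {"n_neighbors": list(range(1, 30))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, search.best_score_)
# search.cv_results_["mean_test_score"] holds the same curve as k_scores.
```

Because KNN is distance-based, scaling the features (for example `StandardScaler` inside a `Pipeline` passed to `GridSearchCV`) is another improvement that often matters.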
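I'm using SGDClassifier with loss function = "hinge". But hinge loss doesn't support probability estimates for class labels.
I need probabilities to compute roc_curve. How can I get probabilities from SGDClassifier with hinge loss without using SVC from sklearn.svm?
I've seen people mention using CalibratedClassifierCV to get probabilities, but I've never used it and don't know how it works.
I'd really appreciate your help. Thanks.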
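Two options, sketched on synthetic data from `make_classification` (a stand-in for the real dataset). First, `roc_curve` does not actually need probabilities: any score that ranks the samples works, and the `decision_function` of a hinge-loss `SGDClassifier` provides one. Second, if probabilities are needed, `CalibratedClassifierCV` with `method="sigmoid"` (Platt scaling) fits clones of the SGD model on cross-validation folds, fits a sigmoid from each clone's `decision_function` to probabilities on the held-out fold, and averages the resulting `predict_proba` outputs. The estimator is passed positionally because its keyword name changed between scikit-learn versions (`base_estimator`, later `estimator`).

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real features/labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Option 1: ROC from the raw margin; no probabilities involved.
sgd = SGDClassifier(loss="hinge", random_state=0).fit(X_train, y_train)
margin = sgd.decision_function(X_test)
fpr, tpr, _ = roc_curve(y_test, margin)

# Option 2: calibrated probabilities via Platt scaling on 5 CV folds.
calibrated = CalibratedClassifierCV(
    SGDClassifier(loss="hinge", random_state=0), method="sigmoid", cv=5
)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)[:, 1]
fpr_cal, tpr_cal, _ = roc_curve(y_test, proba)

print(roc_auc_score(y_test, margin), roc_auc_score(y_test, proba))
```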
How can I count the number of stopwords in a pandas text column? I have a huge dataset, so an efficient method would be much appreciated.
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
print(df)
text
0 stackoverflow is good
1 stackoverflow is not good
This is the output I want:
print(df)
text number_of_stopwords
0 stackoverflow is good 1
1 stackoverflow is not good 2
I tried something like the following, but it didn't work.
df.str.split().apply(lambda x: len(x in stop_words))
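The attempt fails for two reasons: `.str` is available on a Series (`df['text']`), not on the DataFrame, and `x in stop_words` tests the whole list of words against the set instead of each word. A minimal sketch that counts matches word by word; the small `stop_words` set is an assumed stand-in for `set(stopwords.words('english'))` so it runs without downloading the NLTK corpus:

```python
import pandas as pd

# Assumed stand-in for set(stopwords.words('english')).
stop_words = {"is", "not", "a", "the", "and"}

df = pd.DataFrame({"text": ["stackoverflow is good", "stackoverflow is not good"]})

# Split each row into a list of words, then count the words found in the
# set (set membership is a constant-time lookup per word).
df["number_of_stopwords"] = df["text"].str.split().apply(
    lambda words: sum(word in stop_words for word in words)
)
print(df)
```

If the text can contain capitalized words, `df['text'].str.lower().str.split()` keeps the lookup case-insensitive, since the NLTK list is lowercase.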
Is it possible to use the log_loss metric in GridSearchCV?
I've seen a few posts that mention neg_log_loss. Is it the same as log_loss? If not, can log_loss be used directly in GridSearchCV?
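`neg_log_loss` is the same metric with the sign flipped: scikit-learn scorers follow a "greater is better" convention, so losses are exposed as their negatives and `GridSearchCV` maximizes them. Maximizing `-log_loss` is therefore the same as minimizing `log_loss`, and `-search.best_score_` recovers the log loss. A sketch on iris with a hypothetical grid over `LogisticRegression`'s `C`:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# "neg_log_loss" = -log_loss, computed from predict_proba on each CV fold.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X, y)

best_log_loss = -search.best_score_  # flip the sign back to get log_loss
print(search.best_params_, best_log_loss)
```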