Building an SVM with TensorFlow's LinearClassifier and pandas DataFrames

Jon*_*han 3 python machine-learning svm pandas tensorflow

I'm aware of this question, but it relies on a feature that is now deprecated.

Suppose I'm trying to predict whether a person will visit country 'X', given the countries they have already visited and their income.

I have a training dataset in a pandas DataFrame, in the following format.

  1. Each row represents a different person, and each person is independent of the others in the matrix.
  2. The first 10 columns are all names of countries, and the values in those columns are binary (1 if they have visited that country, 0 otherwise).
  3. Column 11 is their income, a continuous decimal variable.
  4. Finally, column 12 is another binary variable indicating whether or not they have visited country 'X'.

So essentially, if I have 100,000 people in my dataset, my DataFrame has dimensions 100,000 x 12. I want to be able to pass this correctly into a linear classifier using TensorFlow, but I'm not sure even how to approach this.

I'm trying to pass my data into this function:

estimator = LinearClassifier(
    n_classes=n_classes,
    feature_columns=[sparse_column_a, sparse_feature_a_x_sparse_feature_b],
    label_keys=label_keys)

(If there's a better suggestion for which estimator to use, I'd be open to trying it.)

I'm passing the data in as:

df = pd.DataFrame(np.random.randint(0, 2, size=(100, 12)), columns=list('ABCDEFGHIJKL'))
tf_val = tf.estimator.inputs.pandas_input_fn(df.iloc[:, 0:10], df.iloc[:, 11], shuffle=True)

However, I'm not sure how to take this output and pass it correctly into a classifier. Am I setting the problem up properly? I don't come from a data science background, so any guidance would be very helpful!

Concerns

  1. Column 11 is a covariate. As such, I don't think it can just be passed in as a feature, can it?
  2. How can I incorporate column 11 into the classifier, given that it is a completely different kind of feature from columns 1 through 10?
  3. At the very least, even if I ignore column 11, how do I at least fit columns 1 through 10 with label = column 12 and pass that into a classifier?

(Working code is required for the bounty)

muj*_*iga 6

Linear SVM

An SVM is a max-margin classifier, i.e. it maximizes the width, or margin, separating the positive class from the negative class. The loss function of a linear SVM in the binary classification case is given below.

L_i = max(0, 1 − y_i · wᵀx_i)

It can be derived from the more generalized multi-class linear SVM loss (also called hinge loss) shown below (with Δ = 1).

L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + Δ),  where s_j = w_jᵀx_i is the score for class j

Note: in all of the above equations, the weight vector w includes the bias b.
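Absorbing the bias means appending a constant 1 to every input vector; a minimal numpy sketch with made-up values:

```python
import numpy as np

# Weight vector w and bias b kept separate
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])

# Absorb the bias: append b to w and a constant 1 to x
w_aug = np.append(w, b)      # augmented weights [2.0, -1.0, 0.5]
x_aug = np.append(x, 1.0)    # augmented input   [1.0, 3.0, 1.0]

# Both formulations give the same score
print(np.dot(w, x) + b)      # -0.5
print(np.dot(w_aug, x_aug))  # -0.5
```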

How did anyone come up with this loss in the first place? Let's dig in.

(figure: two classes of data points separated by a hyperplane, shown as a solid line, with the maximum-margin boundaries shown as dashed lines)

The figure above shows the data points of the positive class separated from the data points of the negative class by a separating hyperplane (shown as a solid line). However, there can be many such separating hyperplanes. The SVM finds the separating hyperplane that maximizes the distance from the hyperplane to the nearest positive data point and to the nearest negative data point (shown as dashed lines).
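The distance from a point to the hyperplane wᵀx + b = 0 is |wᵀx + b| / ‖w‖, and with the canonical constraint that the closest points satisfy |wᵀx + b| = 1, the margin being maximized is 2 / ‖w‖. A quick numpy illustration (the values here are made up):

```python
import numpy as np

w = np.array([3.0, 4.0])   # ||w|| = 5
b = -1.0
x = np.array([2.0, 1.0])

# Signed distance of x from the hyperplane w.x + b = 0
distance = (np.dot(w, x) + b) / np.linalg.norm(w)
print(distance)            # (6 + 4 - 1) / 5 = 1.8

# Margin between the two canonical supporting hyperplanes w.x + b = +1 and -1
margin = 2.0 / np.linalg.norm(w)
print(margin)              # 0.4
```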

Mathematically, the SVM finds the weight vector w (bias included) such that

wᵀx_i ≥ +1 for every data point in the +ve class, and wᵀx_i ≤ −1 for every data point in the −ve class

If the labels (y) of the +ve class and the −ve class are +1 and −1 respectively, then the SVM finds w such that

y_i · wᵀx_i ≥ 1 for all data points

• If a data point is on the correct side of the hyperplane (correctly classified) then

y_i · wᵀx_i ≥ 1, i.e. 1 − y_i · wᵀx_i ≤ 0

• If a data point is on the wrong side (misclassified) then

y_i · wᵀx_i < 1, i.e. 1 − y_i · wᵀx_i > 0

So the loss for a data point, which is a measure of misclassification, can be written as

L_i = max(0, 1 − y_i · wᵀx_i)
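Putting the two cases together: the per-point hinge loss max(0, 1 − y·wᵀx) is zero for points classified correctly with room to spare, and grows linearly with the violation otherwise. A small numpy check (the weight vector here is made up):

```python
import numpy as np

def hinge(w, x, y):
    """Per-point hinge loss max(0, 1 - y * w.x), bias absorbed into w."""
    return max(0.0, 1.0 - y * np.dot(w, x))

w = np.array([1.0, -1.0])

# Correct side, outside the margin: zero loss
print(hinge(w, np.array([3.0, 0.0]), +1))   # max(0, 1 - 3) = 0.0
# Correct side but inside the margin: small positive loss
print(hinge(w, np.array([0.5, 0.0]), +1))   # max(0, 1 - 0.5) = 0.5
# Wrong side: loss grows linearly with the violation
print(hinge(w, np.array([-2.0, 0.0]), +1))  # max(0, 1 + 2) = 3.0
```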

Regularization

If a weight vector w correctly classifies the data (X), then any multiple λw of that weight vector, where λ > 1, will also correctly classify the data (zero loss). This is because the transformation λW stretches all score magnitudes and hence also their absolute differences. L2 regularization penalizes large weights by adding a regularization loss to the hinge loss.

L = (1/N) Σ_i max(0, 1 − y_i · wᵀx_i) + λ‖w‖²

For example, if x = [1,1,1,1] and there are two weight vectors w1 = [1,0,0,0] and w2 = [0.25,0.25,0.25,0.25], then dot(w1,x) = dot(w2,x) = 1, i.e. both weight vectors lead to the same dot product and hence the same hinge loss. But the L2 penalty of w1 is 1.0, while the L2 penalty of w2 is only 0.25, so L2 regularization prefers w2 over w1. The classifier is encouraged to take all input dimensions into account in small amounts rather than a few input dimensions very strongly. This improves the generalization of the model and leads to less overfitting.
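The example above can be verified directly in numpy:

```python
import numpy as np

x = np.array([1., 1., 1., 1.])
w1 = np.array([1., 0., 0., 0.])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

# Same score, hence the same hinge loss
print(np.dot(w1, x), np.dot(w2, x))   # 1.0 1.0

# But very different L2 penalties: regularization prefers w2
print(np.sum(w1**2), np.sum(w2**2))   # 1.0 0.25
```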

The L2 penalty leads to the max-margin property in SVMs. If the SVM is expressed as an optimization problem, then the generalized Lagrangian form of the constrained quadratic optimization problem is as below:

minimize ½‖w‖² subject to y_i · wᵀx_i ≥ 1 for all i;  L(w, α) = ½‖w‖² − Σ_i α_i (y_i · wᵀx_i − 1), with α_i ≥ 0

Now that we know the loss function of the linear SVM, we can use gradient descent (or other optimizers) to find the weight vector that minimizes the loss.
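As a sketch of that idea, a few lines of plain numpy implement subgradient descent on the regularized hinge loss; the toy data and hyperparameters here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: class +1 around (2, 2), class -1 around (-2, -2)
X = np.vstack([rng.normal(2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.array([1.0] * 50 + [-1.0] * 50)
X = np.hstack([X, np.ones((100, 1))])   # absorb the bias into w

w = np.zeros(3)
lr, lam = 0.1, 0.01                     # learning rate, L2 strength

for _ in range(200):
    margins = y * X.dot(w)
    active = margins < 1                # points with non-zero hinge loss
    # Subgradient of the mean hinge loss, plus gradient of the L2 penalty
    grad = -(y[active, None] * X[active]).sum(axis=0) / len(X) + 2 * lam * w
    w -= lr * grad

acc = np.mean(np.sign(X.dot(w)) == y)
print(acc)   # accuracy on this well-separated toy set
```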

Code

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load Data
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y = iris.target[iris.target != 2]

# Change labels to +1 and -1 
y = np.where(y==1, y, -1)

# Linear Model with L2 regularization
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', kernel_regularizer=tf.keras.regularizers.l2()))

# Hinge loss
def hinge_loss(y_true, y_pred):    
    return tf.maximum(0., 1- y_true*y_pred)

# Train the model
model.compile(optimizer='adam', loss=hinge_loss)
model.fit(X, y,  epochs=50000, verbose=False)

# Plot the learned decision boundary 
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.show()

(figure: decision boundary learned by the Keras linear SVM on the iris data)

An SVM can also be expressed as a constrained quadratic optimization problem. The advantage of this formulation is that we can use the kernel trick to classify non-linearly separable data (using different kernels). LIBSVM implements the Sequential Minimal Optimization (SMO) algorithm for kernelized support vector machines (SVMs).
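To see why the kernel trick matters, here is a small sklearn sketch on a dataset no straight line can separate; the dataset and parameters are illustrative, and the RBF kernel k(a, b) = exp(−γ‖a − b‖²) computes inner products in an implicit high-dimensional feature space without ever constructing it:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: the two classes are not linearly separable
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf').fit(X, y)   # RBF kernel: exp(-gamma * ||a - b||^2)

print(linear.score(X, y))   # a linear boundary does poorly here
print(rbf.score(X, y))      # the RBF kernel separates the circles
```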

Code

from sklearn.svm import SVC
# SVM with linear kernel
clf = SVC(kernel='linear')
clf.fit(X, y) 

# Plot the learned decision boundary 
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.show() 

(figure: decision boundary learned by sklearn's SVC with a linear kernel)

Finally

A linear SVM model in TensorFlow that you can use for your problem statement is:

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Prepare Data 
# 10 Binary features
df = pd.DataFrame(np.random.randint(0,2,size=(1000, 10)))
# 1 floating value feature 
df[11] = np.random.uniform(0,100000, size=(1000))
# True Label 
df[12] = pd.DataFrame(np.random.randint(0, 2, size=(1000)))

# Convert data to zero mean unit variance 
scaler = StandardScaler().fit(df[df.columns.drop(12)])
X = scaler.transform(df[df.columns.drop(12)])
y = np.array(df[12])

# convert label to +1 and -1. Needed for hinge loss
y = np.where(y==1, +1, -1)

# Model 
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', 
                                kernel_regularizer=tf.keras.regularizers.l2()))
# Hinge Loss
def my_loss(y_true, y_pred):    
    return tf.maximum(0., 1- y_true*y_pred)

# Train model 
model.compile(optimizer='adam', loss=my_loss)
model.fit(X, y,  epochs=100, verbose=True)

K-Fold cross validation and making predictions

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Load Data
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y_ = iris.target[iris.target != 2]

# Change labels to +1 and -1 
y = np.where(y_==1, +1, -1)


# Hinge loss
def hinge_loss(y_true, y_pred):    
    return tf.maximum(0., 1- y_true*y_pred)

def get_model():
    # Linear Model with L2 regularization
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(1, activation='linear', kernel_regularizer=tf.keras.regularizers.l2()))
    model.compile(optimizer='adam', loss=hinge_loss)
    return model

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

predict = lambda model, x : sigmoid(model.predict(x).reshape(-1))
predict_class = lambda model, x : np.where(predict(model, x)>0.5, 1, 0)


kf = KFold(n_splits=2, shuffle=True)

# K Fold cross validation
best = (None, -1)

for i, (train_index, test_index) in enumerate(kf.split(X)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model = get_model()
    model.fit(X_train, y_train, epochs=5000, verbose=False, batch_size=128)
    y_pred = predict_class(model, X_test)
    val = roc_auc_score(y_test, y_pred)
    print("CV Fold {0}: AUC: {1}".format(i + 1, val))
    if best[1] < val:
        best = (model, val)

# ROC Curve using the best model
y_score = predict(best[0], X)
fpr, tpr, _ = roc_curve(y_, y_score)
roc_auc = auc(fpr, tpr)
print (roc_auc)

# Plot ROC
plt.figure()
lw = 2
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()

# Make predictions
y_score = predict_class(best[0], X)

Making predictions

Since the output of the model is linear, we have to normalize it to a probability in order to make predictions. For binary classification we can use the sigmoid; for multi-class classification, the softmax. The code below is for binary classification.

predict = lambda model, x : sigmoid(model.predict(x).reshape(-1))
predict_class = lambda model, x : np.where(predict(model, x)>0.5, 1, 0)

References

  1. CS231n
  2. My Kaggle notebook