blu*_*sky 7 python numpy linear-algebra neural-network tensorflow
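This is a neural network I custom-extended from Andrew Ng's neural network in the deep learning course; instead of producing 0 or 1 for binary classification, I am attempting to classify multiple examples.
Both the inputs and the outputs are one-hot encoded.
With not much training I get an accuracy of 'train accuracy: 67.51658067499625 %'.
How can I classify a single training example instead of classifying all training examples?
I think there is a bug in my implementation, because an issue with this network is that the training examples (train_set_x) and the output values (train_set_y) both need to have the same dimensions, or else an error related to the dimensions of the matrices is raised. For example, using: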
train_set_x = np.array([
[1,1,1,1],[0,1,1,1],[0,0,1,1]
])
train_set_y = np.array([
[1,1,1],[1,1,0],[1,1,1]
])
returns the error:
ValueError Traceback (most recent call last)
<ipython-input-11-0d356e8d66f3> in <module>()
27 print(A)
28
---> 29 np.multiply(train_set_y,A)
30
31 def initialize_with_zeros(numberOfTrainingExamples):
ValueError: operands could not be broadcast together with shapes (3,3) (1,4)
Network code:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from scipy import ndimage
import pandas as pd
%matplotlib inline
train_set_x = np.array([
[1,1,1,1],[0,1,1,1],[0,0,1,1]
])
train_set_y = np.array([
[1,1,1,0],[1,1,0,0],[1,1,1,1]
])
numberOfFeatures = 4
numberOfTrainingExamples = 3
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

w = np.zeros((numberOfTrainingExamples , 1))
b = 0
A = sigmoid(np.dot(w.T , train_set_x))
print(A)
np.multiply(train_set_y,A)

def initialize_with_zeros(numberOfTrainingExamples):
    w = np.zeros((numberOfTrainingExamples , 1))
    b = 0
    return w, b

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T , X) + b)
    cost = -(1/m)*np.sum(np.multiply(Y,np.log(A)) + np.multiply((1-Y),np.log(1-A)), axis=1)
    dw = ( 1 / m ) * np.dot( X, ( A - Y ).T ) # consumes ( A - Y )
    db = ( 1 / m ) * np.sum( A - Y ) # consumes ( A - Y ) again
    # cost = np.squeeze(cost)
    grads = {"dw": dw,
             "db": db}
    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = True):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - (learning_rate * dw)
        b = b - (learning_rate * db)
        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 10000 == 0:
            print(cost)
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs

def model(X_train, Y_train, num_iterations, learning_rate = 0.5, print_cost = False):
    w, b = initialize_with_zeros(numberOfTrainingExamples)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = True)
    w = parameters["w"]
    b = parameters["b"]
    Y_prediction_train = sigmoid(np.dot(w.T , X_train) + b)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))

model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.0001, print_cost = True)
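Update: There is a bug in this implementation in that the training example pairs (train_set_x, train_set_y) must have the same dimensions. Can someone point me in the direction of how the linear algebra should be modified?
Update 2:
I modified @Paul Panzer's answer to use a learning rate of 0.001 and made the train_set_x, train_set_y pairs unique: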
train_set_x = np.array([
[1,1,1,1,1],[0,1,1,1,1],[0,0,1,1,0],[0,0,1,0,1]
])
train_set_y = np.array([
[1,0,0],[0,0,1],[0,1,0],[1,0,1]
])
grads = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True)
# To classify single training example :
print(sigmoid(dw @ [0,0,1,1,0] + db))
This update produces the following output:
-2.09657359028
-3.94918577439
[[ 0.74043089 0.32851512 0.14776077 0.77970162]
[ 0.04810012 0.08033521 0.72846174 0.1063849 ]
[ 0.25956911 0.67148488 0.22029838 0.85223923]]
[[1 0 0 1]
[0 0 1 0]
[0 1 0 1]]
train accuracy: 79.84462279013312 %
[[ 0.51309252 0.48853845 0.50945862]
[ 0.5110232 0.48646923 0.50738869]
[ 0.51354109 0.48898712 0.50990734]]
Shouldn't print(sigmoid(dw @ [0,0,1,1,0] + db)) produce a vector that, once rounded, matches the corresponding train_set_y value: [0,1,0]?
Modifying it to produce a column vector (wrapping [0,0,1,1,0] in a numpy array and taking the transpose):
print(sigmoid(dw @ np.array([[0,0,1,1,0]]).T + db))
returns:
array([[ 0.51309252],
[ 0.48646923],
[ 0.50990734]])
Again, rounding these values to the nearest integer produces the vector [1,0,1] when [0,1,0] is expected.
Are these the wrong operations for producing a prediction for a single training example?
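Your difficulties come from mismatched dimensions, so let's walk through the problem and try to get them straight.
Your network has a number of inputs, the features; let's call their number N_in (numberOfFeatures in your code). It also has a number of outputs, which correspond to the different classes; let's call their number N_out. Inputs and outputs are connected by the weights w.
Now here is the problem. The connections are all-to-all, so we need one weight for each of the N_out x N_in pairs of outputs and inputs. Therefore, in your code the shape of w must be changed to (N_out, N_in). You probably also want an offset parameter b for each output, so b should be a vector of size (N_out,), or rather (N_out, 1) so that it plays well with the 2d terms.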
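To make the shape bookkeeping concrete, here is a minimal sketch of a single forward pass with the shapes described above. The sizes (5 inputs, 3 outputs, 4 samples) and the random input matrix are assumptions chosen purely for illustration; the layout with one column per sample matches the `w @ In + b` convention.

```python
import numpy as np

# Illustrative sizes only (assumed, not taken from any particular data set)
N_in, N_out, N_samples = 5, 3, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(N_in, N_samples))  # one column per sample

w = np.zeros((N_out, N_in))   # one weight per (output, input) pair
b = np.zeros((N_out, 1))      # one offset per output, kept 2d so it broadcasts over samples

A = sigmoid(w @ X + b)        # (N_out, N_in) @ (N_in, N_samples) -> (N_out, N_samples)
print(A.shape)
```

With this layout every column of `A` is the prediction for one sample, so a single example is just an `(N_in, 1)` column passed through the same expression.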
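I've fixed that in the modified code below and tried to make it very explicit. I've also thrown a mock data creator into the bargain.
Regarding the one-hot encoded categorical output: I'm not an expert on neural networks, but I think most people understand it to mean that the classes are mutually exclusive, so each sample in your mock output should have a single one and zeros everywhere else.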
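As a small illustration of that reading of "one-hot", the sketch below builds targets from integer class labels. The label values and the use of `np.eye` are my own choices for the example, not something taken from the question.

```python
import numpy as np

# Hypothetical integer class labels for four samples, three classes
labels = np.array([0, 2, 1, 2])
n_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes, dtype=int)[labels]
print(one_hot)

# Every sample has exactly one active class
print(one_hot.sum(axis=1))
```

By this definition a target row such as [1,0,1], which appears in the question's second update, is not one-hot, because two classes are active at once.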
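Side note:
At one point a competing answer advised you to get rid of the 1-... terms in the cost function. While that looks like an interesting idea to me, my gut feeling (Edit: now confirmed using a gradient-free minimizer; use activation="hybrid" in the code below. The solver will simply maximize all outputs that are active in at least one training example.) is that it won't work just like that, because the cost will then fail to penalize false positives (see below for a detailed explanation). To make it work you would have to add some kind of regularization. One method that appears to work is using softmax instead of sigmoid. softmax is to one-hot what sigmoid is to binary. It makes sure the output is "fuzzy one-hot".
Therefore my recommendation is either:
- Stick with sigmoid and don't explicitly enforce one-hot predictions. Keep the 1-... term.
- Use softmax instead of sigmoid.
I've added a parameter activation="sigmoid"|"softmax"|"hybrid" to the code that switches between the models. I've also made the scipy general-purpose minimizer available, which may be useful when the gradient of the cost is not at hand.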
Recap of how the cost function works:
The cost is a sum, over all classes and all training samples, of the term
-y log (y') - (1-y) log (1-y')
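where y is the expected response, i.e. the one given by the "y" training sample for the input (the "x" training sample). y' is the prediction, the response the network generates with its current weights and biases. Now, because the expected response is either 0 or 1, the cost for a single class and a single training sample can be written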
-log (y') if y = 1
-log(1-y') if y = 0
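because in the first case (1-y) is zero, so the second term vanishes, and in the second case y is zero, so the first term vanishes. One can now convince oneself that the cost is high if the expected response y is 1 and the network predicts y' close to 0, or if the expected response y is 0 and the network predicts y' close to 1.
In other words, the cost does its job of penalizing wrong predictions. Now, if we drop the second term (1-y) log (1-y'), half of this mechanism is gone. If the expected response is 1, a low prediction will still incur a cost, but if the expected response is 0, the cost will be zero regardless of the prediction; in particular, a high prediction (or false positive) will go unpunished.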
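The effect of dropping the second term can be checked numerically for a single class and a single sample. This is only a sketch; the function names `full_cost` and `truncated_cost` are made up for the illustration and do not appear in the code further down.

```python
import numpy as np

def full_cost(y, y_pred):
    """-y log(y') - (1-y) log(1-y') for one class and one sample."""
    return -(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))

def truncated_cost(y, y_pred):
    """Same cost with the (1-y) log(1-y') term dropped."""
    return -(y * np.log(y_pred))

# False negative: expected 1, predicted close to 0 -> penalized by both costs
fn_full = full_cost(1, 0.01)
fn_trunc = truncated_cost(1, 0.01)

# False positive: expected 0, predicted close to 1 -> only the full cost notices
fp_full = full_cost(0, 0.99)
fp_trunc = truncated_cost(0, 0.99)

print(fn_full, fn_trunc, fp_full, fp_trunc)
```

Both costs are large for the false negative, but only the full cost is large for the false positive; the truncated cost is exactly zero there.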
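Now, because the total cost is a sum over all training samples, there are three possibilities.
- All training samples prescribe that the class be zero: then the cost will be entirely independent of the predictions for this class, and no learning can take place.
- Some training samples put the class at zero, some at one: then, because "false negatives" or "misses" are still penalized but false positives are not, the network will find the easiest way of minimizing the cost, which is to indiscriminately increase the prediction of the class for all samples.
- All training samples prescribe that the class be one: essentially the same as the second case, except that here it is no problem, because that is the correct behavior.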
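The mixed-label case can be illustrated with a deliberately stripped-down model: a single class whose prediction depends only on a bias b, so the prediction is the same for every sample. That simplification, the labels, the step size and the helper name `truncated_cost_grad_b` are all assumptions made for this sketch. For the truncated cost -y log(sigmoid(b)), the derivative with respect to b is -y (1 - sigmoid(b)), which is never positive, so gradient descent can only push the prediction up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def truncated_cost_grad_b(b, y):
    """Gradient w.r.t. b of mean(-y * log(sigmoid(b))), i.e. the cost without the 1-... term."""
    a = sigmoid(b)
    return np.mean(-y * (1.0 - a))

y = np.array([1, 0, 0, 1, 0])  # mixed labels for one class (illustrative)
b = 0.0
for _ in range(1000):
    b -= 0.5 * truncated_cost_grad_b(b, y)  # plain gradient descent step

final_prediction = sigmoid(b)
print(b, final_prediction)
```

The prediction drifts towards 1 for every sample even though three of the five labels are 0, which is the indiscriminate increase described in the second case.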
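And finally, why does it work if we use softmax instead of sigmoid? False positives will still be invisible. But it is easy to see that the sum over all classes of the softmax is one. So I can only increase the prediction for one class if at least one other class is reduced to compensate. In particular, there can be no false positive without a false negative, and the false negative is something the cost will detect.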
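The compensation argument can be seen directly on a small example. In this sketch the input values are arbitrary; the `softmax` definition follows the usual column-wise form with the maximum subtracted for numerical stability.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

# Arbitrary pre-activations: 3 classes (rows), 2 samples (columns)
z = np.array([[ 2.0, -1.0],
              [ 0.5,  0.0],
              [-1.0,  3.0]])
p = softmax(z)
print(p.sum(axis=0))  # each column sums to one

# Boost class 0 for the first sample only
z_boosted = z.copy()
z_boosted[0, 0] += 5.0
p_boosted = softmax(z_boosted)
print(p[:, 0], p_boosted[:, 0])
```

Raising one class for a sample lowers every other class for that sample, so a false positive necessarily shows up as a drop in the prediction for the true class.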
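On how to get a binary prediction:
For binary expected responses, rounding is indeed the appropriate procedure. For one-hot I'd rather find the largest value, set it to one and set all others to zero. I've added a convenience function, predict, implementing that.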
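Independently of the full code that follows, here is a minimal sketch of why the two rules can disagree. The probability matrix is invented for the example, with one column per sample.

```python
import numpy as np

# Hypothetical network outputs: 3 classes (rows), 2 samples (columns)
probs = np.array([[0.45, 0.2],
                  [0.30, 0.7],
                  [0.25, 0.1]])

# Rule 1: round each entry independently (suitable for binary responses)
rounded = np.round(probs).astype(int)

# Rule 2: set the largest entry of each column to one, the rest to zero
hot = np.argmax(probs, axis=0)
one_hot = np.zeros(probs.shape, dtype=int)
one_hot[hot, np.arange(probs.shape[1])] = 1

print(rounded)
print(one_hot)
```

For the first sample no entry reaches 0.5, so rounding selects no class at all, whereas the argmax rule still returns a valid one-hot vector.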
import numpy as np
from scipy import optimize as opt
from collections import namedtuple

# First, a few structures to keep ourselves organized
Problem_Size = namedtuple('Problem_Size', 'Out In Samples')
Data = namedtuple('Data', 'Out In')
Network = namedtuple('Network', 'w b activation cost gradient most_likely')

def get_dims(Out, In, transpose=False):
    """extract dimensions and ensure everything is 2d
    return Data, Dims"""
    # gracefully accept lists etc.
    Out, In = np.asanyarray(Out), np.asanyarray(In)
    if transpose:
        Out, In = Out.T, In.T
    # if it's a single sample make sure it's n x 1
    Out = Out[:, None] if len(Out.shape) == 1 else Out
    In = In[:, None] if len(In.shape) == 1 else In
    Dims = Problem_Size(Out.shape[0], *In.shape)
    if Dims.Samples != Out.shape[1]:
        raise ValueError("number of samples must be the same for Out and In")
    return Data(Out, In), Dims

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

def sig_cost(Net, data):
    A = process(data.In, Net)
    # cross-entropy: -y log(y') - (1-y) log(1-y'), summed over classes, averaged over samples
    return -(data.Out * np.log(A) + (1-data.Out) * np.log(1-A)).sum(axis=0).mean()

def sig_grad (Net, Dims, data):
    A = process(data.In, Net)
    return dict(dw = (A - data.Out) @ data.In.T / Dims.Samples,
                db = (A - data.Out).mean(axis=1, keepdims=True))

def sig_ml(z):
    # binary responses: round each output independently
    return np.round(z).astype(int)

def sof_ml(z):
    # one-hot responses: set the largest output per sample to one, the rest to zero
    hot = np.argmax(z, axis=0)
    z = np.zeros(z.shape, dtype=int)
    z[hot, np.arange(len(hot))] = 1
    return z

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)
    z = np.exp(z)
    return z / z.sum(axis=0, keepdims=True)

def sof_cost(Net, data):
    A = process(data.In, Net)
    logA = np.log(A)
    return -(data.Out * logA).sum(axis=0).mean()

sof_grad = sig_grad

def get_net(Dims, activation='softmax'):
    activation, cost, gradient, ml = {
        'sigmoid': (sigmoid, sig_cost, sig_grad, sig_ml),
        'softmax': (softmax, sof_cost, sof_grad, sof_ml),
        'hybrid': (sigmoid, sof_cost, None, sig_ml)}[activation]
    return Network(w=np.zeros((Dims.Out, Dims.In)),
                   b=np.zeros((Dims.Out, 1)),
                   activation=activation, cost=cost, gradient=gradient,
                   most_likely=ml)

def process(In, Net):
    return Net.activation(Net.w @ In + Net.b)

def propagate(data, Dims, Net):
    return Net.gradient(Net, Dims, data), Net.cost(Net, data)

def optimize_no_grad(Net, Dims, data):
    def f(x):
        Net.w[...] = x[:Net.w.size].reshape(Net.w.shape)
        Net.b[...] = x[Net.w.size:].reshape(Net.b.shape)
        return Net.cost(Net, data)
    x = np.r_[Net.w.ravel(), Net.b.ravel()]
    res = opt.minimize(f, x, options=dict(maxiter=10000)).x
    Net.w[...] = res[:Net.w.size].reshape(Net.w.shape)
    Net.b[...] = res[Net.w.size:].reshape(Net.b.shape)

def optimize(Net, Dims, data, num_iterations, learning_rate, print_cost = True):
    w, b = Net.w, Net.b
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(data, Dims, Net)
        dw = grads["dw"]
        db = grads["db"]
        w -= learning_rate * dw
        b -= learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 10000 == 0:
            print(cost)
    return grads, costs

def model(X_train, Y_train, num_iterations, learning_rate = 0.5, print_cost = False, activation='sigmoid'):
    data, Dims = get_dims(Y_train, X_train, transpose=True)
    Net = get_net(Dims, activation)
    if Net.gradient is None:
        optimize_no_grad(Net, Dims, data)
    else:
        grads, costs = optimize(Net, Dims, data, num_iterations, learning_rate, print_cost = True)
    Y_prediction_train = process(data.In, Net)
    print(Y_prediction_train)
    print(data.Out)
    print(Y_prediction_train.sum(axis=0))
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - data.Out)) * 100))
    return Net

def predict(In, Net, probability=False):
    In = np.asanyarray(In)
    is1d = In.ndim == 1
    if is1d:
        In = In.reshape(-1, 1)
    Out = process(In, Net)
    if not probability:
        Out = Net.most_likely(Out)
    if is1d:
        Out = Out.reshape(-1)
    return Out

def create_data(Dims):
    Out = np.zeros((Dims.Out, Dims.Samples), dtype=int)
    Out[np.random.randint(0, Dims.Out, (Dims.Samples,)), np.arange(Dims.Samples)] = 1
    In = np.random.randint(0, 2, (Dims.In, Dims.Samples))
    return Data(Out, In)

train_set_x = np.array([
[1,1,1,1,1],[0,1,1,1,1],[0,0,1,1,0],[0,0,1,0,1]
])
train_set_y = np.array([
[1,0,0],[1,0,0],[0,0,1],[0,0,1]
])
Net1 = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True, activation='sigmoid')
Net2 = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True, activation='softmax')
Net3 = model(train_set_x, train_set_y, num_iterations = 20000, learning_rate = 0.001, print_cost = True, activation='hybrid')
Dims = Problem_Size(8, 100, 50)
data = create_data(Dims)
model(data.In.T, data.Out.T, num_iterations = 40000, learning_rate = 0.001, print_cost = True, activation='softmax')
model(data.In.T, data.Out.T, num_iterations = 40000, learning_rate = 0.001, print_cost = True, activation='sigmoid')