idc*_*ark 5 statistics machine-learning clojure incanter logistic-regression
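I'm trying to implement a simple logistic regression example in Clojure using the Incanter data analysis library. I've successfully written the sigmoid and cost functions, but Incanter's BFGS minimization function is giving me some trouble.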
(ns ml-clj.logistic
  (:require [incanter.core :refer :all]
            [incanter.optimize :refer :all]))

(defn sigmoid
  "compute the inverse logit function, large positive numbers should be
  close to 1, large negative numbers near 0,
  z can be a scalar, vector or matrix.
  sanity check: (sigmoid 0) should always evaluate to 0.5"
  [z]
  (div 1 (plus 1 (exp (minus z)))))

(defn cost-func
  "computes the cost function (J) that will be minimized
  inputs:params theta X matrix and Y vector"
  [X y]
  (let
    [m (nrow X)
     init-vals (matrix (take (ncol X) (repeat 0)))
     z (mmult X init-vals)
     h (sigmoid z)
     f-half (mult (matrix (map - y)) (log (sigmoid (mmult X init-vals))))
     s-half (mult (minus 1 y) (log (minus 1 (sigmoid (mmult X init-vals)))))
     sub-tmp (minus f-half s-half)
     J (mmult (/ 1 m) (reduce + sub-tmp))]
    J))
When I try (minimize (cost-func X y) (matrix [0 0])), giving minimize a function and starting params, the REPL throws an error.
ArityException Wrong number of args (2) passed to: optimize$minimize clojure.lang.AFn.throwArity (AFn.java:437)
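I'm very confused about what exactly the minimize function expects.
For reference, I rewrote all of the code in Python, and it runs as expected using the same minimization algorithm.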
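One thing worth noting about the call above, separately from the arity error: (cost-func X y) is evaluated before minimize runs, so minimize is handed a computed value rather than a function, and cost-func as written takes only X and y while fixing its parameters at zero internally. The docstring quoted in the answer describes f as a function that takes a collection of values and returns a scalar. A minimal Python sketch of that idea follows, using only the standard library. The toy X and y are hypothetical data invented for illustration, and functools.partial is just one way to fix the data arguments.

```python
import math
from functools import partial


def sigmoid(z):
    """Inverse logit of a scalar."""
    return 1.0 / (1.0 + math.exp(-z))


def compute_cost(theta, X, y):
    """Mean logistic loss for parameters theta on rows X with labels y."""
    m = len(y)
    total = 0.0
    for row, label in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        total += label * math.log(h) + (1.0 - label) * math.log(1.0 - h)
    return -total / m


# Hypothetical toy data, for illustration only.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1.0, 0.0, 1.0]

# Calling the cost directly yields a number; an optimizer cannot call it again.
value = compute_cost([0.0, 0.0], X, y)

# Fixing X and y yields a function of theta alone, which is what an
# optimizer can evaluate repeatedly at different parameter values.
objective = partial(compute_cost, X=X, y=y)
```

Here value is a plain float, while objective can be called with any candidate theta. The Clojure analogue would presumably be a cost function that takes theta as an argument and closes over X and y.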
import numpy as np
import scipy as sp

data = np.loadtxt('testSet.txt', delimiter='\t')
X = data[:,0:2]
y = data[:, 2]

def sigmoid(X):
    return 1.0 / (1.0 + np.e**(-1.0 * X))

def compute_cost(theta, X, y):
    m = y.shape[0]
    h = sigmoid(X.dot(theta.T))
    J = y.T.dot(np.log(h)) + (1.0 - y.T).dot(np.log(1.0 - h))
    cost = (-1.0 / m) * J.sum()
    return cost

def fit_logistic(X,y):
    initial_thetas = np.zeros((len(X[0]), 1))
    myargs = (X, y)
    theta = sp.optimize.fmin_bfgs(compute_cost, x0=initial_thetas,
                                  args=myargs)
    return theta
Output
Current function value: 0.594902
Iterations: 6
Function evaluations: 36
Gradient evaluations: 9
array([ 0.08108673, -0.12334958])
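I don't understand why the Python code runs successfully while my Clojure implementation fails. Any suggestions?
Update
After rereading the docstring for minimize, I've been trying to compute the derivative of cost-func, which throws a new error.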
(def grad (gradient cost-func (matrix [0 0])))
(minimize cost-func (matrix [0 0]) (grad (matrix [0 0]) X))
ExceptionInfo throw+: {:exception "Matrices of different sizes cannot be differenced.", :asize [2 1], :bsize [1 2]} clatrix.core/- (core.clj:950)
Using trans to convert the 1xN matrix into an Nx1 matrix just yields the same error with the sizes swapped.
:asize [1 2], :bsize [2 1]}
I'm pretty lost here.
I can't say anything about your implementation, but incanter.optimize/minimize expects (at least) three arguments, and you're giving it only two:
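The size mismatch in the error (a 2x1 matrix differenced with a 1x2 one) suggests that the parameter vector and the derivative are reaching the optimizer in different orientations. Whatever produces the derivative, it should return one partial derivative per parameter, laid out the same way as the parameters. As a sketch under assumptions, here is the standard analytic gradient of the mean logistic loss, (1/m) * X^T (h - y), in plain Python. The toy data are hypothetical, and this is not Incanter's gradient function.

```python
import math


def sigmoid(z):
    """Inverse logit of a scalar."""
    return 1.0 / (1.0 + math.exp(-z))


def cost_gradient(theta, X, y):
    """Analytic gradient of the mean logistic loss: (1/m) * X^T (h - y)."""
    m = len(y)
    grad = [0.0] * len(theta)  # same length and layout as theta
    for row, label in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        for j, x in enumerate(row):
            grad[j] += (h - label) * x / m
    return grad


# Hypothetical toy data, for illustration only.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1.0, 0.0, 1.0]
theta = [0.0, 0.0]

g = cost_gradient(theta, X, y)

# Because g matches theta element for element, an update step is a
# plain element-wise subtraction with no transposition needed.
step = [t - 0.1 * gj for t, gj in zip(theta, g)]
```

Since the gradient has exactly the shape of theta, any subtraction the optimizer performs between the two is well defined.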
Arguments:
f -- Objective function. Takes a collection of values and returns a scalar
of the value of the function.
start -- Collection of initial guesses for the minimum
f-prime -- partial derivative of the objective function. Takes
a collection of values and returns a collection of partial
derivatives with respect to each variable. If this is not
provided it will be estimated using gradient-fn.
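Unfortunately, I can't tell you directly what to supply for f-prime here, but maybe someone else can. By the way, I think the ArityException Wrong number of args (2) passed to [...] message is actually quite helpful here.
Edit: Actually, I think the docstring above is incorrect, since the source code does not use gradient-fn to estimate f-prime. Maybe you can use incanter.optimize/gradient to generate your own?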
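For a sense of what a generated f-prime does, here is a minimal Python sketch of a central-difference gradient estimate: it takes an objective of theta alone and returns a function that maps theta to a list of partial derivatives. This is a generic numerical technique, not Incanter's implementation. numerical_gradient, the eps default, and the toy data are all hypothetical names and values chosen for illustration.

```python
import math


def sigmoid(z):
    """Inverse logit of a scalar."""
    return 1.0 / (1.0 + math.exp(-z))


def compute_cost(theta, X, y):
    """Mean logistic loss for parameters theta on rows X with labels y."""
    m = len(y)
    total = 0.0
    for row, label in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        total += label * math.log(h) + (1.0 - label) * math.log(1.0 - h)
    return -total / m


def numerical_gradient(f, theta, eps=1e-6):
    """Estimate each partial derivative of f at theta by central differences."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


# Hypothetical toy data, for illustration only.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1.0, 0.0, 1.0]


def objective(theta):
    """Objective of theta alone, with the data held fixed."""
    return compute_cost(theta, X, y)


def f_prime(theta):
    """Derivative function in the shape the docstring describes for f-prime."""
    return numerical_gradient(objective, theta)


g = f_prime([0.0, 0.0])
```

Note that f_prime itself, not the result of calling it, is what would be handed to an optimizer as the derivative argument.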