spo*_*234 1 python scipy linear-regression quadratic-programming
我有这样的数据集:
import numpy as np
a = np.array([1.2, 2.3, 4.2])
b = np.array([1, 5, 6])
c = np.array([5.4, 6.2, 1.9])
m = np.vstack([a,b,c])
y = np.array([5.3, 0.9, 5.6])
Run Code Online (Sandbox Code Playgroud)
并希望适应约束线性回归
y = b1*a + b2*b + b3*c
其中所有b的总和为1且为正:b1 + b2 + b3 = 1
R中的类似问题在这里指定:
我怎么能在python中这样做?
编辑: 这两种方法非常通用,可以用于中小规模的实例.要获得更有效的方法,请检查 chthonicdaemon 的答案(使用自定义预处理和scipy的optimize.nnls).
import numpy as np
from scipy.optimize import minimize
a = np.array([1.2, 2.3, 4.2])
b = np.array([1, 5, 6])
c = np.array([5.4, 6.2, 1.9])
m = np.vstack([a,b,c])
y = np.array([5.3, 0.9, 5.6])
def loss(x):
return np.sum(np.square((np.dot(x, m) - y)))
cons = ({'type': 'eq',
'fun' : lambda x: np.sum(x) - 1.0})
x0 = np.zeros(m.shape[0])
res = minimize(loss, x0, method='SLSQP', constraints=cons,
bounds=[(0, np.inf) for i in range(m.shape[0])], options={'disp': True})
print(res.x)
print(np.dot(res.x, m.T))
print(np.sum(np.square(np.dot(res.x, m) - y)))
Run Code Online (Sandbox Code Playgroud)
Optimization terminated successfully. (Exit mode 0)
Current function value: 18.817792344
Iterations: 5
Function evaluations: 26
Gradient evaluations: 5
[ 0.7760881 0. 0.2239119]
[ 1.87173571 2.11955951 4.61630834]
18.817792344
Run Code Online (Sandbox Code Playgroud)
好处:
import numpy as np
from cvxpy import *
a = np.array([1.2, 2.3, 4.2])
b = np.array([1, 5, 6])
c = np.array([5.4, 6.2, 1.9])
m = np.vstack([a,b,c])
y = np.array([5.3, 0.9, 5.6])
X = Variable(m.shape[0])
constraints = [X >= 0, sum_entries(X) == 1.0]
product = m.T * diag(X)
diff = sum_entries(product, axis=1) - y
problem = Problem(Minimize(norm(diff)), constraints)
problem.solve(verbose=True)
print(problem.value)
print(X.value)
Run Code Online (Sandbox Code Playgroud)
ECOS 2.0.4 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -0.000e+00 +2e+01 5e-01 1e-01 1e+00 4e+00 --- --- 1 1 - | - -
1 +2.451e+00 +2.539e+00 +4e+00 1e-01 2e-02 2e-01 8e-01 0.8419 4e-02 2 2 2 | 0 0
2 +4.301e+00 +4.306e+00 +2e-01 5e-03 7e-04 1e-02 4e-02 0.9619 1e-02 2 2 2 | 0 0
3 +4.333e+00 +4.334e+00 +2e-02 4e-04 6e-05 1e-03 4e-03 0.9326 2e-02 2 1 2 | 0 0
4 +4.338e+00 +4.338e+00 +5e-04 1e-05 2e-06 4e-05 1e-04 0.9698 1e-04 2 1 1 | 0 0
5 +4.338e+00 +4.338e+00 +3e-05 8e-07 1e-07 3e-06 7e-06 0.9402 7e-03 2 1 1 | 0 0
6 +4.338e+00 +4.338e+00 +7e-07 2e-08 2e-09 6e-08 2e-07 0.9796 1e-03 2 1 1 | 0 0
7 +4.338e+00 +4.338e+00 +1e-07 3e-09 4e-10 1e-08 3e-08 0.8458 2e-02 2 1 1 | 0 0
8 +4.338e+00 +4.338e+00 +7e-09 2e-10 2e-11 9e-10 2e-09 0.9839 5e-02 1 1 1 | 0 0
OPTIMAL (within feastol=1.7e-10, reltol=1.5e-09, abstol=6.5e-09).
Runtime: 0.000555 seconds.
4.337947939 # needs to be squared to be compared to scipy's output!
# as we are using l2-norm (outer sqrt) instead of sum-of-squares
# which is nicely converted to SOCP-form and easier to
# tackle by SOCP-based solvers like ECOS
# -> does not change the solution-vector x, only the obj-value
[[ 7.76094262e-01]
[ 7.39698388e-10]
[ 2.23905737e-01]]
Run Code Online (Sandbox Code Playgroud)
您可以通过一点数学和获得一个好的解决方案scipy.optimize.nnls:
首先我们做一下数学:
如果
y = b1 * a + b2 * b + b3 * c并且b1 + b2 + b3 = 1,则b3 = 1-b1-b2。
如果我们替代和简化,我们最终会得到
y-c = b1(a-c)+ b2(b-c)
现在,我们没有任何等式约束,并且nnls可以直接求解:
import scipy.optimize
A = np.vstack([a - c, b - c]).T
(b1, b2), norm = scipy.optimize.nnls(A, y - c)
b3 = 1 - b1 - b2
Run Code Online (Sandbox Code Playgroud)
这将恢复使用cvxpy在其他答案中获得的解决方案。
b1 = 0.77608809648662802
b2 = 0.0
b3 = 0.22391190351337198
norm = 4.337947941595865
Run Code Online (Sandbox Code Playgroud)
可以将这种方法推广到任意数量的尺寸,如下所示。假设我们有一个矩阵B,它由列中排列的原始问题的a,b,c构成。任何其他尺寸都将添加到此。
现在,我们可以做
A = B[:, :-1] - B[:, -1:]
bb, norm = scipy.optimize.nnls(A, y - B[:, -1])
bi = np.append(bb, 1 - sum(bb))
Run Code Online (Sandbox Code Playgroud)