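I have a NumPy array A. I want to find the indices of the elements of A that equal a given value, and then keep only those indices that satisfy some condition: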
import numpy as np
A = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
value = 2
ind = np.array([0, 1, 5, 10]) # Index belongs to ind
Here is what I did:
B = np.where(A==value)[0] # Gives the indexes in A for which value = 2
print(B)
[1 5 9]
mask = np.in1d(B, ind) # Gives the index values that belong to the ind array
print(mask)
array([ True, True, False], dtype=bool)
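The two-step lookup above (np.where followed by np.in1d) can also be written as a single selection. This is a minimal sketch, assuming ind holds valid, unique positions in A; the names selected and alt are introduced here for illustration only.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
value = 2
ind = np.array([0, 1, 5, 10])  # candidate indices

# Look only at the candidate positions and keep those whose element equals value
selected = ind[A[ind] == value]
print(selected)

# Same result via a set intersection of "indices equal to value" and ind
alt = np.intersect1d(np.flatnonzero(A == value), ind)
print(alt)
```

The first form touches only len(ind) elements of A, which may matter when A is large and ind is short.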
print(B[mask]) # Is the …

I am looking for a fast way to compute a rolling sum, possibly using NumPy. Here is my first approach:
def func1(M, w):
    Rtn = np.zeros((M.shape[0], M.shape[1]-w+1))
    for i in range(M.shape[1]-w+1):
        Rtn[:,i] = np.sum(M[:, i:w+i], axis=1)
    return Rtn
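One possible loop-free variant of the function above uses cumulative sums: the sum of a window is the difference between two entries of the running total. This is only a sketch; the name rolling_sum is hypothetical, and for non-integer data the subtraction may introduce small rounding differences compared with summing each window directly.

```python
import numpy as np

def rolling_sum(M, w):
    # Prepend a zero column so that c[:, k] is the sum of the first k columns of M
    c = np.cumsum(np.pad(M, ((0, 0), (1, 0)), mode='constant'), axis=1)
    # Window starting at column i covers columns i .. i+w-1
    return c[:, w:] - c[:, :-w]

M = np.array([[0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0.],
              [0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1.],
              [1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
R = rolling_sum(M, 4)
print(R)
```

The result has the same shape as the output of func1, namely (M.shape[0], M.shape[1]-w+1).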
M = np.array([[0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0.],
[0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1.],
[1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
window_size = 4
print(func1(M, window_size))
[[ 0. 0. 1. 2. 2. 3. 3. 3. 3. 2.]
[ 1. 2. …

I want to monitor the loss while training a multi-class gradient boosting classifier, as a way to see whether overfitting occurs. Here is my code:
%matplotlib inline
import numpy as np
#import matplotlib.pyplot as plt
import matplotlib.pylab as plt
from sklearn import datasets
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
n_est = 100
clf = GradientBoostingClassifier(n_estimators=n_est, max_depth=3, random_state=2)
clf.fit(X_train, y_train)
test_score = np.empty(len(clf.estimators_))
for i, pred in enumerate(clf.staged_predict(X_test)):
    test_score[i] = clf.loss_(y_test, pred)
plt.plot(np.arange(n_est) + 1, test_score, label='Test')
plt.plot(np.arange(n_est) + 1, clf.train_score_, label='Train')
plt.show()
However, I get the following ValueError:
--------------------------------------------------------------------------- …

I use the following script to compute the cross product of consecutive segments of a trajectory (xy coordinates):
In [129]:
def func1(xy, s):
    size = xy.shape[0]-2*s
    out = np.zeros(size)
    for i in range(size):
        p1, p2 = xy[i], xy[i+s] #segment 1
        p3, p4 = xy[i+s], xy[i+2*s] #segment 2
        out[i] = np.cross(p1-p2, p4-p3)
    return out
def func2(xy, s):
    size = xy.shape[0]-2*s
    p1 = xy[0:size]
    p2 = xy[s:size+s]
    p3 = p2
    p4 = xy[2*s:size+2*s]
    tmp1 = p1-p2
    tmp2 = p4-p3
    return tmp1[:, 0] * tmp2[:, 1] - tmp2[:, 0] * tmp1[:, 1]
In [136]:
xy = np.array([[1,2],[2,3],[3,4],[5,6],[7,8],[2,4],[5,2],[9,9],[1,1]])
func2(xy, 2)
Out[136]: …

I am trying to make this estimator scikit-learn compatible so that I can search the parameter space with GridSearchCV.
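Edit:
I have modified the script as suggested (see fit(self, X, y) and __init__ below).
GridSearchCV still has a compatibility problem, probably because the estimator is a multi-label classifier: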
ValueError: Can't handle mix of multilabel-indicator and continuous-multioutput
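But that is beside the point. The attribute error is now gone, so we can safely conclude that the suggested modifications make the estimator scikit-learn compatible.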
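For context, here is a minimal sketch of the convention that scikit-learn compatibility usually hinges on: __init__ only stores its arguments under the same names, and anything derived from them or learned from data is set in fit. CentroidClassifier is a hypothetical toy class written for this illustration, not the LogisticClassifier from the question, and the nearest-centroid rule is an assumption chosen to keep the example short.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class CentroidClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, basis=None):
        # Store the parameter untouched; no logic here, so get_params/clone work
        self.basis = basis

    def _transform(self, X):
        X = np.asarray(X, dtype=float)
        # Resolve the basis at call time instead of in __init__
        return np.maximum(X, 0.0) if self.basis == 'rectifier' else X

    def fit(self, X, y):
        Z = self._transform(X)
        y = np.asarray(y)
        # Learned attributes carry a trailing underscore
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([Z[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        Z = self._transform(X)
        # Squared distance from every sample to every class centroid
        d = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

X = np.array([[-2.], [-1.], [1.], [2.]])
y = np.array([0, 0, 1, 1])
est = CentroidClassifier().fit(X, y)
pred = est.predict(np.array([[-3.], [3.]]))
params = clone(est).get_params()
print(pred, params)
```

Because the constructor does nothing but assignment, clone can rebuild the estimator from get_params, which is what a parameter search relies on when it tries each candidate setting.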
Final script:
import numpy as np
from sklearn.grid_search import GridSearchCV
from sklearn.datasets import make_classification
from sklearn.preprocessing import LabelBinarizer
from sklearn.cross_validation import train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin
class LogisticClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, basis=None, itrs=100, learn_rate=0.1, reg=0.1, momentum=0.5, proj_layer_size=10):
        self.W = []
        self.A = None
        if basis == 'rectifier':
            self.basisfunc = self.rectifier_basis
        else:
            self.basisfunc = self.identity …

Here is a function that computes the dot product between consecutive segments of a trajectory (xy coordinates). The results are as expected, but the for loop makes it very slow.
In [94]:
def func1(xy, s):
    size = xy.shape[0]-2*s
    out = np.zeros(size)
    for i in range(size):
        p1, p2 = xy[i], xy[i+s] #segment 1
        p3, p4 = xy[i+s], xy[i+2*s] #segment 2
        out[i] = np.dot(p1-p2, p4-p3)
    return out
xy = np.array([[1,2],[2,3],[3,4],[5,6],[7,8],[2,4],[5,2],[9,9],[1,1]])
func1(xy, 2)
Out[94]:
array([-16., 15., 32., 31., -14.])
I looked for a way to vectorize the above, hoping to make it faster. Here is what I came up with:
In [95]:
def func2(xy, s):
    size = xy.shape[0]-2*s
    p1 = xy[0:size]
    p2 = xy[s:size+s]
    p3 = p2
    p4 = xy[2*s:size+2*s]
    return np.diagonal(np.dot((p1-p2), (p4-p3).T))
func2(xy, 2)
Out[95]:
array([-16, 15, 32, 31, -14])
Unfortunately, the dot product produces a square matrix, and I have to take the diagonal of that matrix: …
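One way to avoid building the square matrix is to compute only the row-wise dot products, for example with np.einsum. This is a sketch of that idea under the same slicing as func2; the name func3 is hypothetical and introduced only for illustration.

```python
import numpy as np

def func3(xy, s):
    size = xy.shape[0] - 2 * s
    d1 = xy[0:size] - xy[s:size + s]              # p1 - p2 for every segment pair
    d2 = xy[2 * s:size + 2 * s] - xy[s:size + s]  # p4 - p3 for every segment pair
    # Multiply matching rows element-wise and sum over the coordinate axis
    return np.einsum('ij,ij->i', d1, d2)

xy = np.array([[1,2],[2,3],[3,4],[5,6],[7,8],[2,4],[5,2],[9,9],[1,1]])
result = func3(xy, 2)
print(result)
```

The equivalent expression (d1 * d2).sum(axis=1) gives the same values; either form needs memory proportional to the number of segments rather than its square.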
I have an xy array of unevenly sampled GPS coordinates. This may seem obvious, but I want to project it onto a grid. Here is my script for context:
import numpy as np
from matplotlib.mlab import griddata
gps_track = np.array([[0,0],[1.2,2.3],[1.9,3],[3.2,4.3],[4,2.9],[6.5,3.1]])
x = gps_track[:,0]
y = gps_track[:,1]
# define grid
binsize = 1
xmin, xmax = x.min(), x.max()
ymin, ymax = y.min(), y.max()
xi = np.arange(xmin, xmax+binsize, binsize)
yi = np.arange(ymin, ymax+binsize, binsize)
Given the original (x, y) coordinates, how do I go from here to obtain (xnew, ynew) values interpolated onto the (xi, yi) grid?
# grid the data
xnew, ynew = grid(x, y, xi, yi)
I suppose I would use something similar to the matplotlib function griddata:
zi = griddata(x, y, z, xi, …
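As a rough illustration of one reading of the question, the sketch below snaps each GPS point to its nearest grid node along each axis. This assumes that "projecting onto the grid" means nearest-node assignment rather than interpolation of a third variable z; the helper snap_to_grid is hypothetical and is not a matplotlib or NumPy function.

```python
import numpy as np

def snap_to_grid(values, nodes):
    # Distance from every value to every node, then index of the closest node
    nearest = np.abs(values[:, None] - nodes[None, :]).argmin(axis=1)
    return nodes[nearest]

gps_track = np.array([[0,0],[1.2,2.3],[1.9,3],[3.2,4.3],[4,2.9],[6.5,3.1]])
x = gps_track[:, 0]
y = gps_track[:, 1]

# Same grid definition as in the question
binsize = 1
xi = np.arange(x.min(), x.max() + binsize, binsize)
yi = np.arange(y.min(), y.max() + binsize, binsize)

xnew = snap_to_grid(x, xi)
ynew = snap_to_grid(y, yi)
print(np.column_stack([xnew, ynew]))
```

A point exactly halfway between two nodes goes to the lower one here, because argmin returns the first minimum; a different tie rule would need an explicit choice.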