Consider the following code:
def func1(a):
    a[:] = [x**2 for x in a]

a = list(range(10))
print(a)  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
func1(a[:5])
print(a)  # also prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I want to pass part of the list a to the function and have the function modify it. My expected output is
[0, 1, 4, 9, 16, 5, 6, 7, 8, 9]
What is the idiomatic way to do this?
Thanks!
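For context, slicing a list (a[:5]) builds a new list, so func1 only ever modifies that temporary copy. One possible way to get the expected output is to pass the whole list together with the bounds of the region to change. This is a minimal sketch; the helper name square_in_place and its start/stop parameters are made up for illustration and do not come from the question.

```python
def square_in_place(values, start=0, stop=None):
    """Square the elements of values[start:stop], modifying values itself."""
    # Slice assignment writes back into the original list object,
    # whereas values[start:stop] on the right-hand side is only a copy.
    values[start:stop] = [x ** 2 for x in values[start:stop]]


a = list(range(10))
square_in_place(a, 0, 5)
print(a)  # [0, 1, 4, 9, 16, 5, 6, 7, 8, 9]
```

Without a helper, the same effect can be written at the call site as a[:5] = [x**2 for x in a[:5]].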
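
I am new to machine learning, and I am trying to use the linear model estimators provided by scikit-learn to predict the price of used cars. I have tried several linear models, such as LinearRegression, Ridge, Lasso and Elastic Net, but in most cases they all return a negative score (-0.6 <= score <= 0.1).
Someone told me this is caused by multicollinearity, but I don't know how to fix it.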
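If multicollinearity is suspected, one common diagnostic is the variance inflation factor (VIF): each feature is regressed on all the others, and VIF = 1 / (1 - R^2). The sketch below uses only NumPy and purely synthetic data; the function name and the columns x1, x2, x3 are illustrative assumptions and are unrelated to the car dataset in the question.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are samples)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - residual.var() / target.var()
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic example: x2 is almost a linear function of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of strong collinearity; here x1 and x2 should stand out while x3 stays close to 1.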
My sample code:
import numpy as np
import pandas as pd
from sklearn import linear_model
from sqlalchemy import create_engine
from sklearn.linear_model import Ridge
engine = create_engine('sqlite:///path-to-db')
query = "SELECT mileage, carcass, engine, transmission, state, drive, customs_cleared, price FROM cars WHERE mark='some mark' AND model='some model' AND year='some year'"
df = pd.read_sql_query(query, engine)
df = df.dropna()
df = df.reindex(np.random.permutation(df.index))
X_full = df[['mileage', 'carcass', 'engine', 'transmission', 'state', 'drive', 'customs_cleared']]
y_full = …

Each sample in my (i.i.d.) dataset looks like this:
x = [a_1,a_2 ... a_N,b_1,b_2 ... b_M]
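I also have a label for each sample (this is supervised learning).
The a features are very sparse (i.e., a bag-of-words representation), while the b features are dense (integers, and there are about 45 of them).
I am using scikit-learn, and I want to use GridSearchCV with a pipeline.
Question: is it possible to use one CountVectorizer on the type a features and another CountVectorizer on the type b features?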
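One way scikit-learn expresses "different preprocessing for different column ranges" is ColumnTransformer, whose steps can be tuned through GridSearchCV with nested parameter names. The sketch below is only an illustration under stated assumptions: since CountVectorizer expects raw text rather than numeric columns, it swaps in TfidfTransformer for the count-like a block and StandardScaler for the dense b block, and the data, the sizes N and M, and the step names are all synthetic.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

N, M = 6, 3  # hypothetical sizes of the a and b feature blocks
rng = np.random.RandomState(0)
X = np.hstack([
    rng.poisson(0.3, size=(40, N)),   # sparse, count-like "a" features
    rng.randint(0, 10, size=(40, M)),  # dense integer "b" features
]).astype(float)
y = np.array([0, 1] * 20)  # synthetic labels

pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("a", TfidfTransformer(), slice(0, N)),     # columns [0, N-1]
        ("b", StandardScaler(), slice(N, N + M)),   # columns [N, N+M-1]
    ])),
    ("clf", SGDClassifier(random_state=0)),  # sees both transformed blocks
])

parameters = {
    "prep__a__use_idf": [True, False],    # type a features only
    "prep__b__with_mean": [True, False],  # type b features only
    "clf__alpha": [1e-4, 1e-3],
}

search = GridSearchCV(pipeline, parameters, cv=2)
search.fit(X, y)
print(search.best_params_)
```

The parameter names follow the pattern step__transformer__parameter, which lets a single grid search tune each block separately.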
What I want can be thought of as:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect1', CountVectorizer()),  # will work only on features [0, (N-1)]
    ('vect2', CountVectorizer()),  # will work only on features [N, (N+M-1)]
    ('clf', SGDClassifier()),  # will use all features to classify
])
parameters = {
    'vect1__max_df': (0.5, 0.75, 1.0),  # type a features only
    'vect1__ngram_range': ((1, 1), (1, 2)),  # type a features only
    'vect2__max_df': (0.5, 0.75, 1.0),  # type b features …

I have a function that takes a csr_matrix and performs some computations on it.
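The way these computations behave requires the matrix to have a specific shape (say NxM).
The input I send has fewer columns and exactly the required number of rows
(e.g., its shape is (A, B), where A == N and B < M).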
For example, I have the object x:
>>> x = csr_matrix([[1,2],[1,2]])
>>> x
  (0, 0)    1
  (0, 1)    2
  (1, 0)    1
  (1, 1)    2
>>> x.shape
(2, 2)
The function f:
def f(csr_mat):
    """csr_mat.shape should be (2,3)"""
Then I want to do something to x so that it becomes y:
>>> y = csr_matrix([[1,2,0],[1,2,0]])
>>> y
  (0, 0)    1
  (0, 1)    2
  (1, 0)    1
  (1, 1)    2
>>> y.shape
(2, 3)
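In this example, x and y have the same non-zero values, but y has a different shape. What I want is to efficiently "extend" x to a new dimension, filling the new columns with zeros. That is, given x and new_shape=(2,3), it should return y. …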
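Because CSR storage only records the non-zero entries, enlarging the shape does not require touching the stored values. A minimal sketch, assuming SciPy's csr_matrix constructor that accepts (data, indices, indptr) with an explicit shape; the helper name pad_csr is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix


def pad_csr(x, new_shape):
    """Return a CSR matrix of new_shape holding the same non-zeros as x."""
    rows, cols = x.shape
    new_rows, new_cols = new_shape
    if new_rows < rows or new_cols < cols:
        raise ValueError("new_shape must be at least as large as x.shape")
    # Added rows are empty, so each one repeats the final row pointer.
    extra = np.full(new_rows - rows, x.indptr[-1], dtype=x.indptr.dtype)
    indptr = np.concatenate([x.indptr, extra])
    # Added columns need no change: column indices of existing entries stay valid.
    return csr_matrix((x.data, x.indices, indptr), shape=new_shape)


x = csr_matrix([[1, 2], [1, 2]])
y = pad_csr(x, (2, 3))
print(y.shape)      # (2, 3)
print(y.toarray())
```

The data and indices arrays are reused rather than rebuilt, so the cost should be dominated by the small indptr array.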

from bs4 import BeautifulSoup
import requests
from requests.auth import HTTPProxyAuth

url = "http://www.transtats.bts.gov/Data_Elements.aspx?Data=2"
proxies = {"http": "xxx.xxx.x.xxx: port"}
auth = HTTPProxyAuth("username", "password")
r = requests.get(url, proxies=proxies, auth=auth)
soup = BeautifulSoup(r.text, "html.parser")
# Hidden ASP.NET form fields taken from the page returned by the GET
viewstate_element = soup.find(id="__VIEWSTATE").attrs
viewstate = viewstate_element["value"]
eventvalidation_element = soup.find(id="__EVENTVALIDATION").attrs
eventvalidation = eventvalidation_element["value"]
data = {
    'AirportList': "BOS",
    'CarrierList': "VX",
    'Submit': 'Submit',
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__EVENTVALIDATION": eventvalidation,
    "__VIEWSTATE": viewstate,
}
r = requests.post(url, proxies, auth, data)
print(r)
This code works fine when I use requests.get(url, proxies=proxies, auth=auth), but what should I do when some data has to be sent through requests.post() with proxy authentication?
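For reference, the signature is requests.post(url, data=None, json=None, **kwargs), so positional arguments after url are interpreted as data and json rather than as proxies and auth. A minimal sketch of passing everything by keyword follows; the wrapper name post_form_via_proxy and the example.invalid URL are placeholders made up for illustration, and the last lines only build the request locally without sending anything.

```python
import requests
from requests.auth import HTTPProxyAuth


def post_form_via_proxy(url, form_data, proxies, auth):
    """POST form_data to url through a proxy, passing options by keyword."""
    return requests.post(url, data=form_data, proxies=proxies, auth=auth)


# Placeholder credentials, mirroring the ones used in the question.
auth = HTTPProxyAuth("username", "password")

# Offline check of how the form fields would be encoded (nothing is sent).
prepared = requests.Request(
    "POST",
    "http://example.invalid/form",  # placeholder URL
    data={"AirportList": "BOS", "CarrierList": "VX"},
).prepare()
encoded_body = prepared.body
print(encoded_body)
```

With the variables from the question, the call would look like post_form_via_proxy(url, data, proxies, auth). Another option that requests documents is to embed the credentials in the proxy URL itself, in the form http://user:password@host:port.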