Posts by win*_*win

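XGBoost with GridSearchCV, scaling, PCA, and early stopping in an sklearn Pipeline

I want to combine an XGBoost model with input scaling and a feature space reduced by PCA. In addition, the model's hyperparameters and the number of components used in the PCA should be tuned with cross-validation. To prevent the model from overfitting, early stopping should be added.

To combine the individual steps, I decided to use sklearn's Pipeline functionality.

At first I had some trouble making sure that the PCA is also applied to the validation set, but I think passing XGB__eval_set does the trick.

The code actually runs without any errors, but it seems to run forever (at some point the CPU usage of all cores drops to zero, yet the process keeps running for hours; eventually I had to kill the session).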

from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor   

# Train / Test split
X_train, X_test, y_train, y_test = train_test_split(X_with_features, y, test_size=0.2, random_state=123)

# Train / Validation split
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=123)

# Pipeline
pipe = Pipeline(steps=[("Scale", StandardScaler()),
                       ("PCA", PCA()),
                       ("XGB", XGBRegressor())])

# Hyper-parameter grid (Test only)
grid_param_pipe = {'PCA__n_components': [5],
                   'XGB__n_estimators': [1000],
                   'XGB__max_depth': [3], …


python pca scikit-learn grid-search xgboost

win*_*win


5 votes, 1 answer, 1917 views

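Document similarity with Word Mover's Distance and BERT embeddings

I am trying to calculate the document similarity (nearest neighbor) of two arbitrary documents using word embeddings based on Google's BERT. To obtain the word embeddings from BERT, I use bert-as-a-service. The document similarity should be based on Word Mover's Distance, computed with the Python wmd-relax package.

My previous attempt started from this tutorial in the wmd-relax GitHub repository: https://github.com/src-d/wmd-relax/blob/master/spacy_example.py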

import numpy as np
import spacy
import requests
from wmd import WMD
from collections import Counter
from bert_serving.client import BertClient

# Wikipedia titles
titles = ["Germany", "Spain", "Google", "Apple"]

# Standard model from spacy
nlp = spacy.load("en_vectors_web_lg")

# Fetch wiki articles and prepare them as spaCy documents
documents_spacy = {}
print('Create spacy document')
for title in titles:
    print("... fetching", …
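
To make the distance itself concrete, here is a small pure-Python sketch of the relaxed Word Mover's Distance lower bound, in which every word sends all of its weight to the nearest word of the other document. It does not call wmd-relax or bert-as-a-service. The vectors are hypothetical 2-D toy values, and using one fixed vector per word is an assumption, since BERT produces contextual embeddings.

```python
import math
from collections import Counter


def nbow(tokens):
    """Normalized bag-of-words: token -> relative frequency (sums to 1)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed WMD: the larger of the two one-directional transport costs."""
    def one_way(src, dst):
        # Each source word moves its whole weight to its nearest target word.
        return sum(
            weight * min(math.dist(vectors[s], vectors[d]) for d in dst)
            for s, weight in src.items()
        )

    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_way(a, b), one_way(b, a))


# Toy 2-D vectors standing in for real embeddings (hypothetical values).
toy_vectors = {
    "berlin": (0.0, 0.0),
    "madrid": (1.0, 0.0),
    "search": (0.0, 5.0),
    "iphone": (1.0, 5.0),
}

same = relaxed_wmd(["berlin", "madrid"], ["madrid", "berlin"], toy_vectors)
far = relaxed_wmd(["berlin", "madrid"], ["search", "iphone"], toy_vectors)
print(same, far)
```

Documents that share the same words get a distance of zero, while documents whose words lie far apart in the embedding space get a larger value. The exact Word Mover's Distance is never smaller than this relaxed value.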


python nlp similarity word-embedding

win*_*win

2019-03-12

5 votes, 1 answer, 3066 views

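Fetching an identity token in Google Cloud Build

In my scenario, I want to trigger an HTTP-endpoint-based Google Cloud Function during a Google Cloud Build. The HTTP request is made in a build step that uses a python:3.7-slim container.

Based on this example and this example from the documentation, I want to use the following code: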

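import requests

REGION = 'us-central1'
PROJECT_ID = 'name-of-project'
RECEIVING_FUNCTION = 'my-cloud-function'

function_url = f'https://{REGION}-{PROJECT_ID}.cloudfunctions.net/{RECEIVING_FUNCTION}'

# Ask the metadata server for an identity token whose audience is the function URL
metadata_server_url = 'http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience='
token_full_url = metadata_server_url + function_url
token_headers = {'Metadata-Flavor': 'Google'}

token_response = requests.get(token_full_url, headers=token_headers)
jwt = token_response.text
print(jwt)

# function_headers and payload are not shown in the original post;
# the two values below are assumptions added so the snippet runs.
function_headers = {'Authorization': f'bearer {jwt}', 'content-type': 'application/json'}
payload = {}  # placeholder request body

r = requests.post(url=function_url, headers=function_headers, json=payload)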


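Surprisingly, the code fails because jwt contains Not Found (according to the print statement) instead of a token. I have already tested the code and the IAM settings by hard-coding a valid identity token, and I have also tested the exact same fetching mechanism on a test VM inside the same project. The problem seems to be that fetching some of the metadata does not work within Cloud Build.

Am I missing something? Thanks for any help!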
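
For debugging, it can help to check whether the metadata response is a token at all before sending it on. The sketch below uses only the standard library and makes no network call. The helper names build_token_request and decode_jwt_claims are hypothetical, and the claims are decoded without verifying the signature, so this is a diagnostic aid only.

```python
import base64
import json

METADATA_IDENTITY_URL = (
    "http://metadata/computeMetadata/v1/instance/"
    "service-accounts/default/identity"
)


def build_token_request(audience):
    """Return (url, headers) for an identity-token request."""
    url = f"{METADATA_IDENTITY_URL}?audience={audience}"
    return url, {"Metadata-Flavor": "Google"}


def decode_jwt_claims(text):
    """Return the claims of a JWT-shaped string, or None if it is not one.

    The signature is NOT verified; use this for inspection only.
    """
    parts = text.strip().split(".")
    if len(parts) != 3:
        return None
    # base64url segments in a JWT omit padding; restore it before decoding.
    segment = parts[1] + "=" * (-len(parts[1]) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(segment))
    except ValueError:
        return None
    return claims if isinstance(claims, dict) else None


# A body such as "Not Found" is rejected instead of being used as a token.
print(decode_jwt_claims("Not Found"))
```

With a real response, comparing the aud claim returned by decode_jwt_claims against function_url shows whether the token was issued for the intended audience.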

google-cloud-platform google-cloud-functions google-cloud-build

win*_*win


2 votes, 1 answer, 2951 views