Posts by Dav*_*Liu

Bayesian optimization for a Light GBM Model

I am able to successfully improve the performance of my XGBoost model through Bayesian optimization, but the best I can achieve through Bayesian optimization when using LightGBM (my preferred choice) is worse than what I was able to achieve by using its default hyper-parameters and following the standard early stopping approach.

When tuning via Bayesian optimization, I have been sure to include the algorithm’s default hyper-parameters in the search surface, for reference purposes.

The code below shows the RMSE …

python bayesian pandas hyperparameters lightgbm

xxy*_*xyy

2019-05-23

Score: 5 | Answers: 1 | Views: 618

Why not use “is” comparison instead of “==” for primitive types?

When I use Pytest for Python formatting checks, it complains about the following:

>>> assert some_function_ret_val() == True
E712 comparison to True should be 'if cond is True:' or 'if cond:'

and wants:

assert some_function_ret_val() is True

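I know there can only be one copy of True/False/None, but I thought all primitives were immutable types.

Under what circumstances would “==” and “is” comparisons differ for primitive types?

Otherwise, why has “==” become the norm for comparison tasks?

I found this stackoverflow post discussing comparisons with non-primitive types, but I can't seem to find a reason why “is” comparison might be dangerous for primitive types: Comparison with boolean numpy arrays VS PEP8 E712

If it is just convention, I think “is” is more readable than “==”, but I suspect there may be some crazy edge cases where more than one copy of a primitive value can exist.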
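As a small illustration of the edge cases the question asks about, the sketch below puts equality and identity side by side for a few built-in values. The helper name `compare` is made up for this example. Whether two equal integers built at runtime share one object is an interpreter caching detail, so the last case only prints the equality result.

```python
def compare(a, b):
    """Return (a == b, a is b) so equality and identity can be seen side by side."""
    return a == b, a is b


# bool is a subclass of int, so 1 equals True, yet they are different objects.
print(compare(1, True))    # (True, False)

# An int and a float can be equal in value without being the same object.
print(compare(1, 1.0))     # (True, False)

# NaN is the reverse case: one object that is not equal to itself.
nan = float("nan")
print(compare(nan, nan))   # (False, True)

# Two equal integers built at runtime compare equal; whether they are also
# the same object depends on interpreter caching, so identity is not relied on.
big_a = int("10" * 20)
big_b = int("10" * 20)
print(big_a == big_b)      # True
```

Passing the values through a function also avoids the SyntaxWarning that recent Python versions emit for `is` used directly with a literal.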

python comparison immutability primitive-types

Dav*_*Liu

2019-05-22

Score: 4 | Answers: 1 | Views: 1232

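ValueError: Incorrect input array dimensions for CountVectorizer()

When using make_column_transformer() in an sklearn pipeline, I get an error when trying to use CountVectorizer.

My DataFrame has two columns, 'desc-title' and 'SPchangeHigh'. Here is a snippet of two rows: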

features = pd.DataFrame([["T. Rowe Price sells most of its Tesla shares", .002152],
                         ["Gannett to retain all seats in MNG proxy fight", 0.002152]],
                        columns=["desc-title", "SPchangeHigh"])

I can run the following pipeline without any problem:

preprocess = make_column_transformer(
    (StandardScaler(),['SPchangeHigh']),
    ( OneHotEncoder(),['desc-title'])
)
preprocess.fit_transform(features.head(2))

However, when I replace OneHotEncoder() with CountVectorizer(tokenizer=tokenize), it fails:

preprocess = make_column_transformer(
    (StandardScaler(),['SPchangeHigh']),
    ( CountVectorizer(tokenizer=tokenize),['desc-title'])
)
preprocess.fit_transform(features.head(2))

The error I get is this:

ValueError                                Traceback (most recent call last)
<ipython-input-71-d77f136b9586> in <module>()
      3     ( CountVectorizer(tokenizer=tokenize),['desc-title'])
      4 )
----> 5 preprocess.fit_transform(features.head(2))

C:\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py …
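A likely cause, going by scikit-learn's documented column-selection behaviour rather than the truncated traceback: a list such as ['desc-title'] hands the transformer a 2-D selection, while a plain string hands it a 1-D column, and CountVectorizer expects a 1-D sequence of documents. The sketch below assumes that diagnosis and uses CountVectorizer's default tokenizer, because the poster's `tokenize` function is not shown.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

features = pd.DataFrame(
    [["T. Rowe Price sells most of its Tesla shares", 0.002152],
     ["Gannett to retain all seats in MNG proxy fight", 0.002152]],
    columns=["desc-title", "SPchangeHigh"],
)

preprocess = make_column_transformer(
    # A list selects a 2-D block, which StandardScaler expects.
    (StandardScaler(), ["SPchangeHigh"]),
    # A plain string selects a 1-D column, which CountVectorizer expects.
    # The default tokenizer stands in for the poster's unseen `tokenize`.
    (CountVectorizer(), "desc-title"),
)

result = preprocess.fit_transform(features)
print(result.shape[0])  # one output row per input row
```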

python pipeline scikit-learn

Nab*_*rsi

2019-05-25

Score: 3 | Answers: 1 | Views: 554

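Select columns of a Pandas DataFrame if their names are in a list, or create defaults and drop the rest

I have a list of column names that I want to take from a DataFrame.

If a listed column exists in the DataFrame, we want to slice out only that column.

If a listed column does not exist, we want to generate a placeholder default column of 0.

If the DataFrame has other column names, they are irrelevant and should be dropped or otherwise ignored.

Adding a single Pandas column is straightforward (Pandas: Add column if does not exist), but I am looking for an efficient and clear way to add multiple columns if they do not exist.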

d = {'a': [1, 2], 'b': [3, 4], 'c': [5,6], 'd': [7,8]}
df = pd.DataFrame(d)
df

   a  b  c  d
0  1  3  5  7
1  2  4  6  8

requested_cols = ['a','b','x','y','z']

I tried something like:

valid_cols = df.columns.values
missing_col_names = [col_name for col_name in requested_cols if col_name not in valid_cols]
df = df.reindex(list(df) + missing_col_names, axis=1).fillna(0)
df = df.loc[:,df.columns.isin(valid_cols)]
df = …
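One way to express the three requirements above in a single step, offered as a sketch rather than the accepted solution, is DataFrame.reindex with the columns= and fill_value= parameters. The variable name `result` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})
requested_cols = ['a', 'b', 'x', 'y', 'z']

# reindex keeps the requested columns that exist ('a', 'b'),
# drops the columns that were not requested ('c', 'd'),
# and creates the missing ones ('x', 'y', 'z') filled with fill_value.
result = df.reindex(columns=requested_cols, fill_value=0)
print(result)
```

Unlike reindexing and then calling fillna(0), fill_value only fills the newly created columns, so NaN values already present in existing columns are left alone.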

python pandas python-3.7

Dav*_*Liu

lucky-day

Score: 3 | Answers: 1 | Views: 2442

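Get the current branch with bash and store it as a variable

When I work with Kubernetes, I want to run commands that depend on my active branch. An alias that holds the current branch would therefore let me build other aliases that run those commands automatically.

I tried to use a function to store the name of the currently active local branch in a bash alias, so that I can run other scripts without worrying about specifying the active branch, but I keep running into errors.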

function branch () {
    local result='git branch | grep ^\* | cut -c 3-';
    echo "$result"
}
alias get_branch=$(branch)

But when I try to run it, I get:

usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args> ....
(Same output as just entering 'git')

(When listing the active git branch, cut -c 3- removes the * and the space after it.) For example:

* feature/ch20372
  ch20372
  ch12345

Oddly, these two work:

alias IMLAZY='git branch |grep \* | …
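To make the intent of the `git branch | grep ^\* | cut -c 3-` pipeline concrete, here is a minimal Python sketch of the same text processing. The function name `current_branch` and the sample string are assumptions for illustration; in real use the text would come from running `git branch`, for example through subprocess.run.

```python
def current_branch(git_branch_output):
    """Return the branch marked with '*' in `git branch` output, or None."""
    for line in git_branch_output.splitlines():
        # Equivalent of `grep ^\*`: keep only the line that starts with '*'.
        if line.startswith("*"):
            # Equivalent of `cut -c 3-`: drop the '*' and the space after it.
            return line[2:]
    return None


# Sample text in the layout that `git branch` prints for the example above.
sample = "* feature/ch20372\n  ch20372\n  ch12345\n"
print(current_branch(sample))  # feature/ch20372
```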

git bash

Dav*_*Liu

2019-07-24

Score: -1 | Answers: 1 | Views: 1086

Tag statistics

python ×4

pandas ×2

bash ×1

bayesian ×1

comparison ×1

git ×1

hyperparameters ×1

immutability ×1

lightgbm ×1

pipeline ×1

primitive-types ×1

python-3.7 ×1

scikit-learn ×1
