Posts by Mar*_*cia

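Using sklearn's GridSearchCV with a pipeline, preprocessing only once

I am using scikit-learn to tune a model's hyperparameters, with a pipeline that chains the preprocessing to the estimator. A simplified version of my problem looks like this: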

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


grid = GridSearchCV(make_pipeline(StandardScaler(), LogisticRegression()),
                    param_grid={'logisticregression__C': [0.1, 10.]},
                    cv=2,
                    refit=False)

_ = grid.fit(X=np.random.rand(10, 3),
             y=np.random.randint(2, size=(10,)))


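In my case, the preprocessing (StandardScaler() in the toy example) is time-consuming, and I am not tuning any of its parameters.

So, when I run the example, StandardScaler is executed 12 times: 2 (fit/predict) × 2 (CV folds) × 3 (parameter values). With only the two values of C listed in the toy grid above, the same arithmetic gives 8. Either way, every time StandardScaler runs for a different value of C it returns the same output for the same fold, so it would be much more efficient to compute it once and then run only the estimator part of the pipeline.

I could manually split the pipeline into the preprocessing (no hyperparameters tuned) and the estimator. But the preprocessing must be fitted on the training set only, so I would have to implement the splits by hand and not use GridSearchCV at all.

Is there a simple/standard way to avoid repeating the preprocessing while using GridSearchCV?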
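
For reference, one option that is often suggested for this situation is the memory argument of scikit-learn pipelines, which caches fitted transformers on disk. The sketch below is only an illustration under assumptions: it needs a scikit-learn version whose make_pipeline accepts memory, and the data, the seed and the temporary cache directory are made up for the example. As far as I understand, the cache only avoids refitting a transformer on a training fold it has already seen with the same parameters; transforming the test fold still runs for every candidate.

```python
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data (assumption): balanced labels so each CV fold has both classes.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = np.tile([0, 1], 10)

with tempfile.TemporaryDirectory() as cache_dir:
    # `memory` makes the pipeline cache the fitted StandardScaler per training
    # fold, so it can be reused across the different values of C.
    pipe = make_pipeline(StandardScaler(), LogisticRegression(), memory=cache_dir)
    grid = GridSearchCV(pipe,
                        param_grid={'logisticregression__C': [0.1, 10.]},
                        cv=2,
                        refit=False)
    grid.fit(X, y)

# One entry per candidate value of C.
candidates = grid.cv_results_['params']
```

Whether the cache pays off depends on how expensive the preprocessing is compared with hashing the input data, so it is worth timing on the real pipeline.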

python numpy machine-learning scikit-learn grid-search

Mar*_*cia

2018-08-25

21
votes

3
answers

10k
views

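How to disable auto-quotes and auto-brackets in Jupyter 5.0

I upgraded Jupyter to the latest version, 5.0, and it looks like my front-end configuration stopped working.

I don't understand why Jupyter ships with auto-closing quotes and brackets enabled by default; I find it quite annoying. So with every version I have to change the settings to disable it.

It used to work by creating the file ~/.jupyter/custom/custom.js and adding the following JavaScript code: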

require(['notebook/js/codecell'], function (codecell) {
  codecell.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})


I have read that since Jupyter 4 this code can be replaced by:

IPython.CodeCell.options_default.cm_config.autoCloseBrackets = false;


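But it looks like in Jupyter 5 neither of the two options above works any more.

The documentation I found on front-end configuration didn't help (once I understand it, I will be happy to improve it):

http://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#frontend-config

Can anyone help me understand how to disable auto-brackets and auto-quotes in Jupyter 5?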
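
For what it is worth, the front-end configuration page linked above describes setting CodeMirror options from Python through ConfigManager. A minimal sketch, assuming the classic notebook package (which provides notebook.services.config) is installed; whether autoCloseBrackets also covers quotes in a given version is worth checking:

```python
from notebook.services.config import ConfigManager

# Persist the option in the per-user front-end config for the 'notebook' section.
cm = ConfigManager()
cm.update('notebook', {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}})
```

This only needs to be run once; the notebook page has to be reloaded for the change to apply.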

These are the exact versions I am running:

jupyter jupyter-notebook

Mar*_*cia

14
votes

4
answers

3443
views

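Split a Python sequence (time series/array) into subsequences with overlap

I need to extract all subsequences of a time series/array for a given window. For example: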

>>> ts = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> window = 3
>>> subsequences(ts, window)
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])


The naive approach of iterating over the sequence is of course expensive, for example:

import pandas as pd


def subsequences(ts, window):
    res = []
    for i in range(ts.size - window + 1):
        # Positional slice of `window` consecutive values, re-indexed from 0
        subts = ts.iloc[i:i + window].reset_index(drop=True)
        subts.name = None
        res.append(subts)
    return pd.DataFrame(res)


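I found a better way: copy the sequence, shift each copy by a different offset until the window is covered, and split the shifted sequences with reshape. Performance is about 100 times better, because the for loop iterates over the window size instead of the sequence size: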

def subsequences(ts, window):
    res = [] …
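
As a rough illustration of the shifting idea described above (not the truncated code itself, whose details are not shown), a sketch might stack one shifted view per window offset as columns. The function name subsequences_shifted and the use of np.column_stack in place of reshape are my own assumptions for the example.

```python
import numpy as np
import pandas as pd


def subsequences_shifted(ts, window):
    """Return every length-`window` subsequence of `ts` as rows of a 2-D array."""
    values = np.asarray(ts)
    n_rows = values.size - window + 1
    if window < 1 or n_rows < 1:
        raise ValueError("window must be between 1 and len(ts)")
    # Column j holds the sequence shifted by j positions, so the loop runs
    # `window` times instead of once per subsequence.
    shifted = [values[j:j + n_rows] for j in range(window)]
    return np.column_stack(shifted)


ts = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
result = subsequences_shifted(ts, 3)
```

The result is a plain NumPy array; wrapping it in pd.DataFrame(result) would give the same shape as the naive version.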


python performance numpy time-series pandas

Mar*_*cia

2015-01-10

8
votes

1
answer

3548
views

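Logging exceptions in Django commands

I have implemented custom commands in Django, and their exceptions are not recorded in my log file.

I created an app, my_app_with_commands, containing a management/commands directory where I implemented some commands.

An example command might look like this one, which crashes with an exception: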

import logging
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Do something useful'
    log = logging.getLogger(__name__)

    def handle(self, *args, **options):
        self.log.info('Starting...')
        raise RuntimeError('Something bad happened')
        self.log.info('Done.')


My logging configuration looks like this:

LOGGING = { 
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'normal': {
            'format': '%(asctime)s %(module)s %(levelname)s %(message)s',
        }
    },  
    'handlers': {
        'file': {
            'level': 'INFO',
            'class': 'logging.FileHandler',
            'filename': os.path.join(BASE_DIR, '..', 'logs', 'my_log.log'),
            'formatter': 'normal',
        },
        'mail_admins': {
            'level': 'ERROR',
            'class': 'django.utils.log.AdminEmailHandler',
            'include_html': True,
        }
    }, …
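
One pattern that may help, sketched here without Django so that it runs on its own: catch the exception at the top of the command, log it with logger.exception (which records the traceback), and re-raise. The helper run_logged and the logger name are hypothetical; in a real command the same try/except would presumably go around the body of handle().

```python
import io
import logging

# Hypothetical logger name; a real command would typically use __name__.
log = logging.getLogger("my_app_with_commands.commands")


def run_logged(func, *args, **kwargs):
    """Call func, logging any exception with its traceback before re-raising."""
    try:
        return func(*args, **kwargs)
    except Exception:
        log.exception("Command failed")
        raise


def handle():
    log.info("Starting...")
    raise RuntimeError("Something bad happened")


# Demo: capture the log output in memory instead of writing to a file.
buffer = io.StringIO()
log.addHandler(logging.StreamHandler(buffer))
log.setLevel(logging.INFO)
try:
    run_logged(handle)
except RuntimeError:
    pass  # the exception still propagates after being logged
logged = buffer.getvalue()
```

Re-raising keeps the command's exit behaviour unchanged while making sure the traceback reaches whatever handlers the logger is attached to.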


python django logging

Mar*_*cia

5
votes

1
answer

2182
views

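IPython 3: configuring %matplotlib inline automatically

I am using IPython 3/Jupyter, and I would like to use the %matplotlib inline option by default.

In IPython 2, I had the following option in ~/.ipython/profile_default/ipython_notebook_config.py: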

c.InteractiveShellApp.matplotlib = 'inline'


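But in IPython 3 this option is no longer available.

The c.InlineBackend.rc setting, which I used to set the appearance of the plots, is gone as well.

How can I set these options now?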
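
A possible direction, stated as an assumption and not as a confirmed answer: in IPython 3 the kernel-related options appear to have moved to a separate file, ipython_kernel_config.py, in the same profile directory. A sketch of what that file might contain; the rc values are purely illustrative:

```python
# ~/.ipython/profile_default/ipython_kernel_config.py  (assumed location)
c = get_config()  # provided by IPython when it loads a config file

# Same option as in IPython 2, now read from the kernel config.
c.InteractiveShellApp.matplotlib = 'inline'

# Illustrative appearance settings for inline figures.
c.InlineBackend.rc = {'figure.figsize': (8, 5)}
```

This file is read by IPython at kernel start-up, so it is not meant to be run as a standalone script.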

python matplotlib ipython ipython-notebook

Mar*_*cia

4
votes

1
answer

2551
views

Tag statistics

python ×4

numpy ×2

django ×1

grid-search ×1

ipython ×1

ipython-notebook ×1

jupyter ×1

jupyter-notebook ×1

logging ×1

machine-learning ×1

matplotlib ×1

pandas ×1

performance ×1

scikit-learn ×1

time-series ×1
