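I am looking for a way to plot grid_scores_ from GridSearchCV in sklearn. In this example I am trying to grid-search the best gamma and C parameters for an SVR algorithm. My code is as follows: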
C_range = 10.0 ** np.arange(-4, 4)
gamma_range = 10.0 ** np.arange(-4, 4)
param_grid = dict(gamma=gamma_range.tolist(), C=C_range.tolist())
grid = GridSearchCV(SVR(kernel='rbf', gamma=0.1),param_grid, cv=5)
grid.fit(X_train,y_train)
print(grid.grid_scores_)
After running the code and printing the grid scores, I get the following result:
[mean: -3.28593, std: 1.69134, params: {'gamma': 0.0001, 'C': 0.0001}, mean: -3.29370, std: 1.69346, params: {'gamma': 0.001, 'C': 0.0001}, mean: -3.28933, std: 1.69104, params: {'gamma': 0.01, 'C': 0.0001}, mean: -3.28925, std: 1.69106, params: {'gamma': 0.1, 'C': 0.0001}, mean: -3.28925, std: 1.69106, params: {'gamma': 1.0, 'C': 0.0001}, mean: -3.28925, std: 1.69106, params: {'gamma': 10.0, 'C': 0.0001},etc]
I would like to visualize all the scores (mean values) as a function of the gamma and C parameters. The plot I am trying to obtain should look like this: …
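One possible way to get such a plot is to arrange the mean scores into a matrix with one row per C value and one column per gamma value, then draw that matrix as a heat map. The sketch below is only an illustration: `scores_to_matrix` and `plot_heatmap` are hypothetical helper names, and the input is assumed to be a list of (params, mean score) pairs. With the older API those pairs could presumably be built from the `parameters` and `mean_validation_score` fields of each `grid_scores_` entry; in newer scikit-learn releases the equivalent data lives in `cv_results_['params']` and `cv_results_['mean_test_score']`.

```python
def scores_to_matrix(results, c_values, gamma_values):
    """Arrange (params, mean_score) pairs as rows = C, columns = gamma."""
    lookup = {(params["C"], params["gamma"]): mean for params, mean in results}
    return [[lookup[(c, g)] for g in gamma_values] for c in c_values]


def plot_heatmap(matrix, c_values, gamma_values):
    """Draw the score matrix as a heat map (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, interpolation="nearest", cmap="viridis")
    ax.set_xlabel("gamma")
    ax.set_ylabel("C")
    ax.set_xticks(range(len(gamma_values)))
    ax.set_xticklabels(gamma_values)
    ax.set_yticks(range(len(c_values)))
    ax.set_yticklabels(c_values)
    fig.colorbar(image, ax=ax, label="mean CV score")
    return fig


# The three values below are copied from the printed output above (C = 0.0001).
results = [
    ({"gamma": 0.0001, "C": 0.0001}, -3.28593),
    ({"gamma": 0.001, "C": 0.0001}, -3.29370),
    ({"gamma": 0.01, "C": 0.0001}, -3.28933),
]
matrix = scores_to_matrix(results, [0.0001], [0.0001, 0.001, 0.01])
print(matrix)
```

Calling `plot_heatmap(matrix, c_values, gamma_values)` on the full 8 x 8 grid would then give one coloured cell per (C, gamma) pair.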
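I would like to know whether there is a way to specify a custom cost function in sklearn/Python. My real problem has 7 different classes, but to make it clearer, let's assume I want to specify different misclassification costs for a problem with 3 classes, and that I am mainly interested in my model correctly distinguishing between class 1 and class 3.
So the penalty matrix would look as follows: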
         Class 1  Class 2  Class 3
Class 1     0        1        2
Class 2     1        0        1
Class 3     2        1        0
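I assume the 'class_weight' parameter in sklearn does something similar, but it accepts a dictionary rather than a matrix. Passing class_weight = {1:2, 2:1, 3:2} would only increase the weight for misclassifying class 1 and class 3; however, I want my model to receive a larger penalty specifically when it chooses class 1 and the true class is class 3, and vice versa.
Is it possible to do something like this in sklearn? Perhaps some other library or learning algorithm allows unequal misclassification costs?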
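One library-independent way to use such a penalty matrix is to apply it at prediction time: given class probabilities from any probabilistic classifier, choose the class with the lowest expected cost instead of the most probable one. The sketch below illustrates that decision rule only; it does not change how a model is trained, and `min_expected_cost_class` and the example probabilities are hypothetical.

```python
# Penalty matrix from the question: rows = true class, columns = predicted class.
COST = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]


def min_expected_cost_class(probabilities, cost):
    """Return (index of cheapest prediction, expected cost of each prediction)."""
    n_classes = len(cost)
    expected = [
        sum(probabilities[true] * cost[true][pred] for true in range(n_classes))
        for pred in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return best, expected


# Hypothetical predicted probabilities for classes 1, 2 and 3.
probabilities = [0.45, 0.15, 0.40]
best_index, expected_costs = min_expected_cost_class(probabilities, COST)
print(best_index + 1, expected_costs)
```

Here class 1 is the most probable, but predicting class 2 has the lowest expected cost because confusing classes 1 and 3 is penalised twice as heavily.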
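I am new to the field of machine learning and am currently trying to grasp how the most common learning algorithms work and when to apply them. At the moment I am learning how support vector machines work, and I have a question about custom kernel functions.
There is plenty of information on the web about the more standard (linear, RBF, polynomial) kernels for SVMs. However, I would like to understand when it is reasonable to use a custom kernel function. My questions are:
1) What other kernels are possible for SVMs?
2) In which situations would one apply a custom kernel?
3) Can a custom kernel substantially improve the prediction quality of an SVM?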
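As a small illustration of what a custom kernel can look like, the sketch below computes the Gram matrix of the histogram intersection kernel, k(x, y) = sum_i min(x_i, y_i), which is sometimes used for non-negative histogram features. The function name and the sample data are hypothetical. scikit-learn's `SVC` accepts either a callable as `kernel` or `kernel='precomputed'` together with such a matrix, although that step is not shown here.

```python
def intersection_kernel(X, Y):
    """Gram matrix of the histogram intersection kernel between rows of X and Y."""
    return [[sum(min(a, b) for a, b in zip(x, y)) for y in Y] for x in X]


# Hypothetical non-negative feature vectors (e.g. histogram bin counts).
X = [[1, 0, 2], [0, 3, 1]]
gram = intersection_kernel(X, X)
print(gram)  # [[3, 1], [1, 4]]
```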
My pandas version is 0.18, and I have minute data that looks like this:
Time
2009-01-30 09:30:00 85.11 100.11
2009-01-30 09:39:00 84.93 100.05
2009-01-30 09:40:00 84.90 100.00
2009-01-30 09:45:00 84.91 99.94
2009-01-30 09:48:00 84.81 99.90
2009-01-30 09:55:00 84.78 100.00
2009-01-30 09:56:00 84.57 100.10
2009-01-30 09:59:00 84.25 100.41
2009-01-30 10:00:00 84.32 100.60
2009-01-30 10:06:00 84.23 101.49
2009-01-30 10:09:00 84.15 101.47
I want to use only the data between 9:30 and 16:00 and resample it at 78-minute intervals (i.e. split the time between 9:30 and 16:00 into 5 equal parts). My code is as follows:
Data= Data.between_time('9:30','16:00')
tframe = '78T'
hourlym = Data.resample(tframe, base=30).mean()
Output:
Time
2009-01-30 08:18:00 85.110000 100.110000
2009-01-30 09:36:00 83.950645 101.984516
2009-01-30 10:54:00 83.372294 103.093824
2009-01-30 12:12:00 83.698624 102.566897
2009-01-30 13:30:00 83.224397 103.076667
2009-01-30 14:48:00 …
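The 08:18 label can be reproduced with simple arithmetic, assuming the bins are anchored at midnight of the first day plus `base` minutes (an assumption about this pandas version that happens to match the output above). The sketch below uses hypothetical helper names and works in minutes since midnight. Under the same assumption, a base of 24 would put a bin edge exactly on 09:30.

```python
def first_bin_start(first_minute, freq, base):
    """Start of the bin containing first_minute, with edges at base + k * freq."""
    anchor = base % freq
    return first_minute - (first_minute - anchor) % freq


def as_clock(minutes):
    """Format minutes since midnight as HH:MM."""
    return "%02d:%02d" % divmod(minutes, 60)


first = 9 * 60 + 30  # 09:30, the first timestamp in the data

print(as_clock(first_bin_start(first, 78, 30)))  # 08:18, as in the output above

aligned_base = first % 78  # offset that places an edge exactly on 09:30
print(aligned_base)  # 24
print(as_clock(first_bin_start(first, 78, aligned_base)))  # 09:30

# With that base, five 78-minute bins end exactly at 16:00.
edges = [as_clock(first + k * 78) for k in range(6)]
print(edges)
```

Because 78 minutes does not divide a day evenly, an anchor taken from the first day alone would still drift on later days, so resampling each day separately may be needed. Newer pandas versions replace `base` with the `origin` and `offset` arguments.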
I am trying to find the number of business days until a specific date, and I get the following error:
import numpy as np
import pandas_market_calendars as mcal
from datetime import datetime
import pandas as pd
nyse = mcal.get_calendar('NYSE')
holidays = nyse.holidays()
holidays = list(holidays.holidays) # NYSE Holidays
today = datetime.now()
expiration = datetime(2019,2,13,0,0)
days_to_expiration = np.busday_count(today,expiration,holidays=holidays)
print(days_to_expiration)
In [6]: days_to_expiration = np.busday_count(today,expiration,holidays=holidays)
Traceback (most recent call last):
File "<ipython-input-6-559c16b20339>", line 1, in <module>
days_to_expiration = np.busday_count(today,expiration,holidays=holidays)
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
Any ideas?
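The error message says that a microsecond-resolution value (`<M8[us]`) cannot be safely cast to day resolution (`<M8[D]`), which suggests passing dates rather than full datetimes. The sketch below does that with `.date()`. The fixed `today` value and the single holiday are hypothetical stand-ins so the example is reproducible; they are not real NYSE data.

```python
from datetime import datetime

import numpy as np

# Stand-in for datetime.now(), which carries microseconds.
today = datetime(2019, 2, 4, 15, 30, 12, 345678)
expiration = datetime(2019, 2, 13, 0, 0)

# Hypothetical holiday list (day resolution), used only for illustration.
holidays = [np.datetime64("2019-02-06")]

# Dropping the time-of-day part gives day-resolution inputs.
days_to_expiration = np.busday_count(
    today.date(), expiration.date(), holidays=holidays
)
print(days_to_expiration)
```

Note that `np.busday_count` counts the start date but not the end date.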
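I am doing some basic card/deck manipulation in Python. Below you can see my Card class and Deck class. Suppose I know that some cards are dead and want to remove them from the deck.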
import itertools

SUIT_LIST = ("h", "s", "d", "c")
NUMERAL_LIST = ("2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A")


class Card:
    def __init__(self, numeral, suit):
        self.numeral = numeral
        self.suit = suit
        self.card = self.numeral, self.suit

    def __repr__(self):
        return self.numeral + self.suit


class Deck(set):
    def __init__(self):
        for numeral, suit in itertools.product(NUMERAL_LIST, SUIT_LIST):
            self.add(Card(numeral, suit))


deck = Deck()
dead_card = Card('J','s')
deck.remove(dead_card)
This raises the following error:
Traceback (most recent call last):
File "<ipython-input-93-06af62ea1273>", line …
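The traceback is truncated, but a likely cause (assuming it ends in a KeyError) is that `Card` defines neither `__eq__` nor `__hash__`. Objects then compare and hash by identity, so a freshly created `Card('J', 's')` is not the object stored in the set. The sketch below is one possible adjustment, not necessarily the intended design: it makes cards with the same numeral and suit equal and hashable.

```python
import itertools

SUIT_LIST = ("h", "s", "d", "c")
NUMERAL_LIST = ("2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A")


class Card:
    def __init__(self, numeral, suit):
        self.numeral = numeral
        self.suit = suit

    def __repr__(self):
        return self.numeral + self.suit

    def __eq__(self, other):
        # Two cards are equal when numeral and suit both match.
        if not isinstance(other, Card):
            return NotImplemented
        return (self.numeral, self.suit) == (other.numeral, other.suit)

    def __hash__(self):
        # Must be consistent with __eq__ so set lookups find equal cards.
        return hash((self.numeral, self.suit))


class Deck(set):
    def __init__(self):
        super().__init__(
            Card(numeral, suit)
            for numeral, suit in itertools.product(NUMERAL_LIST, SUIT_LIST)
        )


deck = Deck()
deck.remove(Card("J", "s"))
print(len(deck))  # 51
```

Since the hash depends on `numeral` and `suit`, those attributes should not be changed while a card is stored in a set.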
I am getting strange behavior from pandas: I want to resample my minute data to hourly data (using the mean). My data looks like this:
Data.head()
AAA BBB
Time
2009-02-10 09:31:00 86.34 101.00
2009-02-10 09:36:00 86.57 100.50
2009-02-10 09:38:00 86.58 99.78
2009-02-10 09:40:00 86.63 99.75
2009-02-10 09:41:00 86.52 99.66
Data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 961276 entries, 2009-02-10 09:31:00 to 2016-02-29 19:59:00
Data columns (total 2 columns):
AAA 961276 non-null float64
BBB 961276 non-null float64
dtypes: float64(2)
memory usage: 22.0 MB
Data.index
Out[25]:
DatetimeIndex(['2009-02-10 09:31:00', '2009-02-10 09:36:00',
'2009-02-10 09:38:00', '2009-02-10 09:40:00',
'2009-02-10 09:41:00', '2009-02-10 09:44:00',
'2009-02-10 09:45:00', '2009-02-10 09:46:00',
'2009-02-10 09:47:00', '2009-02-10 09:48:00',
...
'2016-02-29 19:41:00', …