Tag: scikit-learn

Can anyone explain StandardScaler?

I can't make sense of the StandardScaler page in the sklearn documentation.

Can someone explain it to me in simple terms?
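In short, standardization rescales each feature (column) so that it has mean 0 and standard deviation 1, using z = (x - mean) / std. Below is a minimal pure-Python sketch of that arithmetic. The helper name `standardize_columns` and the toy data are made up for illustration; it divides by the population standard deviation (n, not n - 1), which matches the biased estimator StandardScaler documents.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; a constant column keeps scale 1.0
    # so that we never divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy data: two features on very different scales (illustrative values only)
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize_columns(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, both columns hold the same values even though the raw features differed by a factor of 100, which is the point of the transformation.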

python scaling machine-learning standardized scikit-learn

nit*_*y23

2019 01-07

70
votes

7
answers

90k
views

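The "stratify" parameter of the "train_test_split" method (scikit-learn)

I'm trying to use train_test_split from the scikit-learn package, but I'm having trouble with the stratify parameter. Here is the code: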

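from sklearn import cross_validation, datasets

# Load the iris dataset (this step was missing from the original snippet)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

cross_validation.train_test_split(X, y, stratify=y)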


However, I keep getting the following error:

raise TypeError("Invalid parameters passed: %s" % str(options))
TypeError: Invalid parameters passed: {'stratify': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, …


test-data split training-data scikit-learn

Dan*_*vaw

2018 12-10

67
votes

6
answers

80k
views

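fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

I'm new to machine learning and have been working with unsupervised learning techniques.

Screenshot of my sample data (after all cleaning): sample data

I have two pipelines for cleaning the data: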

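# Imports assumed by this snippet (the original omitted them). Imputer lived in
# sklearn.preprocessing in older releases; newer ones use sklearn.impute.SimpleImputer.
# DataFrameSelector and CombinedAttributesAdder are the asker's own custom transformers.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, LabelBinarizer, StandardScaler

num_attribs = list(housing_num)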
cat_attribs = ["ocean_proximity"]

print(type(num_attribs))

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])


Then I combined the two pipelines with a FeatureUnion; the code is shown below:

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])


Now I'm trying to call fit_transform on the data, but it raises an error.

Transformation code:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared


Error message: fit_transform() takes 2 positional arguments but 3 were given
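The message is an ordinary Python signature mismatch: a Pipeline calls each step's fit_transform with both X and y, while LabelBinarizer.fit_transform accepts only one argument besides self. The sketch below reproduces this with toy stand-in classes and shows one common workaround, a thin wrapper that accepts and ignores y. `ToyBinarizer`, `PipelineFriendlyBinarizer`, and `run_step` are hypothetical names rather than scikit-learn APIs, and the category values are illustrative.

```python
class ToyBinarizer:
    """Stand-in for a transformer whose fit_transform accepts only one input."""

    def fit_transform(self, y):
        classes = sorted(set(y))
        return [[int(label == c) for c in classes] for label in y]


class PipelineFriendlyBinarizer:
    """Wrapper exposing the (X, y=None) signature a pipeline step is called with."""

    def __init__(self):
        self.encoder = ToyBinarizer()

    def fit_transform(self, X, y=None):
        # y is accepted but ignored; the column to encode arrives as X
        return self.encoder.fit_transform(X)


def run_step(step, X, y=None):
    """Mimic how a pipeline invokes a step: X and y are both passed positionally."""
    return step.fit_transform(X, y)


categories = ["INLAND", "NEAR BAY", "INLAND"]

try:
    run_step(ToyBinarizer(), categories)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

encoded = run_step(PipelineFriendlyBinarizer(), categories)
print(encoded)
```

The same idea carries over to a real pipeline: wrap the encoder in a class whose fit, transform, and fit_transform all take (X, y=None), or use an encoder designed for feature columns rather than target labels.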

scikit-learn data-science

Vir*_*mar

2017 09-12

67
votes

6
answers

20k
views

Why can't pydot find GraphViz's executables on Windows 8?

I installed GraphViz 2.32 on Windows 8 and added C:\Program Files (x86)\Graphviz2.32\bin to the system PATH variable. pydot still can't find its executables.

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    graph.write_png('example1_graph.png')
  File "build\bdist.win32\egg\pydot.py", line 1809, in <lambda>
    lambda path, f=frmt, prog=self.prog : self.write(path, format=f, prog=prog))
  File "build\bdist.win32\egg\pydot.py", line 1911, in write
    dot_fd.write(self.create(prog, format))
  File "build\bdist.win32\egg\pydot.py", line 1953, in create
    'GraphViz\'s executables not found' )
InvocationException: GraphViz's executables not found


I found https://code.google.com/p/pydot/issues/detail?id=65 but it didn't solve the problem.
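A quick diagnostic that may narrow this down is to check whether the Python process itself can see the `dot` executable. A changed system PATH is only picked up by processes started after the change, so an already-open shell or IDE can still hold the old value. The sketch below uses only the standard library; `find_executable` and `path_entries_containing` are hypothetical helper names, and it makes no assumption about how pydot performs its own lookup.

```python
import os
import shutil


def find_executable(name="dot"):
    """Return the full path of `name` as seen by this Python process, or None."""
    return shutil.which(name)


def path_entries_containing(fragment):
    """List PATH entries whose text contains `fragment` (case-insensitive)."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return [entry for entry in entries if fragment.lower() in entry.lower()]


dot_path = find_executable("dot")
print("dot found at:", dot_path)
print("Graphviz-looking PATH entries:", path_entries_containing("graphviz"))
```

If `dot_path` is None even though the directory was added, restarting the interpreter (or the whole session) and re-running the check is a reasonable first step.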

graphviz pygraphviz pydot scikit-learn

web*_*nja


66
votes

8
answers

130k
views

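How to get the most informative features for scikit-learn classifiers?

Classifiers in machine learning packages such as liblinear and nltk offer a method show_most_informative_features(), which is really helpful for debugging features: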

viagra = None          ok : spam     =      4.5 : 1.0
hello = True           ok : spam     =      4.5 : 1.0
hello = None           spam : ok     =      3.3 : 1.0
viagra = True          spam : ok     =      3.3 : 1.0
casino = True          spam : ok     =      2.0 : 1.0
casino = None          ok : spam     =      1.5 : 1.0


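My question is whether something similar is implemented for the classifiers in scikit-learn. I searched the documentation but couldn't find anything like it.

If no such function exists yet, does anybody know a workaround for getting those values?

Thanks a lot!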
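For linear models, one common approach is to pair each learned coefficient with its feature name and sort the pairs. In scikit-learn those values typically live in the fitted classifier's `coef_` attribute, with the names coming from the vectorizer. The sketch below shows only the ranking step, with hard-coded, made-up weights standing in for a fitted model; `most_informative` is a hypothetical helper.

```python
def most_informative(feature_names, weights, n=2):
    """Return the n most negative and the n most positive (weight, name) pairs."""
    ranked = sorted(zip(weights, feature_names))
    return ranked[:n], ranked[-n:][::-1]


# Made-up weights for illustration; positive values push towards "spam"
names = ["viagra", "hello", "casino", "meeting", "invoice"]
weights = [2.1, -0.4, 1.3, -1.8, -0.9]

towards_ok, towards_spam = most_informative(names, weights)
for weight, name in towards_spam:
    print(f"spam  {weight:+.1f}  {name}")
for weight, name in towards_ok:
    print(f"ok    {weight:+.1f}  {name}")
```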

python classification machine-learning scikit-learn

tob*_*gue

2012 06-23

64
votes

5
answers

50k
views

Stratified train/test split in scikit-learn

I need to split my data into a training set (75%) and a test set (25%). I currently do that with the code below:

X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo)


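However, I'd like to stratify my training dataset. How do I do that? I've been looking into the StratifiedKFold method, but it doesn't let me specify the 75%/25% split and stratify only the training dataset.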
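Stratifying means splitting each class separately so that both parts keep roughly the original class proportions. Newer scikit-learn releases expose this directly through `sklearn.model_selection.train_test_split(X, y, test_size=0.25, stratify=y)`. To show what that does, here is a minimal pure-Python sketch; `stratified_split` is a hypothetical helper and the labels are toy data, so treat it as an illustration rather than the library's implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: 8 samples of class 0 and 4 samples of class 1
labels = [0] * 8 + [1] * 4
train_idx, test_idx = stratified_split(labels)
print("train:", train_idx)
print("test: ", test_idx)
```

With these toy labels the test part receives two samples of class 0 and one of class 1, mirroring the 2:1 ratio of the full set.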

python scikit-learn

pir*_*pir


64
votes

5
answers

100k
views

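scikit-learn .predict() default threshold

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.

In a binary classification problem, does scikit's classifier.predict() use 0.5 by default? If it doesn't, what's the default method? If it does, how do I change it?

In scikit, some classifiers have the class_weight='auto' option, but not all do. With class_weight='auto', would .predict() use the actual population proportion as a threshold?

What would be the way to do this in a classifier like MultinomialNB that doesn't support class_weight, other than using predict_proba() and then calculating the classes myself?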
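For probabilistic binary classifiers, predict() generally returns the class with the highest predicted probability, which amounts to a 0.5 cut-off; this is worth verifying for the specific estimator, since not every model derives its prediction from probabilities. Applying a different cut-off to the positive-class probabilities is a small step, sketched below. `predict_with_threshold` is a hypothetical helper and the probabilities are made up, standing in for the positive-class column of predict_proba().

```python
def predict_with_threshold(positive_probabilities, threshold=0.5):
    """Label a sample 1 when its positive-class probability reaches the threshold."""
    return [int(p >= threshold) for p in positive_probabilities]


# Made-up probabilities for illustration only
probabilities = [0.02, 0.10, 0.30, 0.55, 0.90]

default_labels = predict_with_threshold(probabilities)       # cut-off of 0.5
lenient_labels = predict_with_threshold(probabilities, 0.05)  # cut-off near the 5% prior
print(default_labels)
print(lenient_labels)
```

Lowering the cut-off trades precision for recall on the rare class, so the value is usually chosen on a validation set rather than set to the class prior by default.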

python classification machine-learning scikit-learn

ADJ*_*ADJ

2013 11-15

62
votes

5
answers

50k
views

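Understanding min_df and max_df in scikit CountVectorizer

I have five text files that I input to a CountVectorizer. When specifying min_df and max_df for the CountVectorizer instance, what exactly does the min/max document frequency mean? Is it the frequency of a word in its particular text file, or the frequency of the word in the entire corpus (5 txt files)?

How does it differ when min_df and max_df are provided as integers or as floats?

The documentation doesn't seem to provide a thorough explanation, nor does it supply an example demonstrating the use of min_df and/or max_df. Could someone provide an explanation or an example demonstrating min_df or max_df?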
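Document frequency counts how many documents in the whole corpus contain a term at least once; it is not the number of occurrences inside one file. An integer limit is an absolute number of documents and a float limit is a fraction of the corpus. The pure-Python sketch below mimics that filtering rule on five tiny stand-in documents. The helper names are hypothetical, and the whitespace tokenization is a simplification that does not reproduce CountVectorizer's own tokenizer.

```python
def document_frequencies(documents):
    """Count, for each term, the number of documents that contain it."""
    counts = {}
    for text in documents:
        for term in set(text.lower().split()):
            counts[term] = counts.get(term, 0) + 1
    return counts


def to_count(limit, n_documents):
    """Integers are absolute document counts; floats are fractions of the corpus."""
    return limit if isinstance(limit, int) else limit * n_documents


def filter_vocabulary(documents, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies between the two limits."""
    n = len(documents)
    low, high = to_count(min_df, n), to_count(max_df, n)
    df = document_frequencies(documents)
    return sorted(term for term, count in df.items() if low <= count <= high)


# Five tiny stand-in documents (illustrative only)
docs = [
    "the cat sat",
    "the dog sat",
    "the bird flew",
    "the cat flew",
    "the fish swam",
]
at_least_two = filter_vocabulary(docs, min_df=2)        # in 2 or more documents
at_most_half = filter_vocabulary(docs, max_df=0.5)      # in at most 50% of documents
both_limits = filter_vocabulary(docs, min_df=2, max_df=0.8)
print(at_least_two)
print(at_most_half)
print(both_limits)
```

Here "the" appears in all five documents, so any max_df below 1.0 (or below 5 as an integer) drops it, while min_df=2 drops the terms that occur in a single document.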

python nlp machine-learning scikit-learn

moe*_*dol

2018 03-06

62
votes

4
answers

40k
views

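Recovering the feature names behind explained_variance_ratio_ in PCA with sklearn

I'm trying to recover, from a PCA done with scikit-learn, which features were selected as relevant.

A classic example with the IRIS dataset.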

import pandas as pd
import pylab as pl
from sklearn import datasets
from sklearn.decomposition import PCA

# load dataset
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# normalize data
df_norm = (df - df.mean()) / df.std()

# PCA
pca = PCA(n_components=2)
pca.fit_transform(df_norm.values)
print(pca.explained_variance_ratio_)


This returns

In [42]: pca.explained_variance_ratio_
Out[42]: array([ 0.72770452,  0.23030523])


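How can I recover which two features account for these two explained variances in the dataset? Said differently, how can I get the indices of these features in iris.feature_names?

In [47]: print(iris.feature_names)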
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


Thanks in advance for your help.
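One point that may help: a principal component is not a single original feature but a weighted combination of all of them, so there is usually no pair of features that "is" the two components. What can be inspected are the weights (loadings); in scikit-learn they are stored in `pca.components_`, one row per component. The sketch below computes only the first component with plain power iteration on a covariance matrix (not the SVD that scikit-learn uses), on made-up data with hypothetical feature names, and then ranks the features by absolute weight.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of row-wise observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov, iterations=200):
    """Approximate the top eigenvector and eigenvalue by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


# Made-up data: features "a" and "b" move together, "c" barely varies
feature_names = ["a", "b", "c"]
rows = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.5],
]

cov = covariance_matrix(rows)
component, variance = leading_component(cov)
explained_ratio = variance / sum(cov[i][i] for i in range(len(cov)))

# Rank features by how strongly they weigh in the first component
loadings = sorted(zip(feature_names, component), key=lambda p: abs(p[1]), reverse=True)
print("explained variance ratio:", round(explained_ratio, 4))
for name, weight in loadings:
    print(f"{name}: {weight:+.3f}")
```

Note that the sketch runs on raw values, so the feature with the largest variance dominates; standardizing first, as the question does, changes the weights.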

python machine-learning pca scikit-learn

maz*_*res

2016 09-20

61
votes

5
answers

40k
views

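Principal component analysis (PCA) in Python

I have a (26424 x 144) array and I want to perform PCA on it using Python. However, there is no particular place on the web that explains how to achieve this task (some sites just do PCA their own way; I haven't found a general way of doing it). Any help would be much appreciated.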

python pca scikit-learn

kha*_*han

2019 02-27

60
votes

5
answers

120k
views

Tag statistics

scikit-learn ×10

python ×7

machine-learning ×5

classification ×2

pca ×2

data-science ×1

graphviz ×1

nlp ×1

pydot ×1

pygraphviz ×1

scaling ×1

split ×1

standardized ×1

test-data ×1

training-data ×1
