小编Aiz_aac的帖子

scikit中的pipeline和make_pipeline有什么区别？

我从sklearn网页上得到了这个:

a)管道:使用最终估算器进行变换的管道

b)Make_pipeline:根据给定的估算器构造管道.这是Pipeline构造函数的简写.

但是当我必须使用每一个时,我仍然不明白.谁能举个例子？

python scikit-learn

28
推荐指数

1
解决办法

1万
查看次数

StratifiedKFold vs StratifiedShuffleSplit vs StratifiedKFold + Shuffle

有什么区别:StratifiedKFold,StratifiedShuffleSplit,StratifiedKFold + Shuffle？我什么时候应该使用每一个？当我获得更好的准确度分数？为什么我没有得到类似的结果？我已经把我的代码和结果.我正在使用朴素贝叶斯和10x10交叉验证.

   #######SKF FOR LOOP########
from sklearn.cross_validation import StratifiedKFold
for i in range(10):
    skf = StratifiedKFold(y, n_folds=10, shuffle=True)
    scoresSKF2 = cross_validation.cross_val_score(clf, x, y , cv=skf)
    print(scoresSKF2)
    print("Accuracy SKF_NB: %0.2f (*/- %0.2f)" % (scoresSKF2.mean(), scoresSKF2.std()* 2))
    print("") 

    [ 0.1750503   0.16834532  0.16417051  0.18205424  0.1625758   0.1750939
      0.15495808  0.1712963   0.17096494  0.16918166]
    Accuracy SKF_NB: 0.17 (*/- 0.01)

    [ 0.16297787  0.17956835  0.17309908  0.17686093  0.17239388  0.16093615
     0.16970223  0.16956019  0.15473776  0.17208358]
   Accuracy SKF_NB: 0.17 (*/- 0.01)

    [ 0.17102616  0.16719424  0.1733871   0.16560877  0.166041    0.16122508
     0.16767852  0.17042824  0.18719212  0.1677307 …

Run Code Online (Sandbox Code Playgroud)

python-3.x scikit-learn cross-validation

10
推荐指数

1
解决办法

5097
查看次数

sp_randint如何工作？

我正在做随机森林分类器的超参数优化.我打算使用RandomSearchCV.

因此,通过检查Scikit中的可用代码,可以了解:sp_randint的作用是什么？它是否随机取1到11的值？它可以被其他功能取代吗？

    from scipy.stats import randint as sp_randint

    param_dist = {"n_estimators": sp_randint (1, 11), 
                  "max_depth": [3, None],
                  "max_features": sp_randint(1, 11),
                  "min_samples_split": sp_randint(1, 11),
                  "min_samples_leaf": sp_randint(1, 11),
                 }

Run Code Online (Sandbox Code Playgroud)

谢谢.

python optimization machine-learning scipy scikit-learn

10
推荐指数

1
解决办法

3729
查看次数

使用随机森林时,"ValueError:max_features必须在scikit中为(0,n_features)"

我有20个功能和840行的数据集.我已经优化了分类器(随机森林).我的参数是n_estimators = 100和max_features = 5.我想对每个功能进行分类.我的意思是每个功能我想知道预测的准确性.但是当我使用我的代码时,我收到了一个错误.我正在使用scikit ver.18.

我该如何解决这个问题？

for name in ["AWA"]: 
    x=sio.loadmat('/home/TrainVal/{}_Fp1.mat'.format(name))['x'] 
    s_y=sio.loadmat('/home/TrainVal/{}_Fp1.mat'.format(name))['y']
    y=np.ravel(s_y)

    print(name, x.shape, y.shape) 
    print("")


    clf = make_pipeline(preprocessing.RobustScaler(), RandomForestClassifier(n_estimators = 100, 
                                                                         max_features=5, n_jobs=-1)) 
    #########10x10 SSS##############
    print("10x10")

    for i in range(x.shape[1]): 
        xA=x[:, i].reshape(-1,1)

        xSSSmean = [] 
        for j in range(10):
            sss = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=j) 
            scoresSSS = cross_val_score(clf, xA, y, cv=sss)
            xSSSmean.append(scoresSSS.mean()) 

        result_list.append(np.mean(xSSSmean))  
        plt.bar(i, np.mean(xSSSmean)*100, align = 'center')      
        plt.ylabel('Accuracy')
        plt.xlabel('Features')    
        plt.title('Accuracy per feature: {}_RF_Fp1(20)'.format(name)) 

        xticks=np.arange(i+1)
        plt.xticks(xticks, rotation = 'vertical')
    plt.show()





#THE ERROR


ValueError                                Traceback (most recent call last) …

Run Code Online (Sandbox Code Playgroud)

optimization machine-learning python-3.x random-forest scikit-learn

6
推荐指数

1
解决办法

7634
查看次数

如何将数组列表中的所有数组相加？

我正在使用混淆矩阵.所以对于每个循环我都有一个数组(混淆矩阵).当我做10个循环时,我最终得到了10个数组.我想总结所有这些.所以我决定对每个循环我将数组存储在一个列表中 - 我不知道将它们存储在数组中是否更好.

现在我想添加列表中的每个数组.

所以,如果我有:

    5 0 0       1 1 0
    0 5 0       2 4 0
    0 0 5       2 0 5

Run Code Online (Sandbox Code Playgroud)

总和将是:

    6 1 0
    2 9 0 
    2 0 10

Run Code Online (Sandbox Code Playgroud)

这是我的混淆矩阵和我的数组列表的图片:

这是我的代码:

   list_cm.sum(axis=0)

Run Code Online (Sandbox Code Playgroud)

numpy multidimensional-array python-3.x confusion-matrix

5
推荐指数

1
解决办法

9112
查看次数

如何在python中为绘图添加填充？

我试图在我的情节的左侧和右侧添加填充。但是当我改变 xlim 和 ylim 时；图像变小。

我究竟做错了什么？

import matplotlib.pyplot as plt
plt.rcParams['text.usetex'] = False
from matplotlib.font_manager import FontProperties
import seaborn as sns
%matplotlib inline



df1=df['Total Acc'].round(4)*100 
labels = ['AWA','Rem', 'S1', 'S2', 'SWS', 'SX', 'ALL'] 


rows = [df1.loc[label] for label in labels] 
for row in rows:  
    row.plot(figsize=(10, 5), marker='o')


# http://matplotlib.org/api/legend_api.html ---set_bbox_to_anchor(bbox, transform=None)
myLegend=plt.legend(labels, bbox_to_anchor=(0., 1.15, 1., .102), prop ={'size':10}, loc=10, ncol=7,  #left, bottom, width, height
                title=r'LEARNING CURVE - Fp1_RF(20)')                                         
myLegend.get_title().set_fontsize('18') 


plt.ylim(97.5, 98.5)
plt.xlim(0, 45) 

plt.xlabel('# of samples per subject')
plt.ylabel('Accuracy')

Run Code Online (Sandbox Code Playgroud)

python matplotlib python-3.x pandas seaborn

5
推荐指数

1
解决办法

1万
查看次数

如何在 Ubuntu 15 中安装 graphviz 来为 XGBoost 绘制决策树？

我想绘制 XGBoost 的决策树。我按照此网页的说明进行操作。我收到此错误：

 ImportError: You must install graphviz to plot tree

Run Code Online (Sandbox Code Playgroud)

我尝试从graphviz安装 graphviz_2.38.0-1~saucy_amd64.deb 。但我收到此错误：

 *Dependency is not satisfiable:libgraphviz4(>=2.18)*

Run Code Online (Sandbox Code Playgroud)

所以我做了这个链接上的内容

由于它仍然不起作用我下载了源代码：graphviz-2.40.1.tar.gz 但它仍然不起作用。我想可能是我把它安装在了错误的路径中。那么这是什么意思：确保包含 dot 可执行文件的目录在您的系统路径上？

我已经将它安装在 /etc 中。

我在 conda 中使用虚拟环境

关于我做错了什么的任何建议？

编辑：

我使用了 pip install graphviz。现在我收到这个错误：

CalledProcessError: Command '['dot', '-Tpng']' returned non-zero exit status 1

Run Code Online (Sandbox Code Playgroud)

任何的想法？

5
推荐指数

2
解决办法

5067
查看次数

索引字典列表时出现“只能对某些 xcontent 字节或压缩的 xcontent 字节调用压缩器检测”错误

这个问题与另一个问题相关：\n如何使用 python 从列表中读取数据并将特定值索引到 Elasticsearch 中？

\n

我编写了一个脚本来读取列表（“虚拟”）并将其索引到 Elasticsearch 中。\n我将该列表转换为字典列表，并使用“批量”API 将其索引到 Elasticsearch 中。\n该脚本用于工作（检查相关问题的附加链接）。但添加“timestamp”和函数“initialize_elasticsearch”后不再起作用。

\n

那么，到底出了什么问题呢？我应该使用 JSON 而不是字典列表吗？

\n

我也尝试过只使用列表中的一本字典。在这种情况下，没有错误，但没有任何内容被索引。

\n

这就是错误

\n

\n

这是清单（虚拟）

\n

[\n    "labels: imagenet_labels.txt ",\n    "Model: efficientnet-edgetpu-S_quant_edgetpu.tflite ",\n    "Image: insect.jpg ",\n    "Time(ms): 23.1",\n    "Time(ms): 5.7",\n    "Inference: corkscrew, bottle screw",\n    "Score: 0.03125 ",\n    "TPU_temp(\xc2\xb0C): 57.05",\n    "labels: imagenet_labels.txt ",\n    "Model: efficientnet-edgetpu-M_quant_edgetpu.tflite ",\n    "Image: insect.jpg ",\n    "Time(ms): 29.3",\n    "Time(ms): 10.8",\n    "Inference: dragonfly, darning needle, devil\'s darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",\n    "Score: 0.09375 …

Run Code Online (Sandbox Code Playgroud)

python bulk-load elasticsearch

5
推荐指数

1
解决办法

2万
查看次数

用Pandas上的值注释条形图(在Seaborn factorplot条形图上)

我写了一些代码来尝试解决这个问题:https: //stackoverflow.com/questions/39477748/how-to-annotate-bars-with-values-on-pandas-on-seaborn-factorplot-bar-plot

我使用了部分代码,可以在这里找到: matplotlib高级条形图

为什么图表这么小？代码只是告诉我从Pandas数据帧中获取准确性.

代码:

sns.set(style="white")
g = sns.factorplot(x="Stages", y="Accuracy", hue="Dataset", data=df, saturation = 5, size=4, aspect=2, kind="bar",
              palette= myPalette, legend=False)

ax=g.ax
def annotateBars(row, ax=ax):
    if row['Accuracy'] < 20:
        color = 'white'
        vertalign = 'bottom'
        vertpad = 2
    else:
        color = 'black'
        vertalign = 'top'
        vertpad = -2

    ax.text(row.name, row['Accuracy'] + vertpad, "{:.1f}%".format(row['Accuracy']),
            zorder=10, rotation=90, color=color,
            horizontalalignment='center',
            verticalalignment=vertalign,
            fontsize=12, weight='heavy')

junk = df.apply(annotateBars, ax=ax, axis=1)

Run Code Online (Sandbox Code Playgroud)

这是用于注释每个条形码的代码,但是......使用Pandas和Matplotlib.唯一的问题是,我不知道如何更改颜色和分组"x轴":(

    df = df.set_index('Stages')
    ax = df.plot.bar(title="Accuracy")
    ax.set_ylim(0, 120)
    for p in …

Run Code Online (Sandbox Code Playgroud)

plot matplotlib python-3.x pandas seaborn

4
推荐指数

1
解决办法

9403
查看次数

如何在seaborn factorplot中将标题加粗

标题如何加粗？到现在为止，我刚刚更改了字体大小。我试过字体粗细，但它不存在。

    sns.plt.title('I AM A TITLE').set_fontsize('18')

Run Code Online (Sandbox Code Playgroud)

data-visualization python-3.x seaborn

4
推荐指数

1
解决办法

8096
查看次数

简单来说，JAX、TRAX 和 TensorRT 之间有什么区别？

我一直在使用 TensorRT 和 TensorFlow-TRT 来加速我的 DL 算法的推理。

然后我听说过：

JAX https://github.com/google/jax
Trax https://github.com/google/trax

两者都“似乎”加速了深度学习。但我很难理解他们。任何人都可以用简单的术语向我解释它们吗？像这样：

TensorRT：这是一个由 NVIDIA 开发的库。它优化了 GPU 中的推理。这提供了低延迟、高吞吐量和减少内存消耗。

谢谢

tensorflow trax

3
推荐指数

1
解决办法

599
查看次数

如何使用 julia 将数据框的列乘以日志？

我有一个数据框。我想将列“b”乘以“日志”，然后用 0 替换 NaN。

我怎么能在朱莉娅做到这一点？我正在检查这个：DataFrames.jl 但我不明白。

df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
           b = repeat([2, 1], outer=[4]),
           c = randn(8))

Run Code Online (Sandbox Code Playgroud)

dataframe julia

2
推荐指数

1
解决办法

386
查看次数

如何对pandas中的列进行求和并将结果添加到新行中？

在这段代码中,我想对每一列求和并将其添加为新行.它做了sum但它没有显示新行.

df = pd.DataFrame(g, columns=('AWA', 'REM', 'S1', 'S2'))
df['xSujeto'] = df.sum(axis=1)
xEstado = df.sum(axis=0)
df.append(xEstado, ignore_index=True)
df

Run Code Online (Sandbox Code Playgroud)

row sum append python-3.x pandas

1
推荐指数

1
解决办法

1323
查看次数

标签统计

scikit-learn ×4

machine-learning ×2

optimization ×2

confusion-matrix ×1

cross-validation ×1

data-visualization ×1

elasticsearch ×1

multidimensional-array ×1

plot ×1

random-forest ×1

row ×1

sum ×1

trax ×1