我最近一直在使用样式表matplotlib。我真的很喜欢seaborn-white外观的干净程度,我希望能够将边框添加到其他样式中,例如ggplot或seaborn-whitegrid。
如何在我的ax对象周围添加黑色边框fig, ax = plt.subplots()?
import pandas as pd
import numpy as np
from collections import *
Se_data = pd.Series(Counter(np.random.randint(0,10,100)))
with plt.style.context("seaborn-whitegrid"):
fig, ax = plt.subplots()
Se_data.plot(kind="barh", ax=ax, title="No Border")
with plt.style.context("seaborn-white"):
fig, ax = plt.subplots()
Se_data.plot(kind="barh", ax=ax, title="With Border")
Run Code Online (Sandbox Code Playgroud)
回应以下答案:
Se_data = pd.Series(Counter(np.random.randint(0,10,100)))
with plt.style.context("seaborn-whitegrid"):
fig, ax = plt.subplots()
Se_data.plot(kind="barh", ax=ax, title="No Border")
ax.spines['bottom'].set_color('0.5')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color('0.5')
ax.spines['left'].set_color(None)
ax.patch.set_facecolor('0.1')
plt.grid(b=True, which='major', color='0.2', linestyle='-')
plt.grid(b=True, which='minor', color='0.2', linestyle='-')
ax.tick_params(axis='x', …Run Code Online (Sandbox Code Playgroud) 我在Rw /中看到了这个教程autoplot.他们绘制了负载和加载标签:
autoplot(prcomp(df), data = iris, colour = 'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
Run Code Online (Sandbox Code Playgroud)
https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html
我更喜欢Python 3w/matplotlib, scikit-learn, and pandas进行数据分析.但是,我不知道如何添加这些?
你怎么能用这些载体绘制matplotlib?
我一直在阅读使用sklearn在PCA中恢复explain_variance_ratio_的功能名称,但尚未弄清楚
这是我如何绘制它 Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})
%matplotlib inline
np.random.seed(0)
# Iris …Run Code Online (Sandbox Code Playgroud) 我一直在努力构建设计一个非常特别的畅达环境python和R使用串扰rpy2.我想出的用于安装正确R软件包的方法如下:
# install_main_environment.sh
now=$(date +"%T")
echo "Start Time: $now"
## Create Main Environment (OSX-64)
conda create -n python3 python=3 --yes
source activate python3
## Jupyter
conda install jupyterlab --yes
## R
conda install -c r r --yes
conda install -c r r-essentials --yes
conda install -c r rstudio --yes
conda install gfortran_osx-64 --yes
conda install -c anaconda clangxx_osx-64 --yes
conda install rpy2 --yes
## Install Python Packages
source ./install_python_packages.sh
## Install R …Run Code Online (Sandbox Code Playgroud) 我有一个非常方便的高级pd.DataFrame保存功能,我想添加到pandas. 如何将这个方法添加到类中pd.DataFrame?
def to_file(df, path, sep="\t", compression="infer", pickled="infer", verbose=False, **args):
_ , ext = os.path.splitext(path)
# Serialization
if pickled == "infer":
if ext in {".pkl", ".pgz", ".pbz2"}:
pickled = True
else:
pickled = False
# Compression
if compression == "infer":
if pickled:
if ext == ".pkl":
compression = None
if ext == ".pgz":
compression = "gzip"
if ext == ".pbz2":
compression = "bz2"
else:
compression = None
if path.endswith(".gz"):
compression = "gzip"
if path.endswith(".bz2"): …Run Code Online (Sandbox Code Playgroud) 我正在尝试linestyle=":"在networkx边缘绘制轮廓()。我似乎无法弄清楚如何对matplotlib patch对象执行此操作? 现在有没有人如何操纵这些patch物体以在这些“边缘”上绘制轮廓?如果无法做到这一点,是否有人知道如何获取行数据以ax.plot(x,y,linestyle=":")单独使用?
import networkx as nx
import numpy as np
from collections import *
# Graph data
G = {'y1': OrderedDict([('y2', OrderedDict([('weight', 0.8688325076457851)])), (1, OrderedDict([('weight', 0.13116749235421485)]))]), 'y2': OrderedDict([('y3', OrderedDict([('weight', 0.29660515972204304)])), ('y4', OrderedDict([('weight', 0.703394840277957)]))]), 'y3': OrderedDict([(4, OrderedDict([('weight', 0.2858185316736193)])), ('y5', OrderedDict([('weight', 0.7141814683263807)]))]), 4: OrderedDict(), 'input': OrderedDict([('y1', OrderedDict([('weight', 1.0)]))]), 'y4': OrderedDict([(3, OrderedDict([('weight', 0.27847763084646443)])), (5, OrderedDict([('weight', 0.7215223691535356)]))]), 3: OrderedDict(), 5: OrderedDict(), 'y5': OrderedDict([(6, OrderedDict([('weight', 0.5733512797415756)])), (2, OrderedDict([('weight', 0.4266487202584244)]))]), 6: OrderedDict(), 1: OrderedDict(), …Run Code Online (Sandbox Code Playgroud) 我正在对冗余的完整矩阵进行一些计算(即可以是三角矩阵而不会丢失信息).我意识到我只能计算三角形的下半部分以获得更快的结果.一旦完成,我怎样才能将下三角形投射到鞋面上?
换句话说,我该如何扭转这种np.tril方法呢?
print DF_var.as_matrix()
# [[1 1 0 1 1 1 0 1 0 0 0]
# [1 1 1 1 1 0 1 0 1 1 1]
# [0 1 1 0 0 0 0 0 0 0 0]
# [1 1 0 1 0 0 0 0 0 0 0]
# [1 1 0 0 1 0 0 0 0 0 0]
# [1 0 0 0 0 1 1 0 0 0 0]
# [0 …Run Code Online (Sandbox Code Playgroud) 我有一个散点图,我想根据另一个值(np.random.random()在这种情况下天真地分配)给它着色.
有没有办法seaborn用于将每个点的连续值(与绘制的数据没有直接关联)映射到沿着连续梯度的值seaborn?
这是我生成数据的代码:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})
%matplotlib inline
np.random.seed(0)
# Iris dataset
DF_data = pd.DataFrame(load_iris().data,
index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
columns = load_iris().feature_names)
Se_targets = pd.Series(load_iris().target,
index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
name = "Species")
# Scaling mean …Run Code Online (Sandbox Code Playgroud) 是否有任何特征选择方法Scikit-Learn(或一般算法)给出属性能力/预测能力/重要性的权重来预测特定目标? 例如,对from sklearn.datasets import load_iris4个属性权重中的每一个进行排序,以分别预测3种虹膜种类,但对于更复杂的数据集w/~1k-10k属性.
我寻找的东西类似于feature_importances_从RandomForestClassifier.但是,RandomForestClassifer为整个预测过程的每个属性赋予权重.权重不需要加起来,但我想找到一种方法将特定属性子集与特定目标相关联.
首先,我尝试"过度拟合"模型以丰富特定目标,但结果似乎在目标之间没有太大变化.其次,我尝试通过找出哪些属性具有最大变化但不直接转化为预测能力来进行排序路线.第三,我尝试了稀疏模型,但遇到了与使用相同的问题feature_importances_.
指向完成此操作的示例或教程的链接就足够了.可能是关于如何在随机森林中遍历决策树并存储可预测特定目标的节点的教程.
python classification machine-learning feature-selection scikit-learn
现在我正在进行一次非常激进的网格搜索.我有n=135 samples,我正在23 folds使用自定义交叉验证列车/测试列表.我有我的verbose=2.
以下是我的内容:
param_test = {"loss":["deviance"],
'learning_rate':[0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2],
"min_samples_split": np.linspace(0.1, 0.5, 12),
"min_samples_leaf": np.linspace(0.1, 0.5, 12),
"max_depth":[3,5,8],
"max_features":["log2","sqrt"],
"min_impurity_split":[5e-6, 1e-7, 5e-7],
"criterion": ["friedman_mse", "mae"],
"subsample":[0.5, 0.618, 0.8, 0.85, 0.9, 0.95, 1.0],
"n_estimators":[10]}
Mod_gsearch = GridSearchCV(estimator = GradientBoostingClassifier(),
param_grid = param_test, scoring="accuracy",n_jobs=32, iid=False, cv=cv_indices, verbose=2)
Run Code Online (Sandbox Code Playgroud)
我看了一下详细输出stdout:
$head gridsearch.o8475533
Fitting 23 folds for each of 254016 candidates, totalling 5842368 fits
Run Code Online (Sandbox Code Playgroud)
基于此,看起来5842368使用我的网格参数进行交叉验证对的排列.
$ grep -c "[CV]" gridsearch.o8475533 …Run Code Online (Sandbox Code Playgroud) 我从我当前环境缺少的包的 yml 文件中获取了这些。我如何在我当前的环境中安装这些?
channels:
- defaults
dependencies:
- appdirs=1.4.3=py36h28b3542_0
- asn1crypto=0.24.0=py36_0
- attrs=18.2.0=py36h28b3542_0
- blas=1.0=mkl
- cffi=1.11.5=py36h6174b99_1
- constantly=15.1.0=py36h28b3542_0
- cryptography=2.3.1=py36hdbc3d79_0
- freetype=2.9.1=hb4e5f40_0
- html5lib=1.0.1=py36_0
- idna=2.7=py36_0
- incremental=17.5.0=py36_0
- intel-openmp=2019.0=118
- libgfortran=3.0.1=h93005f0_2
- libxml2=2.9.4=0
- libxslt=1.1.29=hc208041_6
- lxml=4.1.1=py36h6c891f4_0
- mkl=2019.0=118
- mkl_fft=1.0.6=py36hb8a8100_0
- mkl_random=1.0.1=py36h5d10147_1
- numpy=1.15.3=py36h6a91979_0
- numpy-base=1.15.3=py36h8a80b8c_0
- pyasn1=0.4.4=py36h28b3542_0
- pyasn1-modules=0.2.2=py36_0
- pycparser=2.19=py36_0
- pyopenssl=18.0.0=py36_0
- service_identity=17.0.0=py36h28b3542_0
- twisted=17.5.0=py36_0
- zope=1.0=py36_1
- zope.interface=4.5.0=py36h1de35cc_0
- pip:
- absl-py==0.2.2
- ete3==3.1.1
- grpcio==1.12.1
Run Code Online (Sandbox Code Playgroud) python ×8
matplotlib ×3
plot ×3
scikit-learn ×3
anaconda ×2
conda ×2
aesthetics ×1
arrays ×1
biplot ×1
border ×1
clang ×1
class ×1
colors ×1
fortran ×1
gradient ×1
grid-search ×1
installation ×1
line ×1
matrix ×1
methods ×1
networkx ×1
numpy ×1
object ×1
package ×1
pandas ×1
parameters ×1
pca ×1
r ×1
seaborn ×1