Tag: mlops

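I'm using Python code generated by an ML software package, together with mlflow, to read a dataframe, perform some table operations, and output a dataframe. I can run the code successfully and save the new dataframe as an artifact. However, I can't log the model with log_model, because it is not an LR or classifier model that we train and fit. I want to log a model for this so that it can be given new data and deployed behind a REST API.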

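import mlflow
import pandas as pd

df = pd.read_csv(r"/home/xxxx.csv")


with mlflow.start_run():

    def getPrediction(row):
        # Placeholder from the question: the row-level logic is not shown.
        perform_some_python_operations 
        return [Status_prediction, Status_0_probability, Status_1_probability]

    # One result list per output column; `columns` is defined elsewhere in the asker's code.
    columnValues = []
    for column in columns:
        columnValues.append([])

    # Run the prediction on every row and collect each result in its column's list.
    for index, row in df.iterrows():
        results = getPrediction(row)
        for n in range(len(results)):
            columnValues[n].append(results[n])

    # Attach the collected values to the dataframe as new columns.
    for n in range(len(columns)):
        df[columns[n]] = columnValues[n]

    df.to_csv('dataset_statistics.csv')
    mlflow.log_artifact('dataset_statistics.csv')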
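
One possible way to log logic like this as a model is MLflow's generic `mlflow.pyfunc` flavor, which wraps arbitrary Python code instead of a fitted estimator. The sketch below is only an illustration under assumptions: `get_prediction`, the `score` column, and the threshold are invented stand-ins for the elided row logic, and `build_model` / `log_model` are hypothetical helper names. The mlflow and pandas imports are kept inside the functions so the row logic can be tried without either library installed. Recent MLflow releases may prefer `name=` over `artifact_path=`.

```python
OUTPUT_COLUMNS = ["Status_prediction", "Status_0_probability", "Status_1_probability"]


def get_prediction(row):
    """Hypothetical stand-in for the row-level logic elided in the question."""
    # "score" and the 0.5 threshold are invented for illustration only.
    probability_1 = 1.0 if row["score"] >= 0.5 else 0.0
    return [int(probability_1), 1.0 - probability_1, probability_1]


def build_model():
    """Wrap get_prediction in a custom pyfunc model."""
    import mlflow.pyfunc
    import pandas as pd

    class RowWiseModel(mlflow.pyfunc.PythonModel):
        def predict(self, context, model_input):
            # model_input is expected to be a pandas DataFrame.
            rows = [get_prediction(row) for _, row in model_input.iterrows()]
            return pd.DataFrame(rows, columns=OUTPUT_COLUMNS)

    return RowWiseModel()


def log_model():
    """Log the wrapped logic as a model and return the run id."""
    import mlflow
    import mlflow.pyfunc

    with mlflow.start_run() as run:
        mlflow.pyfunc.log_model(artifact_path="model", python_model=build_model())
    return run.info.run_id
```

A model logged this way can typically be served with `mlflow models serve -m runs:/<run_id>/model`, which exposes an `/invocations` REST endpoint.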


python deployment mlflow mlops

Sub*_*iri

2022 09-29

Score: 4 · Answers: 1 · Views: 5728

How do I connect to an MLflow tracking server that requires authentication?

I want to connect to a remote tracking server ( http://123.456.78.90 ) that requires authentication.

When I do this:

import mlflow
mlflow.set_tracking_uri("http://123.456.78.90")
mlflow.set_experiment("my-experiment")


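I get an error:

MlflowException: API request to endpoint /api/2.0/mlflow/experiments/list failed with error code 401 != 200. Response body: '401 Authorization Required'

I understand that I need to log in first, but I don't know how to do that.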
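
If the server sits behind HTTP Basic authentication, one commonly used approach is to supply credentials through the `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables, which the MLflow client reads when it makes requests (`MLFLOW_TRACKING_TOKEN` plays the same role for bearer tokens). The sketch below assumes Basic auth. The helper names and the credentials are placeholders, not values from the question.

```python
import os


def configure_basic_auth(username, password):
    """Expose credentials to the MLflow client via environment variables."""
    os.environ["MLFLOW_TRACKING_USERNAME"] = username
    os.environ["MLFLOW_TRACKING_PASSWORD"] = password


def connect(tracking_uri, experiment_name):
    """Point the client at the server; mlflow is imported lazily on purpose."""
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)


# Placeholder credentials; replace with the real ones for the server.
configure_basic_auth("my-user", "my-password")

# Against a real server, the call from the question would then become:
# connect("http://123.456.78.90", "my-experiment")
```

Setting the variables in the shell before starting Python has the same effect and keeps credentials out of the source file.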

authorization tracking mlflow mlops

Kse*_*ova

2023 11-11

Score: 4 · Answers: 1 · Views: 9163

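Setting up MLflow on Google Colab

I often use Google Colab to train TF/PyTorch models, because Colab gives me GPU/TPU runtimes. I also like using MLflow to store and compare trained models, track progress, share results, and so on. What solutions are available for using MLflow together with Google Colab?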

google-colaboratory mlflow mlops

SvG*_*vGA

2021 05-25

Score: 3 · Answers: 4 · Views: 2621

`mlflow server` - difference between `--default-artifact-root` and `--artifacts-destination`

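I am using `mlflow server` to set up an MLflow tracking server. `mlflow server` has two command options that accept an artifact URI: `--default-artifact-root <URI>` and `--artifacts-destination <URI>`.

As I understand it, `--artifacts-destination` is used when the tracking server itself serves the artifacts.

Based on scenarios 4 and 5 in the MLflow Tracking documentation: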

mlflow server --backend-store-uri postgresql://user:password@postgres:5432/mlflowdb --default-artifact-root s3://bucket_name --host remote_host --no-serve-artifacts


# Artifact access is enabled through the proxy URI 'mlflow-artifacts:/',
# giving users access to this location without having to manage credentials
# or permissions.
mlflow server \
  --backend-store-uri postgresql://user:password@postgres:5432/mlflowdb \
  --artifacts-destination s3://bucket_name \
  --host remote_host


In these two scenarios, `--default-artifact-root` and …

mlflow mlops

wav*_*ide

2023 01-10

Score: 3 · Answers: 1 · Views: 3512

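How to increase the scheduler memory for Dask on GKE

I have deployed a Kubernetes cluster on GCP that combines Prefect and Dask. The jobs run fine under normal conditions but fail to scale to 2x the data. So far I have narrowed it down to the scheduler shutting down because of high memory usage. As soon as the Dask scheduler's memory usage reaches 2 GB, the job fails with a "no heartbeat detected" error.

There is a separate build Python file in which we can set the worker memory and CPU. There is also the dask-gateway package, through which we can get the gateway options and set the worker memory.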

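from dask_gateway import Gateway

# Connection details were not shown in the question; Gateway() reads them from the Dask config.
gateway = Gateway()
options = gateway.cluster_options()
options.worker_memory = 32
options.worker_cores = 10
cluster = gateway.new_cluster(options)
cluster.adapt(minimum=4, maximum=20)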


I can't figure out where and how to increase the memory allocation for the dask-scheduler.

Specs:
Cluster Version: 1.19.14-gke.1900
Machine type - n1-highmem-64
Autoscaling set to 6 - 1000 nodes per zone
all nodes are allocated 63.77 CPU and 423.26 GB
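
On the server side, dask-gateway's cluster configuration has a `scheduler_memory` field alongside `worker_memory`, so one possible direction is to expose it as a user option in the gateway's configuration file. The sketch below shows only the handler that maps user-facing options onto those fields. The option named `scheduler_memory` and the GiB units are assumptions about how the deployment's options might be declared, not something shown in the question.

```python
from types import SimpleNamespace

GIB = 2 ** 30


def options_handler(options):
    """Map user-facing options (in GiB) to dask-gateway cluster config fields."""
    return {
        "worker_cores": options.worker_cores,
        "worker_memory": int(options.worker_memory * GIB),
        # Assumed extra option so clients can size the scheduler as well.
        "scheduler_memory": int(options.scheduler_memory * GIB),
    }


# In dask_gateway_config.py (where `c` is injected by the config loader) the
# handler would be registered roughly like this; defaults and bounds are
# illustrative:
#   from dask_gateway_server.options import Options, Integer, Float
#   c.Backend.cluster_options = Options(
#       Integer("worker_cores", default=1, min=1, max=16),
#       Float("worker_memory", default=4, min=1, max=64),
#       Float("scheduler_memory", default=2, min=1, max=16),
#       handler=options_handler,
#   )

# Stand-in for the options object the gateway would pass to the handler.
example = options_handler(
    SimpleNamespace(worker_cores=10, worker_memory=32, scheduler_memory=8)
)
```

With an option declared this way, the client code above could set `options.scheduler_memory = 8` before calling `gateway.new_cluster(options)`.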


python google-kubernetes-engine dask mlops prefect

Author

lucky-day

Score: 0 · Answers: 1 · Views: 364