How do you "permanently" delete an experiment in MLflow?

Ril*_*Hun 13 python mlflow

Permanently deleting an experiment is not documented anywhere. I am using MLflow with a Postgres backend database.

This is what I ran:

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri=server)  # server holds the tracking server URI
client.delete_experiment(1)

This deletes the experiment, but when I run a new experiment with the same name as the one I just deleted, it returns this error:

mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the  experiment to create a new one.

I cannot find anywhere in the documentation that shows how to permanently delete everything.

Lee*_*ton 16

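Unfortunately, there currently seems to be no way to do this through the UI or the CLI :-/

How to do it depends on the type of backend store you are using.

File store

If you are using the filesystem as the storage mechanism (the default), this is easy. "Deleted" experiments are moved into a .trash folder. You just need to clear it out: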

rm -rf mlruns/.trash/*

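As of the current version of the documentation (1.7.2), they remark:

It is recommended to use a cron job or an alternate workflow mechanism to clear the .trash folder.

SQL database:

This is trickier, because dependent rows have to be deleted first. I am using MySQL, and these commands worked for me: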
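
The cleanup described above can also be scripted, for example so that a cron job can call it. This is a minimal sketch, assuming the default mlruns layout with a .trash subfolder; purge_trash is a hypothetical helper name, not part of MLflow:

```python
import shutil
from pathlib import Path


def purge_trash(mlruns_dir: str) -> int:
    """Permanently remove everything under <mlruns_dir>/.trash.

    Returns the number of top-level entries that were removed.
    """
    trash = Path(mlruns_dir) / ".trash"
    if not trash.is_dir():
        return 0  # nothing has been soft-deleted yet

    removed = 0
    for entry in trash.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a deleted experiment or run directory
        else:
            entry.unlink()  # a stray file or symlink
        removed += 1
    return removed
```

Calling purge_trash("mlruns") should have the same effect as the rm -rf command above, and it is just as irreversible.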

USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";


小智 11

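As of mlflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS]

From the documentation for mlflow gc:

Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.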
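
If you would rather trigger this from Python than from a shell, one option is to shell out to the CLI. This is only a sketch: build_gc_command and run_gc are hypothetical helper names, the URI in the usage note is a placeholder, and only the --backend-store-uri option is used here. The quoted documentation talks about runs, so whether experiment records are purged as well may depend on your MLflow version.

```python
import subprocess
from typing import List


def build_gc_command(backend_store_uri: str) -> List[str]:
    """Build the argument list for `mlflow gc` against one backend store."""
    return ["mlflow", "gc", "--backend-store-uri", backend_store_uri]


def run_gc(backend_store_uri: str) -> None:
    """Run `mlflow gc`; raises CalledProcessError if the command fails."""
    subprocess.run(build_gc_command(backend_store_uri), check=True)
```

For example, run_gc("postgresql://user:password@localhost/mlflow") would run the command against that (made-up) Postgres database. The mlflow executable has to be on the PATH of the environment that runs the script.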


小智 6

I am adding the SQL commands to use if you want to permanently empty MLflow's trash when PostgreSQL is your backend store.

Switch to your MLflow database, for example with: \c mlflow and then:

DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs where experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

The difference is that I added a DELETE command for the "params" table.
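
The order of the statements matters: rows that reference a run or an experiment are removed before the run or experiment itself. A rough Python sketch of the same ordering follows, wrapped in a single transaction so that a failure part-way through leaves nothing half-deleted. It uses the standard-library sqlite3 module and IN instead of =ANY purely for illustration; purge_deleted_experiments is a hypothetical helper, and the table and column names are the ones from the SQL above, which may differ between MLflow versions.

```python
import sqlite3

# Tables whose rows hang off a run (keyed by run_uuid), as named in the SQL above.
RUN_CHILD_TABLES = ("latest_metrics", "metrics", "tags", "params")

DELETED_EXPERIMENTS = (
    "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
)
DELETED_RUNS = (
    f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
)


def purge_deleted_experiments(conn: sqlite3.Connection) -> None:
    """Delete soft-deleted experiments and their dependent rows, children first."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            f"DELETE FROM experiment_tags WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
        )
        for table in RUN_CHILD_TABLES:
            conn.execute(f"DELETE FROM {table} WHERE run_uuid IN ({DELETED_RUNS})")
        conn.execute(
            f"DELETE FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
        )
        conn.execute("DELETE FROM experiments WHERE lifecycle_stage = 'deleted'")
```

Against MySQL or PostgreSQL you would pass a connection from the matching driver instead and adapt the transaction handling to that driver. Back up the database first, since none of this can be undone.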