How do I get details of the current Dataflow job using the Java SDK?

Vij*_*raj 3 google-cloud-dataflow

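After a job completes on BlockingDataflowPipelineRunner, I am trying to get details of the current Dataflow job, such as its ID, name, type, start time, and end time, similar to the details shown in the Dataflow dashboard.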

I have used the code below to get the status:

    Pipeline p;
    ...
    ...
    PipelineResult result = p.run();

    switch (result.getState()) {
        case CANCELLED:
            break;
        case DONE:
            // The job details are needed here:
            //MetadataTracker.insert(jobId, jobName, "Success", startTime, endTime);
            break;
        case FAILED:
            break;
        case RUNNING:
            break;
        case STOPPED:
            break;
        case UNKNOWN:
            break;
        case UPDATED:
            break;
        default:
            break;
    }

However, the PipelineResult class has no methods for getting the details above. Can anyone help?

jkf*_*kff 5

PipelineResult contains information about an Apache Beam pipeline that is common to all runners. To get Dataflow-specific information from the Dataflow service, you can use the low-level DataflowClient. You'll also need the job ID, which is available from DataflowPipelineJob (Dataflow's implementation of PipelineResult):

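    PipelineResult res = pipeline.run();
    // The cast is valid only when the pipeline ran on the Dataflow runner.
    String jobId = ((DataflowPipelineJob) res).getJobId();
    // options is the pipeline's DataflowPipelineOptions.
    DataflowClient client = DataflowClient.create(options);
    // getJob calls the Dataflow service and can throw IOException.
    Job job = client.getJob(jobId);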

Job contains all of the fields of interest. See https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs
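
Once you have the Job, the start/end times and status asked about in the question can be derived from its fields. The sketch below is only an illustration using the JDK: it assumes the createTime and currentStateTime fields arrive as RFC 3339 UTC strings and that currentState is one of the JobState names listed in the REST reference. JobSummary, its methods, the status labels, and the sample values are all hypothetical and not part of any SDK.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; not part of the Dataflow or Beam SDKs. */
final class JobSummary {
    private JobSummary() {}

    /**
     * Returns the time between two RFC 3339 UTC timestamps, the format
     * assumed here for the Job's createTime and currentStateTime fields.
     */
    static Duration elapsed(String createTime, String currentStateTime) {
        return Duration.between(Instant.parse(createTime), Instant.parse(currentStateTime));
    }

    /** Maps a JobState name to a coarse status label (labels are made up). */
    static String status(String currentState) {
        switch (currentState) {
            case "JOB_STATE_DONE":
                return "Success";
            case "JOB_STATE_FAILED":
                return "Failed";
            case "JOB_STATE_CANCELLED":
                return "Cancelled";
            default:
                return "Other";
        }
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for what the Job's createTime,
        // currentStateTime and currentState fields might hold.
        String createTime = "2017-06-01T12:00:00Z";
        String currentStateTime = "2017-06-01T12:07:30Z";
        System.out.println(status("JOB_STATE_DONE") + " after "
                + elapsed(createTime, currentStateTime).getSeconds() + " s");
    }
}
```

Note that createTime is when the job was created, and currentStateTime is the time of the most recent state change, so for a finished job the difference only approximates the run duration.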