_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Socket closed)

Yao*_*jie 5 python grpc tensorflow tensorflow-serving grpc-python

  • TensorFlow GPU 1.10.0
  • TensorFlow Serving 1.10.0

I deployed a TensorFlow Serving server that serves multiple models. The client code, client.py, looks like this, and I call its predict function.

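import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# host and port are defined elsewhere (not shown in the question).
# The channel, stub, and request are created once and shared by every call.
channel = implementations.insecure_channel(host, port)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
request = predict_pb2.PredictRequest()

def predict(data, shape, model_name, signature_name="predict"):
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature_name
    request.inputs['image'].CopyFrom(tf.contrib.util.make_tensor_proto(data, shape=shape))
    result = stub.Predict(request, 10.0)  # 10-second timeout
    return result.outputs['prediction'].float_val[0]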

I have about 100 clients with the same configuration. Here is sample code that calls the predict function:

from client import predict
while True:
    print(predict(data, shape, model_name))
    # time.sleep some while

At first, when I run the client code, I receive responses correctly. But after a few hours, the client crashes with this error:

_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Socket closed)

I tried changing my client code to

def predict(data, shape, model_name, signature_name="predict"):
    channel = implementations.insecure_channel(host, port)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature_name
    request.inputs['image'].CopyFrom(tf.contrib.util.make_tensor_proto(data, shape=shape))
    result = stub.Predict(request, 10.0)
    return result.outputs['prediction'].float_val[0]

This means I open a new connection to the TFS server every time the predict function is called. But this code also failed in the same way as before.

So what should I do in this situation?

Yao*_*jie 5

In the end I added a channel.close() call before the return, and it works well.
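One caveat with closing the channel just before the return: if stub.Predict raises, the close is skipped and the channel is left open. Here is a minimal sketch of one way to close it on every path. It is not the exact code from the question. The names predict_once, open_channel, make_stub, and timeout_s are hypothetical, and the sketch only assumes that the channel object has a close() method and the stub has a Predict(request, timeout) method.

```python
from contextlib import closing


def predict_once(open_channel, make_stub, request, timeout_s=10.0):
    """Run one Predict call on a fresh channel and always close that channel.

    Hypothetical helper for illustration:
      open_channel()      -> returns an object with a close() method
      make_stub(channel)  -> returns an object with Predict(request, timeout)
    """
    # closing() calls channel.close() when the block exits,
    # whether Predict returns normally or raises.
    with closing(open_channel()) as channel:
        stub = make_stub(channel)
        return stub.Predict(request, timeout_s)
```

With the GA gRPC Python API, open_channel could be something like lambda: grpc.insecure_channel("host:port") and make_stub could be prediction_service_pb2_grpc.PredictionServiceStub. This assumes a grpcio version whose grpc.Channel provides close(). Whether the beta channel object used in the question exposes close() depends on the installed grpcio version.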