SageMaker batch transform: "ValueError: could not convert string to float"

Ton*_*ony 2 scikit-learn amazon-sagemaker

I am using SageMaker and running a local transformer with batch transform. However, the transform does not seem to call my custom code.


Here is the SKLearn initialization:

    from sagemaker.sklearn.estimator import SKLearn

    source_dir = 'train'
    script_path = 'train.py'

    sklearn = SKLearn(
        entry_point=script_path,
        train_instance_type="local_gpu",
        source_dir=source_dir,
        role=role,
        sagemaker_session=sagemaker_session)
    sklearn.fit({'train': "file://test.csv"})

train.py is a Python script that loads the training data and saves the model to S3.


The batch transform is:

    transformer = sklearn.transformer(instance_count=1,
                                      entry_point=source_dir+"/"+script_path,
                                      instance_type='local_gpu',
                                      strategy='MultiRecord',
                                      assemble_with='Line'
                                      )
    transformer.transform("file://test_messages", content_type='text/csv', split_type='Line')
    print('Waiting for transform job: ' + transformer.latest_transform_job.job_name)
    transformer.wait()

file://test_messages contains a CSV file that is a list of strings.


The full error is:

    algo-1-6c5rl_1  | 172.18.0.1 - - [30/Jan/2020:14:14:30 +0000] "GET /ping HTTP/1.1" 200 0 "-" "-"
    algo-1-6c5rl_1  | 172.18.0.1 - - [30/Jan/2020:14:14:30 +0000] "GET /execution-parameters HTTP/1.1" 404 232 "-" "-"
    algo-1-6c5rl_1  | 2020-01-30 14:14:30,846 ERROR - train - Exception on /invocations [POST]
    algo-1-6c5rl_1  | Traceback (most recent call last):
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    algo-1-6c5rl_1  |     return fn(*args, **kwargs)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 56, in default_input_fn
    algo-1-6c5rl_1  |     return np_array.astype(np.float32) if content_type in content_types.UTF8_TYPES else np_array
    algo-1-6c5rl_1  | ValueError: could not convert string to float: 'IMPORTANT - You could be entitled up to \xef\xbf\xbd3,160 in compensation from mis-sold PPI on a credit card or loan. Please reply PPI for info or STOP to opt out.'
    algo-1-6c5rl_1  |
    algo-1-6c5rl_1  | During handling of the above exception, another exception occurred:
    algo-1-6c5rl_1  |
    algo-1-6c5rl_1  | Traceback (most recent call last):
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
    algo-1-6c5rl_1  |     response = self.full_dispatch_request()
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    algo-1-6c5rl_1  |     rv = self.handle_user_exception(e)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    algo-1-6c5rl_1  |     reraise(exc_type, exc_value, tb)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    algo-1-6c5rl_1  |     raise value
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    algo-1-6c5rl_1  |     rv = self.dispatch_request()
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    algo-1-6c5rl_1  |     return self.view_functions[rule.endpoint](**req.view_args)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_transformer.py", line 200, in transform
    algo-1-6c5rl_1  |     self._model, request.content, request.content_type, request.accept
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_transformer.py", line 227, in _default_transform_fn
    algo-1-6c5rl_1  |     data = self._input_fn(content, content_type)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 95, in wrapper
    algo-1-6c5rl_1  |     six.reraise(error_class, error_class(e), sys.exc_info()[2])
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/six.py", line 692, in reraise
    algo-1-6c5rl_1  |     raise value.with_traceback(tb)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    algo-1-6c5rl_1  |     return fn(*args, **kwargs)
    algo-1-6c5rl_1  |   File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 56, in default_input_fn
    algo-1-6c5rl_1  |     return np_array.astype(np.float32) if content_type in content_types.UTF8_TYPES else np_array
    algo-1-6c5rl_1  | sagemaker_containers._errors.ClientError: could not convert string to float: 'IMPORTANT - You could be entitled up to \xef\xbf\xbd3,160 in compensation from mis-sold PPI on a credit card or loan. Please reply PPI for info or STOP to opt out.'
    algo-1-6c5rl_1  | 172.18.0.1 - - [30/Jan/2020:14:14:30 +0000] "POST /invocations HTTP/1.1" 500 290 "-" "-"
    .Waiting for transform job: sagemaker-scikit-learn-2020-01-30-14-14-30-490

It seems unable to handle my strings. I have code in train.py that converts the strings with TfidfVectorizer, but that code is not being called.


小智 5

I'm an engineer on AWS SageMaker. Thanks for providing the details of your estimator/transformer setup and the full error log.

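Looking at the specific error, it appears the Scikit-learn container is failing in default_input_fn. Thankfully, SageMaker Scikit-learn is open source, so we can go straight to the source code at sagemaker_sklearn_container/serving.py#L56 to help understand how it works.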

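The container chose to run the "default" input function to process the input before sending it to the model. Evidently, the default implementation doesn't work for the input format you need.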
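To see why the default fails on text, here is a minimal reproduction of the cast shown in the traceback (`np_array.astype(np.float32)`). It only assumes NumPy; the helper name `cast_like_default` and the sample rows are made up for illustration and are not part of the container.

```python
import numpy as np


def cast_like_default(np_array):
    # Hypothetical helper: mirrors the float32 cast seen in the traceback
    # for CSV input. It is not the container's actual function.
    return np_array.astype(np.float32)


# Numeric CSV fields survive the cast.
numeric_rows = np.array(["1.5", "2.0"])
numeric_result = cast_like_default(numeric_rows)

# Free-text rows (like the SMS messages in the question) do not.
text_rows = np.array(["Please reply PPI for info or STOP to opt out."])
try:
    cast_like_default(text_rows)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(numeric_result)
print(error_message)
```

The default path therefore only suits purely numeric CSV payloads; raw strings need their own input handling.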

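Similar to training, you need to provide custom Python code to control how SageMaker Scikit-learn handles your model in serving/inference mode. To override the default, implement input_fn in your custom Python code. You can either add it to your train.py script or pass a different Python file to the transformer.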
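As a rough sketch of such an override, assuming each line of the request body is one complete message (so commas inside a message are not treated as separators) and that the content type arrives as plain `text/csv`:

```python
def input_fn(request_body, request_content_type):
    """Return the raw text rows instead of casting them to floats.

    Assumptions (not taken from the question): one message per line,
    UTF-8 encoded, content type exactly "text/csv".
    """
    if request_content_type != "text/csv":
        raise ValueError("Unsupported content type: " + str(request_content_type))

    # The body may arrive as bytes; decode defensively so a stray
    # byte does not abort the whole batch.
    if isinstance(request_body, (bytes, bytearray)):
        request_body = bytes(request_body).decode("utf-8", errors="replace")

    # One record per non-empty line.
    return [line for line in request_body.splitlines() if line.strip()]
```

Whatever input_fn returns is handed on to prediction, so this only works if the saved model accepts raw strings, for example a pipeline that includes the fitted TfidfVectorizer. That is an assumption about your train.py, not something I can see from the question.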

This documentation should help with writing input_fn: https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#process-input

If you still run into problems, feel free to share a sample of your custom code.