Posts by jon*_*jon

SageMaker endpoint stuck in "Creating"

I am trying to deploy a SageMaker endpoint, but it gets stuck indefinitely in the "Creating" stage. Below are my Dockerfile and training/serving script. The model trains without any problems; only the endpoint deployment hangs in "Creating".

Folder structure:

|_code
   |_train_serve.py
|_Dockerfile

Below is the Dockerfile:

###########################################################
# Adapt your container to work with SageMaker:
# https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html
# https://hub.docker.com/r/huanjason/scikit-learn/dockerfile

ARG REGION=us-east-1

FROM python:3.7

RUN apt-get update && apt-get -y install gcc

RUN pip3 install \
        # numpy==1.16.2 \
        numpy \
        # scikit-learn==0.20.2 \
        scikit-learn \
        pandas \
        # scipy==1.2.1 \
        scipy \
        mlflow

RUN rm -rf /root/.cache

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

# Install sagemaker-training toolkit to enable SageMaker Python SDK …
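An endpoint that hangs in "Creating" very often means the container never answers SageMaker's health check: the platform polls `GET /ping` on port 8080 and expects a 200 before it marks the endpoint in service. Below is a stdlib-only sketch of that HTTP contract; real serving containers typically use Flask or gunicorn, and the handler names and the dummy prediction here are illustrative only, not the poster's actual `train_serve.py`.

```python
# Minimal sketch of the HTTP contract a SageMaker serving container must meet:
# respond 200 to GET /ping and accept POST /invocations, both on port 8080.
from http.server import BaseHTTPRequestHandler, HTTPServer

class SageMakerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # SageMaker polls /ping to decide whether the container is healthy
        status = 200 if self.path == "/ping" else 404
        self.send_response(status)
        self.end_headers()

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)  # request body with input data
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"dummy-prediction")  # real model output goes here
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch

def run(port=8080):
    # serve_forever() blocks; in a container this would be the serve entrypoint
    HTTPServer(("0.0.0.0", port), SageMakerHandler).serve_forever()
```

If `/ping` is missing or the server listens on the wrong port, the endpoint sits in "Creating" until SageMaker eventually times out and fails it, which matches the symptom described above.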

amazon-web-services data-science amazon-sagemaker

7 votes · 1 answer · 3882 views

How do I avoid AWS Athena CTAS queries creating small files?

I can't figure out what's wrong with my CTAS query: even though I haven't specified any bucketing columns, it breaks the data into smaller files when storing it inside the partitions. Is there a way to avoid these small files and store each partition as a single file, since files smaller than 128 MB incur extra overhead?

CREATE TABLE sampledb.yellow_trip_data_parquet
WITH(
    format = 'PARQUET',
    parquet_compression = 'GZIP',
    external_location = 's3://mybucket/Athena/tables/parquet/',
    partitioned_by = ARRAY['year','month']
)
AS SELECT
    VendorID,
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    passenger_count,
    trip_distance,
    RatecodeID,
    store_and_fwd_flag,
    PULocationID,
    DOLocationID,
    payment_type,
    fare_amount,
    extra,
    mta_tax,
    tip_amount,
    tolls_amount,
    improvement_surcharge,
    total_amount,
    date_format(date_parse(tpep_pickup_datetime,'%Y-%c-%d %k:%i:%s'),'%Y')  AS year,
    date_format(date_parse(tpep_pickup_datetime,'%Y-%c-%d %k:%i:%s'),'%c')  AS month
FROM sampleDB.yellow_trip_data_raw;

(Image of my partitions)
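Athena parallelizes a CTAS across several writers, so each partition typically ends up with one file per writer. A common workaround is to add bucketing with a single bucket, which forces the output for each partition through one writer. The sketch below assumes `VendorID` as the bucketing column purely for illustration; since everything lands in the one bucket, the column choice barely matters, though bucketed tables do carry restrictions (e.g. you cannot `INSERT INTO` them later).

```sql
-- Sketch: force (at most) one output file per partition via a single bucket.
-- The bucketed_by column (VendorID) is an illustrative choice.
CREATE TABLE sampledb.yellow_trip_data_parquet
WITH(
    format = 'PARQUET',
    parquet_compression = 'GZIP',
    external_location = 's3://mybucket/Athena/tables/parquet/',
    partitioned_by = ARRAY['year','month'],
    bucketed_by = ARRAY['VendorID'],
    bucket_count = 1
)
AS SELECT ...
```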

amazon-web-services amazon-athena

4 votes · 2 answers · 1843 views

How do I add a unique constraint on a text column in Postgres, ignoring special characters?

How can I add a unique constraint on a text column in Postgres that ignores special characters in the data?

CREATE TABLE my_table(
    SomeTextColumn citext,
    CONSTRAINT person_u_1 UNIQUE (SomeTextColumn)
);

In the table above, I am trying to add a unique constraint that checks for uniqueness while ignoring special characters in the incoming data.

For example:
1. HelloWorld --> Gets inserted successfully
2. Hello World --> Should fail with duplicate constraint
3. Hello%$^&*W^%orld --> Should fail with duplicate constraint
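A plain `UNIQUE` constraint cannot express "ignore special characters", but Postgres supports unique indexes on expressions, which achieve the same enforcement. One sketch: strip everything that is not alphanumeric with `regexp_replace` and index the result (with `lower()` to mirror citext's case-insensitive comparison). The index name and the exact regex here are illustrative assumptions, not from the original post.

```sql
-- Sketch: enforce uniqueness on the text with non-alphanumeric characters
-- stripped and case folded. Index name and regex are illustrative.
CREATE UNIQUE INDEX my_table_sometext_clean_u
    ON my_table (lower(regexp_replace(SomeTextColumn, '[^a-zA-Z0-9]', '', 'g')));

-- With this index: 'HelloWorld' inserts; 'Hello World' and
-- 'Hello%$^&*W^%orld' then fail with a unique-violation error.
```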

regex sql postgresql unique-constraint aws-serverless

2 votes · 1 answer · 258 views

Parallel processing takes longer than serial processing and appears to skip a few entries

Why is parallel processing slower than serial processing in Python here?

#!/usr/bin/env python3
import os
import time
from functools import partial
import multiprocessing as mp

def print_hello(i, typ):
    print(typ + " : PID-" + str(os.getpid()) + "\n")
    # print("Hello World " + str(i))

if __name__ == '__main__':
    N = mp.cpu_count()
    parallel_start_time = time.time()
    with mp.Pool(processes=N) as p:
        p.map(partial(print_hello, typ="Parallel"), range(100))
    parallel_end_time = time.time()
    

    serial_start_time = time.time()
    for x in range(100):
        print_hello(x, "Serial")
    serial_end_time = time.time()
    
    print("Parallel processing took " + str(parallel_end_time - parallel_start_time) + " seconds") …
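For a task this trivial, the pool's fixed costs (spawning worker processes, pickling arguments and results) dwarf the work itself, so the parallel run is slower; and output is not actually "skipped", it just interleaves and buffers differently when many processes write to stdout at once. A minimal sketch (not the original script; function names are illustrative) showing where a `Pool` does pay off, on CPU-bound work:

```python
# Contrast trivial tasks, where process start-up and pickling overhead
# dominate, with CPU-bound tasks, where a Pool can actually help.
import multiprocessing as mp
import time

def busy(n):
    # CPU-bound work: sum of squares below n
    return sum(i * i for i in range(n))

def time_it(fn):
    start = time.time()
    result = fn()
    return result, time.time() - start

if __name__ == '__main__':
    tasks = [200_000] * 8

    serial, t_serial = time_it(lambda: [busy(n) for n in tasks])

    with mp.Pool() as pool:
        parallel, t_parallel = time_it(lambda: pool.map(busy, tasks))

    # Pool.map preserves input order, so the result lists match exactly
    assert serial == parallel
    print(f"serial: {t_serial:.3f}s  parallel: {t_parallel:.3f}s")
```

With a heavier per-task payload like this, the pool's overhead amortizes and the parallel timing typically beats the serial one; with a bare `print`, it never does.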

python python-3.x

0 votes · 1 answer · 238 views