Posts by Cal*_*oon

SFTP: chroot users to a mounted S3 bucket

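I'm trying to use an Amazon EC2 server as my SFTP server, where I can create authenticated users who sftp into the server. I have used s3fs to mount S3 buckets on the server at /mnt/buckets/{username}. Reading from and writing to the /mnt/buckets/{username} directory works against S3 as expected.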

My sshd_config contains the following:

ChrootDirectory /mnt/buckets/%u
X11Forwarding no
AllowTcpForwarding no
ForceCommand internal-sftp
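One thing worth checking when the session drops with "Broken pipe" right after authentication succeeds: sshd_config(5) requires the ChrootDirectory and every component of its path to be root-owned directories that are not writable by any other user or group, and if that does not hold sshd may refuse the chroot and close the session. Whether an s3fs mount point satisfies this depends on how it was mounted. Below is a minimal sketch that reports which path components break that rule; chroot_problems and its stat_fn parameter are hypothetical names, and the sketch checks only ownership and mode bits, not ACLs or mount options.

```python
import os
import stat
from pathlib import PurePosixPath
from typing import Any, Callable, List


def chroot_problems(path: str, stat_fn: Callable[[str], Any] = os.stat) -> List[str]:
    """List reasons sshd would likely reject `path` as a ChrootDirectory.

    Checks the rule from sshd_config(5): every component of the path must be
    a root-owned directory that is not writable by group or others.
    `stat_fn` is injectable (hypothetical parameter) so the check can be
    exercised without touching real files.
    """
    target = PurePosixPath(path)
    problems: List[str] = []
    # Walk from "/" down to the target: /, /mnt, /mnt/buckets, /mnt/buckets/<user>
    for component in [*reversed(target.parents), target]:
        st = stat_fn(str(component))
        if not stat.S_ISDIR(st.st_mode):
            problems.append(f"{component}: not a directory")
        if st.st_uid != 0:
            problems.append(f"{component}: owned by uid {st.st_uid}, not root")
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{component}: writable by group or others")
    return problems
```

On the server this could be called as chroot_problems("/mnt/buckets/alice"), where alice is a hypothetical user (sshd expands %u to the login name). An empty list means ownership and modes satisfy that rule; anything else points at the component that needs fixing.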


When connecting over SFTP, I get the following output:

...
debug1: Authentication succeeded (publickey).
Authenticated to ec2-54-173-113-164.compute-1.amazonaws.com ([54.173.113.164]:22).
debug2: fd 5 setting O_NONBLOCK
debug3: fd 6 is O_NONBLOCK
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
Write failed: Broken pipe
Connection closed


The mounted bucket has these permissions:

/home/ubuntu# ls …


sftp amazon-s3 amazon-ec2 amazon-web-services s3fs

Cal*_*oon · lucky-day · Score: 7 · Answers: 1 · Views: 1638

How do I stream larger-than-memory data from a PostgreSQL query into a Parquet file?

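I have the code below, which queries a table of about 500k rows. The process is killed with SIGKILL when it reaches rows = cur.fetchall(). I tried iterating over the cursor instead of loading everything into rows, but it still seems to run out of memory.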

How can I fetch all of the data from the database and safely convert it into a Parquet file, regardless of the table's size?

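import io

import pyarrow.parquet as pq
from psycopg2 import extras


def get_parquet_for_dataset_id(self, dataset, lob, max_dt):
    # _table_query and rows_to_table are helpers not shown in the question,
    # and table_name is not defined in this snippet.
    query = _table_query(lob, table_name, max_dt)
    conn = self.conns[lob]

    with conn:
        with conn.cursor(cursor_factory=extras.RealDictCursor) as cur:
            # With this default (client-side) cursor, execute() already
            # transfers the entire result set into client memory.
            cur.execute(query)

            # fetchall() then builds one dict per row on top of that;
            # this is where the process is killed (SIGKILL / OOM).
            rows = cur.fetchall()

            # Build one Arrow table from all rows and serialize it
            # into an in-memory buffer.
            table = rows_to_table(rows)
            pq_bytes = io.BytesIO()
            pq.write_table(table, pq_bytes)
            _ = pq_bytes.seek(0)  # rewind so callers read from the start

            return pq_bytes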
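With psycopg2, a plain cursor receives the whole result set when execute() runs, which would explain why iterating over it instead of calling fetchall() did not help. A named cursor, conn.cursor(name=...), is a server-side cursor, and fetchmany() on it pulls rows from the server in chunks. Below is a minimal sketch of that chunked pattern, reshaping each chunk into columns so it can be written as one Parquet row group at a time. It assumes tuple rows (the default cursor, not RealDictCursor) and any DB-API cursor; iter_column_batches and batch_size are hypothetical names, and the demo uses the standard-library sqlite3 purely as a stand-in for PostgreSQL.

```python
import sqlite3
from typing import Any, Dict, Iterator, List


def iter_column_batches(cursor, batch_size: int = 10_000) -> Iterator[Dict[str, List[Any]]]:
    """Yield column-oriented batches {column: [values, ...]} of at most batch_size rows.

    Memory stays bounded only if the cursor itself streams; for psycopg2 that
    means a named (server-side) cursor such as conn.cursor(name="export").
    """
    # DB-API: the first item of each description entry is the column name.
    names = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # Rows are tuples here; transpose them into one list per column.
        yield {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Demo with an in-memory sqlite3 table standing in for the PostgreSQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(25)])
cur = conn.execute("SELECT id, name FROM t ORDER BY id")

# Collected into a list only for this small demo; a real export would write
# each batch as soon as it is yielded.
batches = list(iter_column_batches(cur, batch_size=10))
print([len(b["id"]) for b in batches])  # [10, 10, 5]
```

With pyarrow, each batch could be converted with pa.Table.from_pydict(batch, schema=schema) and appended via writer.write_table(...) on a single pq.ParquetWriter(path, schema) that is closed at the end, so only one batch is held at a time. Passing an explicit schema matters because types inferred per batch can differ (for example, a column that is entirely NULL in one batch). Writing to a file path rather than an io.BytesIO also keeps the finished file itself out of RAM.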


python psycopg2 parquet pyarrow

Cal*_*oon · 2020-09-03 · Score: 6 · Answers: 1 · Views: 2021