How to connect a Jupyter (IPython) notebook to Amazon Redshift

Spa*_*ity 7 amazon-redshift jupyter jupyter-notebook

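I am on Mac OS X Yosemite. I installed the postgresql, psycopg2, and simplejson packages using conda install "package name", and imported them after installation. I then tried to create a JSON file with my Amazon Redshift credentials: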

{
    "user_name": "YOUR USER NAME",
    "password": "YOUR PASSWORD",
    "host_name": "YOUR HOST NAME",
    "port_num": "5439",
    "db_name": "YOUR DATABASE NAME"
}

I then used:

open("Credentials.json") as fh:
    creds = simplejson.loads(fh.read())

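But this throws an error. These are the instructions given on a website. I tried searching other sites, but none of them gave a good explanation.

Please let me know how to connect Jupyter to Amazon Redshift.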

Joe*_*ris 6

RJMetrics has a good guide here: "Setting up Your Analytics Stack with Jupyter Notebook & AWS Redshift". It uses ipython-sql.

This works well and displays the results in a grid.

In [1]:

import sqlalchemy
import psycopg2
import simplejson
%load_ext sql
%config SqlMagic.displaylimit = 10

In [2]:

with open("./my_db.creds") as fh:
    creds = simplejson.loads(fh.read())

connect_to_db = 'postgresql+psycopg2://' + \
                creds['user_name'] + ':' + creds['password'] + '@' + \
                creds['host_name'] + ':' + creds['port_num'] + '/' + creds['db_name'];
%sql $connect_to_db

In [3]:

%sql SELECT * FROM my_table LIMIT 25;
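
As written, the `open("Credentials.json") as fh:` line in the question is missing the `with` keyword, so Python would reject it with a SyntaxError before reading any file. Cell In [2] above already uses the correct form. Below is a minimal sketch of the loading step on its own, using the standard-library `json` module in place of simplejson. The function name `load_creds` and the key check are illustrative assumptions, not part of the RJMetrics guide.

```python
import json

# Keys used by the credentials file shown in the question.
REQUIRED_KEYS = ("user_name", "password", "host_name", "port_num", "db_name")


def load_creds(path):
    """Read the credentials JSON file and check that every expected key is present."""
    # `with` closes the file automatically, even if parsing fails.
    with open(path) as fh:
        creds = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError("missing credential keys: " + ", ".join(missing))
    return creds
```

Calling `creds = load_creds("Credentials.json")` then gives the same dictionary that cell In [2] builds, and a typo in a key name fails early with a readable message instead of surfacing later while the connection string is assembled.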


小智 5

Here is my approach:

# ---- Cell 1 ----
import psycopg2  # driver used by the postgresql+psycopg2 dialect
redshift_endpoint = "<add your endpoint>"
redshift_user = "<add your user>"
redshift_pass = "<add your password>"
port = 5439  # replace with your port; 5439 is the Redshift default
dbname = "<your db name>"

# ---- Cell 2 ----
from sqlalchemy import create_engine
from sqlalchemy import text
engine_string = "postgresql+psycopg2://%s:%s@%s:%d/%s" \
% (redshift_user, redshift_pass, redshift_endpoint, port, dbname)
engine = create_engine(engine_string)

# ---- Cell 3: this example lists all tables in your database ----
sql = """
select schemaname, tablename from pg_tables order by schemaname, tablename;
"""

# ---- Load the results as tuples into a list ----
# Engine.execute() was removed in SQLAlchemy 2.0, so run the query on a connection.
with engine.connect() as conn:
    tables = list(conn.execute(text(sql)))
tables

# ---- If you're using pandas ----
import pandas as pd
raw_data = pd.read_sql_query(text(sql), engine)
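
One caveat with formatting the engine string by hand: a password containing characters such as `@` or `/` will break the URL unless it is percent-encoded. A minimal sketch of a helper that does the encoding with the standard library follows. The name `build_redshift_url` is hypothetical, and the values in the usage note are placeholders.

```python
from urllib.parse import quote_plus


def build_redshift_url(user, password, host, port, dbname):
    """Return a postgresql+psycopg2 URL with the user and password percent-encoded."""
    # quote_plus escapes characters like '@' and '/' that would otherwise
    # be read as URL delimiters.
    return "postgresql+psycopg2://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, int(port), dbname
    )
```

For example, `build_redshift_url("me", "p@ss/word", "example.host", 5439, "dev")` encodes the password as `p%40ss%2Fword`, and the result can be passed to `create_engine` in place of `engine_string`.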