I am trying to write a DataFrame to a gzip-compressed file in Python pandas, using the following:
import pandas as pd
import datetime
import csv
import gzip
# Get data (with previous connection and script variables)
df = pd.read_sql_query(script, conn)
# Create today's date, to append to file
todaysdatestring = str(datetime.datetime.today().strftime('%Y%m%d'))
print(todaysdatestring)
# Create csv with gzip compression
df.to_csv('foo-%s.csv.gz' % todaysdatestring,
          sep='|',
          header=True,
          index=False,
          quoting=csv.QUOTE_ALL,
          compression='gzip',
          quotechar='"',
          doublequote=True,
          line_terminator='\n')
This only creates a CSV named 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.
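For comparison, here is a minimal sketch that skips the `compression` argument and writes through the standard-library `gzip` module, then checks the first two bytes of the output for the gzip magic number. It assumes Python 3. The helper names `write_gzip_csv` and `is_gzip` and the sample rows are made up for illustration and are not part of the original script.

```python
import csv
import gzip
import os
import tempfile


def write_gzip_csv(rows, path):
    """Write rows as a pipe-delimited, fully quoted CSV inside a gzip file."""
    # "wt" opens the gzip stream in text mode; newline="" lets csv control line endings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp, delimiter="|", quoting=csv.QUOTE_ALL,
                            quotechar='"', doublequote=True, lineterminator="\n")
        writer.writerows(rows)


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as fp:
        return fp.read(2) == b"\x1f\x8b"


# Hypothetical sample data standing in for the DataFrame contents.
sample_rows = [["id", "name"], [1, "foo"]]
out_path = os.path.join(tempfile.mkdtemp(), "foo-sample.csv.gz")
write_gzip_csv(sample_rows, out_path)
print(is_gzip(out_path))
```

The same `is_gzip` check can be pointed at a file produced by `to_csv` to see whether it was actually compressed.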
I also tried adding this:
#Turn to_csv statement into a variable
d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
              sep='|',
              header=True,
              index=False,
              quoting=csv.QUOTE_ALL,
              compression='gzip',
              quotechar='"',
              doublequote=True,
              line_terminator='\n')
# Write above variable …

I have an existing Elastic Beanstalk Flask application on AWS that occasionally fails to initialize and gives the following error:
[Mon Jan 23 10:06:51.550205 2017] [core:error] [pid 7331] [client 127.0.0.1:43790] script timed out before returning headers: application.py
[Mon Jan 23 10:10:43.910014 2017] [core:error] [pid 7329] [client 127.0.0.1:43782] End of script output before headers: application.py
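Any idea why this happens? I recently changed the project's requirements.txt to include pandas==0.19.2. Before that change, the program would work for several days before returning the same error. More logs/program details: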
[Mon Jan 23 10:05:36.877664 2017] [suexec:notice] [pid 7323] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Mon Jan 23 10:05:36.886151 2017] [so:warn] [pid 7323] AH01574: module wsgi_module is already loaded, skipping
AH00557: httpd: apr_sockaddr_info_get() failed for ip-10-55-254-33
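AH00558: httpd: Could not …

I have to extract data from several different database engines. After exporting this data, I send it to AWS S3 and copy it into Redshift using the COPY command. Some of the tables contain large amounts of text, with line breaks and other characters in the column fields. When I run the following code: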
cursor.execute('''SELECT * FROM some_schema.some_message_log''')
rows = cursor.fetchall()
with open('data.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter='|', quoting=csv.QUOTE_ALL, quotechar='"', doublequote=True, lineterminator='\n')
    a.writerows(rows)
Some columns that contain carriage returns/line feeds end up creating new lines:
"2017-01-05 17:06:32.802700"|"SampleJob"|""|"Date"|"error"|"Job.py"|"syntax error at or near ""from"" LINE 34: select *, SYSDATE, from staging_tops.tkabsences;
^
-<class 'psycopg2.ProgrammingError'>"
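The behavior can be reproduced in memory with a short sketch. It assumes a single made-up row modeled on the sample above. The quoted field keeps its embedded newline, so the output spans two physical lines, even though `csv.reader` still parses it back as one record. The helper name `write_rows` is hypothetical.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows with the same csv.writer settings as the export script."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_ALL,
                        quotechar='"', doublequote=True, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical row: the last field contains an embedded newline.
rows = [("2017-01-05", "SampleJob",
         'syntax error at or near "from"\nLINE 34: select *')]

text = write_rows(rows)

# A line-oriented consumer sees more physical lines than records.
physical_lines = text.splitlines()

# A CSV-aware parser still recovers a single record with the newline intact.
parsed = list(csv.reader(io.StringIO(text, newline=""),
                         delimiter="|", quotechar='"'))

print(len(physical_lines), len(parsed))
```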
This causes the import process to fail. I can work around it by hard-coding the exceptions:
cursor.execute('''SELECT * FROM some_schema.some_message_log''')
rows = cursor.fetchall()
with open('data.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter='|', quoting=csv.QUOTE_ALL, quotechar='"', doublequote=True, lineterminator='\n')
    for row in rows:
        list_of_rows = []
        for c in row:
            if isinstance(c, str):
                c = …

I have a Lambda function that needs to use Pandas, sqlalchemy, and cx_Oracle.
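Installing and packaging all of these libraries together exceeds AWS Lambda's 250MB deployment package limit.
I would like to include only the .zip file of the Oracle Basic Light Package, then extract and use it at runtime.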
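The extraction step on its own can be sketched with the standard-library `zipfile` module. This is only an illustration under stated assumptions: the helper name `unzip_once`, the marker file, and the small demo archive are made up, and a temporary directory stands in for Lambda's writable `/tmp`. Whether the Oracle client libraries load correctly after being extracted this way is not shown here.

```python
import os
import tempfile
import zipfile


def unzip_once(zip_path, target_dir):
    """Extract zip_path into target_dir, skipping the work on repeat calls."""
    # Hypothetical marker file so a warm invocation does not extract again.
    marker = os.path.join(target_dir, ".extracted")
    if not os.path.exists(marker):
        os.makedirs(target_dir, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target_dir)
        with open(marker, "w"):
            pass
    return target_dir


# Demo with a tiny stand-in archive; a real run would point at the bundled client zip.
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("instantclient/placeholder.txt", "stand-in content")

extracted_dir = unzip_once(demo_zip, os.path.join(work_dir, "extracted"))
print(os.listdir(extracted_dir))
```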
What I've tried
My project structure is as follows:
cx_Oracle-7.2.3.dist-info/
dateutil/
numpy/
pandas/
pytz/
six-1.12.0.dist-info/
sqlalchemy/
SQLAlchemy-1.3.8.egg-info/
cx_Oracle.cpython-36m-x86_64-linux-gnu.so
instantclient-basiclite-linux.x64-19.3.0.0.0dbru.zip
main.py
six.py
template.yml
In main.py, I run the following:
import json, traceback, os
import sqlalchemy as sa
import pandas as pd
def main(event, context):
    try:
        unzip_oracle()
        return {'statusCode': 200,
                'body': json.dumps(run_query()),
                'headers': {'Content-Type': 'application/json', 'Access-Control-Allow-Origin':'*'}}
    except:
        em = traceback.format_exc()
        print("Error encountered. Error is: \n" + str(em))
        return {'statusCode': 500, …

I have a deeply nested JSON that I am trying to convert to a Pandas DataFrame using json_normalize.
A generic sample of the JSON data I'm working with looks like this (I've added context on what I'm trying to do at the bottom of the post):
{
  "per_page": 2,
  "total": 1,
  "data": [{
    "total_time": 0,
    "collection_mode": "default",
    "href": "https://api.surveymonkey.com/v3/responses/5007154325",
    "custom_variables": {
      "custvar_1": "one",
      "custvar_2": "two"
    },
    "custom_value": "custom identifier for the response",
    "edit_url": "https://www.surveymonkey.com/r/",
    "analyze_url": "https://www.surveymonkey.com/analyze/browse/",
    "ip_address": "",
    "pages": [
      {
        "id": "103332310",
        "questions": [{
          "answers": [{
            "choice_id": "3057839051"
          }
          ],
          "id": "319352786"
        }
        ]
      },
      {
        "id": "44783164",
        "questions": [{
          "id": "153745381",
          "answers": [{
            "text": "some_name"
          }
          ]
        }
        ]
      },
      {
        "id": "44783183",
        "questions": [{
          "id": …
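To make the nesting concrete, here is a minimal sketch in plain Python (not `json_normalize` itself) that walks `data` → `pages` → `questions` → `answers` and emits one flat dict per answer. The function name `flatten_responses` and the output column names `page_id` and `question_id` are assumptions chosen for illustration. The payload is trimmed to the two complete pages from the sample above.

```python
def flatten_responses(payload):
    """Return one flat dict per answer, tagged with its page and question ids."""
    rows = []
    for response in payload.get("data", []):
        for page in response.get("pages", []):
            for question in page.get("questions", []):
                for answer in question.get("answers", []):
                    row = {"page_id": page.get("id"),
                           "question_id": question.get("id")}
                    # Answer keys differ per question type (choice_id, text, ...).
                    row.update(answer)
                    rows.append(row)
    return rows


# Trimmed copy of the sample: only the fields and pages that appear in full.
payload = {
    "per_page": 2,
    "total": 1,
    "data": [{
        "pages": [
            {"id": "103332310",
             "questions": [{"answers": [{"choice_id": "3057839051"}],
                            "id": "319352786"}]},
            {"id": "44783164",
             "questions": [{"id": "153745381",
                            "answers": [{"text": "some_name"}]}]},
        ],
    }],
}

flat_rows = flatten_responses(payload)
for flat_row in flat_rows:
    print(flat_row)
```

A list of dicts like `flat_rows` can be passed directly to `pd.DataFrame(...)`; keys missing from a given row become NaN in the resulting frame.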