Ame*_*ina 6 python gdata pandas
When I try to load a Google Spreadsheet into pandas:
from StringIO import StringIO  # Python 2 module
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=<some_long_code>&output=csv')
data = r.content
df = pd.read_csv(StringIO(data), index_col=0)
I get the following:
CParserError: Error tokenizing data. C error: Expected 1316 fields in line 73, saw 1386
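Why? I expected the spreadsheet to be read as a set of rows and columns, with the spreadsheet's rows and columns used as the DataFrame index and columns respectively (and NaN for any empty cells). Why does it fail?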
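This question of mine shows how to do it: Getting Google Spreadsheet CSV into A Pandas Dataframe
As one of the commenters noted, you have not asked for the data in CSV format: the URL ends with an "edit" request. You can use the code below and see it work on this spreadsheet (which, by the way, needs to be public). Private sheets can also be read, but that is another topic.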
from StringIO import StringIO  # Python 2; in Python 3 use: from io import StringIO
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
In [10]: df = pd.read_csv(StringIO(data), index_col=0,parse_dates=['Quradate'])
In [11]: df.head()
Out[11]:
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential
10       Foley                              South_Mobile-Baldwin  Residential
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential
44      Athens                 North_Huntsville-Decatur-Florence  Residential

          mkt_type             Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural  2010-01-15 00:00:00             2            2          3
10  Suburban_Urban  2010-01-15 00:00:00             4            4          4
12  Suburban_Urban  2010-01-15 00:00:00             2            2          3
38           Rural  2010-01-15 00:00:00             3            3          3
44  Suburban_Urban  2010-01-15 00:00:00             4            5          4
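If the URL does not actually return CSV (for example, it serves the HTML editor page instead), read_csv will fail with a tokenizing error like the one in the question. One way to catch that earlier is to check the response's Content-Type header before parsing. This is only a rough sketch: the helper name frame_from_csv_response is made up for illustration, and it assumes the server labels CSV responses as text/csv.

```python
from io import StringIO

import pandas as pd


def frame_from_csv_response(content_type, body, **read_csv_kwargs):
    """Parse `body` as CSV, but only if the server said it was CSV.

    `content_type` is the value of the Content-Type response header and
    `body` is the decoded response text. Both names are hypothetical.
    """
    # Assumption: a genuine CSV export is served with a text/csv content type.
    if "text/csv" not in content_type.lower():
        raise ValueError(
            "expected CSV but got %r; check the URL and the sheet's sharing settings"
            % content_type
        )
    return pd.read_csv(StringIO(body), **read_csv_kwargs)
```

With requests this could be called as frame_from_csv_response(r.headers.get("Content-Type", ""), r.text, index_col=0).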
The new Google Spreadsheet URL format for getting CSV output is:
https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id
They now require a slightly different URL format:
https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=0 #for the 1st sheet
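Since only the spreadsheet key and the sheet's gid vary in that URL, it can be built with a small helper. A minimal sketch; the function name export_csv_url and its parameter names are my own, not part of any Google or pandas API:

```python
def export_csv_url(sheet_id, gid=0):
    """Build the CSV export URL for one sheet of a Google spreadsheet.

    `sheet_id` is the long key from the spreadsheet URL; `gid` selects
    the sheet (0 for the first one, as in the URL above).
    """
    return (
        "https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%d"
        % (sheet_id, gid)
    )


url = export_csv_url("177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ")
```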
I also found I needed the following small change to the code in the earlier answer to make it work on Python 3:
from io import StringIO
And to fetch the file:
import pandas as pd
import requests

gid = 0  # gid of the 1st sheet
act = requests.get('https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=%s' % gid)
dataact = act.content.decode('utf-8')  # decode the bytes to str for StringIO
# DataFrame.sort() has been removed from current pandas; sort_index() sorts by the index
actdf = pd.read_csv(StringIO(dataact), index_col=0, parse_dates=[0], thousands=',').sort_index()
actdf is now a complete pandas DataFrame, with headers (column names).
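To see what the read_csv options used here do without touching the network, here is a small sketch on made-up inline data (the column names and values are invented for illustration, not taken from the spreadsheet):

```python
from io import StringIO

import pandas as pd

# Invented sample: a date column plus a number written with thousands separators.
sample = 'date,amount\n2014-02-01,"1,200"\n2014-01-01,"3,450"\n'

df = pd.read_csv(
    StringIO(sample),
    index_col=0,      # use the first column as the index
    parse_dates=[0],  # parse that column as dates
    thousands=",",    # read "1,200" as the number 1200
).sort_index()        # order rows by date
```

After this, df has a DatetimeIndex sorted by date and a numeric amount column.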