mon*_*top 49 python csv curl python-2.x output
When I curl the API endpoint http://example.com/passkey=wedsmdjsjmdd:
curl 'http://example.com/passkey=wedsmdjsjmdd'
I get the employee output data back in CSV format, like this:
"Steve","421","0","421","2","","","","","","","","","421","0","421","2"
How can I parse this with Python?
I tried:
import csv
cr = csv.reader(open('http://example.com/passkey=wedsmdjsjmdd',"rb"))
for row in cr:
    print row
But it didn't work, and I got this error:
http://example.com/passkey=wedsmdjsjmdd No such file or directory:
Thanks!
Kat*_*mar 68
Reading a CSV file directly from a URL is very simple with pandas:
import pandas as pd
data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd')
This reads your data into a tabular structure (a DataFrame), which is very easy to work with.
ean*_*son 66
You need to replace open with urllib.urlopen or urllib2.urlopen.
For example:
import csv
import urllib2
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)
for row in cr:
    print row
This reads the following CSV data (each row is printed as a list of fields):
Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
...
小智 19
You can also do this with the requests module:
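import csv
import requests

url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
# iter_lines() yields the body one line at a time (byte strings, which Python 2's csv.reader accepts)
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')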
The*_*des 16
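This question is tagged python-2.x, so it didn't seem right to tamper with the original question or the accepted answer. However, Python 2 is no longer supported, and this question still ranks well on Google for "python csv urllib", so here is an updated Python 3 solution.
The response from urlopen is now bytes and must be decoded to text with the appropriate encoding (UTF-8 here), so the accepted answer has to be modified slightly: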
import csv, urllib.request
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
lines = [l.decode('utf-8') for l in response.readlines()]
cr = csv.reader(lines)
for row in cr:
    print(row)
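Note the extra line beginning with lines =, the fact that urlopen now lives in the urllib.request module, and that print of course requires parentheses.
It's hardly advertised, but yes, csv.reader can read from a list of strings.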
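As a possible alternative to building the whole list with readlines(), the byte stream can be wrapped so that it is decoded on the fly. The sketch below is only an illustration: io.BytesIO stands in for the object returned by urlopen (on the assumption that the response behaves like a binary file object), and the two sample rows are made up and abbreviated.

```python
import csv
import io

# Stand-in for the bytes returned by urlopen(url); the rows are made-up sample data.
raw = io.BytesIO(b'"Steve","421","0","","2"\r\n"Ann","17","1","","3"\r\n')

# TextIOWrapper decodes the byte stream as it is read;
# newline="" leaves line-ending handling to the csv module.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

rows = list(csv.reader(text))
print(rows)
```

With a real response, the same wrapper would presumably be applied to the response object in place of raw.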
And since someone else mentioned pandas, here is a one-liner that displays the CSV in console-friendly output:
python3 -c 'import pandas
df = pandas.read_csv("http://winterolympicsmedals.com/medals.csv")
print(df.to_string())'
(Yes, it's three lines, but you can copy and paste it as one command. ;)
The*_*inn 15
To improve performance when downloading a large file, the following may work a bit more efficiently:
import requests
from contextlib import closing
import csv
url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
    for row in reader:
        # Handle each row here...
        print row
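By setting stream=True in the GET request, when we pass r.iter_lines() to csv.reader() we are passing it a generator. This lets csv.reader() iterate lazily over each line of the response in for row in reader.
This avoids loading the entire file into memory before we start processing it, which drastically reduces the memory overhead for large files.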
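The lazy behaviour described above can be sketched without a network connection. In this Python 3 illustration, fake_stream is a hypothetical stand-in for r.iter_lines(), which yields byte strings in Python 3 and therefore needs decoding before csv.reader sees it; iter_csv_rows and the sample lines are likewise invented for the example.

```python
import csv

consumed = []  # records which lines have actually been pulled from the stream


def fake_stream():
    """Hypothetical stand-in for r.iter_lines(): yields one byte line at a time."""
    for line in (b"name,score", b"Steve,421", b"Ann,17"):
        consumed.append(line)
        yield line


def iter_csv_rows(byte_lines, encoding="utf-8"):
    """Lazily decode an iterable of byte lines and yield parsed CSV rows."""
    decoded = (line.decode(encoding) for line in byte_lines)
    for row in csv.reader(decoded):
        yield row


rows = iter_csv_rows(fake_stream())
first = next(rows)            # pulls only the first line from the stream
consumed_after_first = len(consumed)
rest = list(rows)             # drains the remaining lines
print(first, consumed_after_first, rest)
```

One caveat with any line-based approach: a quoted field that contains an embedded newline is split across lines before csv.reader sees it, so the line break inside that field may not be preserved.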
import pandas as pd
url='https://raw.githubusercontent.com/juliencohensolal/BankMarketing/master/rawData/bank-additional-full.csv'
data = pd.read_csv(url, sep=";")  # use sep="," for comma-separated data
data.describe()