In Python, given the URL of a text file, what is the simplest way to read the file's contents?

Question

In Python, when given the URL of a text file, what is the simplest way to access its contents and print them line by line, without saving a local copy of the file?

TargetURL = "http://www.myhost.com/SomeFile.txt"
# read the file
# print first line
# print second line
# etc

Answer 1

e-s*_*tis 102

Actually, the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

As Will suggested, you don't even need "readlines". You can even shorten it to:

import urllib2

for line in urllib2.urlopen(target_url):
    print line

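But remember that in Python, readability counts.

Note that this is the simplest way, not the safest one: in network programming you usually cannot be sure that the amount of data you expect will be respected. So you are generally better off reading a fixed, reasonable amount of data, enough for what you expect but capped so that your script cannot be flooded: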

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read at most 20,000 bytes
data = data.split("\n") # then split the result into lines

for line in data:
    print line

Edit 09/2016: in Python 3 and later, use urllib.request instead of urllib2.
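
For Python 3, the capped read described in this answer might look like the following minimal sketch. The helper name read_limited_lines, its parameters, and the 10-second timeout are illustrative assumptions, not part of the original answer. Note that read() counts bytes rather than characters, so the decode step tolerates a multi-byte character that is cut off at the limit.

```python
from urllib.request import urlopen


def read_limited_lines(url, max_bytes=20000, encoding="utf-8", timeout=10):
    """Return the lines found in at most `max_bytes` bytes of the resource.

    Hypothetical helper: the name, the defaults and the timeout are
    assumptions chosen for illustration.
    """
    # The timeout keeps the call from blocking forever on a stalled server.
    with urlopen(url, timeout=timeout) as response:
        # Never read more than max_bytes, however much the server sends.
        raw = response.read(max_bytes)
    # errors="replace" avoids an exception if the cap splits a character.
    return raw.decode(encoding, errors="replace").splitlines()
```

Looping over read_limited_lines(target_url) and printing each item then mirrors the loop in the Python 2 code, with target_url standing for whatever URL you need to read.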

Answer 2

And*_*Mao 32

I'm new to Python, and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is:

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

or

from urllib.request import urlopen
data = urlopen(target_url)

Note that a plain "import urllib" does not work, because request is a submodule that has to be imported explicitly.
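
The loop body in this answer is left open. One possible way to fill it in is sketched below; this is not the answerer's own code. In Python 3, urlopen yields bytes, so each line is decoded before use. The function name iter_text_lines and the UTF-8 default are assumptions.

```python
from urllib.request import urlopen


def iter_text_lines(url, encoding="utf-8"):
    """Yield each line of the resource at `url` as str, without its line ending.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    with urlopen(url) as response:
        # The response object is iterable and produces one bytes line at a time.
        for raw_line in response:
            yield raw_line.decode(encoding).rstrip("\r\n")
```

Because this is a generator, lines are decoded one at a time as they are consumed, for example by a for loop that prints each line.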

Answer 3

Ken*_*der 23

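There is really no need to read line by line. You can get the whole thing like this:

import urllib  # Python 2 only; in Python 3, urlopen lives in urllib.request
txt = urllib.urlopen(target_url).read()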

For Python 3, it is: txt = urllib.request.urlopen(target_url).read() (3 upvotes)
It doesn't work: _AttributeError: module 'urllib' has no attribute 'urlopen'_ (2 upvotes)

Answer 4

lea*_*eal 13

The requests library has a simple interface and works with both Python 2 and 3.

import requests

response = requests.get(target_url)
data = response.text

Answer 5

Fab*_*ian 10

import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line

Answer 6

del*_*ter 7

Just updating the solution suggested by @ken-kinder here so that the Python 2 code works with Python 3:

import urllib.request
urllib.request.urlopen(target_url).read()

Answer 7

Wil*_*ill 6

import urllib2

f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l

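+1, but note that this is the simplest way, not the safest. If anything goes wrong on the server side and it keeps delivering content forever, you could end up in an infinite loop. (2 upvotes)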

Answer 8

bmi*_*lis 6

For me, none of the answers above worked directly. Instead, I had to do the following (Python 3):

from urllib.request import urlopen

data = urlopen("[your url goes here]").read().decode('utf-8')

# Do what you need to do with the data.
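
Hard-coding 'utf-8' works when the file really is UTF-8. A hedged variant is to prefer the charset the server declares in its Content-Type header and fall back to UTF-8 only when none is given. The helper name read_text and its default_encoding parameter are assumptions for illustration.

```python
from urllib.request import urlopen


def read_text(url, default_encoding="utf-8"):
    """Return the whole resource at `url` as a str.

    Hypothetical helper: the name and the fallback encoding are assumptions.
    """
    with urlopen(url) as response:
        # get_content_charset() returns None when the header names no charset.
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset)
```

If the declared charset is wrong, decoding can still fail or produce garbled text, so this is a convenience rather than a guarantee.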

Answer 9

aru*_*run 6

As @Andrew Mao suggested, the requests package works very well thanks to its simple interface:

import requests
response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
data = response.text
for i, line in enumerate(data.split('\n')):
    print(f'{i}   {line}')

Output:

0    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
1    prices and the demand for clean air', J. Environ. Economics & Management,
2    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
3    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
4    pages 244-261 of the latter.
5   
6    Variables in order:

See the Kaggle notebook for how to extract a dataset/dataframe from a URL.

Answer 10

lea*_*eal 5

Another way in Python 3 is to use the urllib3 package.

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

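This may be a better option than urllib, because urllib3 offers:

- Thread safety.
- Connection pooling.
- Client-side SSL/TLS verification.
- File uploads with multipart encoding.
- Helpers for retrying requests and dealing with HTTP redirects.
- Support for gzip and deflate encoding.
- Proxy support for HTTP and SOCKS.
- 100% test coverage.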

The [requests](https://2.python-requests.org/en/master/) library is partly based on urllib3. (2 upvotes)

Answer 11

xia*_*ang 5

I do think requests is the best option. Also note the possibility of setting the encoding manually.

import requests
response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
# response.encoding = "utf-8"
hehe = response.text
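
The manual encoding override mentioned in this answer can be wrapped in a small function, sketched here under stated assumptions: fetch_text, its parameters, and the 10-second timeout are hypothetical and not part of the original answer. The raise_for_status() call is an addition so that an HTTP error page is not silently returned as if it were the file.

```python
import requests


def fetch_text(url, encoding=None, timeout=10):
    """Return the body of `url` as text, optionally forcing an encoding.

    Hypothetical helper: the name, parameters and timeout are assumptions.
    """
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning their body.
    response.raise_for_status()
    if encoding is not None:
        # Must be set before .text is read; it overrides requests' own guess.
        response.encoding = encoding
    return response.text
```

Leaving encoding as None keeps whatever requests infers from the response, while passing "utf-8" corresponds to uncommenting the response.encoding line in the answer's code.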

Asked:	16 years, 4 months ago
Viewed:	154496 times
Last active:	7 years, 7 months ago