I am trying to scrape this page using the python-requests library:
import requests
from lxml import etree,html
url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'
r = requests.get(url)
tree = etree.HTML(r.text)
print(tree)
But I get a TooManyRedirects error. I tried using the allow_redirects parameter, but I get the same error:
r = requests.get(url, allow_redirects=True)
I even tried sending headers and data along with the URL, but I'm not sure this is the right way to do it:
headers = {'content-type': 'text/html'}
payload = {'ie':'UTF8','node':'976419031'}
r = requests.post(url,data=payload,headers=headers,allow_redirects=True)
How can I fix this error? Out of curiosity, I also tried BeautifulSoup, but I get a different but similar error:
page = BeautifulSoup(urllib2.urlopen(url))
urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently
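A commonly suggested workaround for this kind of redirect loop is to send a browser-like User-Agent header instead of the default python-requests one. The sketch below assumes that is the cause here (not verified against this Amazon page). It uses a requests.Session so the header and any cookies are reused across calls, lowers max_redirects so a loop fails quickly, and prints the last Location header from the exception's response. The UA string is a hypothetical placeholder. Note also that ie and node are query-string parameters of a GET page, so if they are passed separately they belong in params= rather than in a POST body.

```python
import requests

# Hypothetical browser-like User-Agent string; any realistic UA would do.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def make_session(max_redirects=10):
    """Build a Session that keeps cookies and sends a browser-like User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    # Stop after a few hops instead of the default 30 so a loop fails fast.
    session.max_redirects = max_redirects
    return session


def fetch(url, session=None):
    """GET url; on a redirect loop, show where the server kept sending us."""
    session = session or make_session()
    try:
        resp = session.get(url, timeout=10)
    except requests.exceptions.TooManyRedirects as exc:
        last = exc.response  # last response in the loop (may be None)
        if last is not None:
            print("Redirect loop; last Location:", last.headers.get("Location"))
        raise
    resp.raise_for_status()
    # resp.history holds the redirect responses that led to the final page.
    return resp
```

With the question's code this would be used as tree = etree.HTML(fetch(url).text). If the loop persists, the printed Location and resp.history show which URL the server keeps bouncing between.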
If I just hit the API, each document has 5 fields. But I only want two of them (user_id and loc_code), so I listed them in the fields list. However, the response still contains data I don't need, such as _shards, hits, timed_out, etc.
I send a POST request with the following query from the Postman plugin in Chrome:
<:9200>/myindex/mytype/_search
{
"fields" : ["user_id", "loc_code"],
"query":{"term":{"group_id":"1sd323s"}}
}
// Output
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 323,
"max_score": 8.402096,
"hits": [
{
"_index": "myindex",
"_type": "mytype",
"_id": "<someid>",
"_score": 8.402096,
"fields": {
"user_id": [
"<someuserid>"
],
"loc_code": [
768
]
}
},
...
]
}
}
But I only want the document fields (the two fields mentioned above), without _id, _index, _type. Is there a way to do this?
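Elasticsearch wraps every search result in this envelope. Newer versions (1.6 and later, as far as I know) accept a filter_path query-string parameter that trims the response on the server, for example _search?filter_path=hits.hits.fields; whether it is available depends on the cluster version, so treat that as an assumption. Independently of the version, the envelope can be stripped on the client after parsing the JSON. A minimal sketch, assuming the response shape shown above (the helper name extract_fields is hypothetical):

```python
def extract_fields(response, wanted=("user_id", "loc_code")):
    """Return one flat dict per hit, keeping only the requested fields.

    With the "fields" option each value comes back as a list, so
    single-valued fields are unwrapped to their only element.
    """
    docs = []
    for hit in response.get("hits", {}).get("hits", []):
        fields = hit.get("fields", {})
        doc = {}
        for name in wanted:
            values = fields.get(name)
            if values:
                doc[name] = values[0] if len(values) == 1 else values
        docs.append(doc)
    return docs
```

Applied to the output above, each hit becomes something like {"user_id": "<someuserid>", "loc_code": 768}, with _id, _index, _type and the surrounding metadata dropped.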
I have just started learning Falcon (http://falcon.readthedocs.org/en/latest/user/quickstart.html), but it needs a web server to run, and the docs suggest using uwsgi or gunicorn.
They do explain how to use it with gunicorn:
$ pip install gunicorn #install
$ gunicorn things:app #and run app through gunicorn.
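But I want to run this sample app with uwsgi, and I don't know how.
I installed uwsgi with pip install uwsgi, and also installed gevent as suggested at http://falcon.readthedocs.org/en/latest/user/install.html.
But what now? Can someone guide me?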
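uWSGI, like gunicorn, only needs to know which file or module to import and which WSGI callable inside it to serve. By default uWSGI looks for a callable named application, so an app named app has to be named explicitly, e.g. uwsgi --http :8000 --wsgi-file things.py --callable app (port 8000 is an arbitrary choice). The sketch below assumes the quickstart code lives in things.py; to keep it self-contained, a plain WSGI function stands in for the Falcon app. An app = falcon.API() instance (falcon.App in Falcon 3) is itself a WSGI callable, so uWSGI loads it the same way.

```python
# things.py -- hypothetical stand-in for the Falcon quickstart module.
# uWSGI imports this file and serves the object named by --callable.


def app(environ, start_response):
    """Plain WSGI callable; `app = falcon.API()` would be served the same way."""
    body = b"Hello from a plain WSGI app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With the real Falcon module in place of this stand-in, the quickstart's /things route would then be reachable at http://localhost:8000/things.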
settings.py
import os
BASE_DIR = os.path.dirname(os.path.dirname(__file__))
DEBUG = True
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'adminstatic')
mynginx.conf (located at /etc/nginx/sites-enabled/mynginx.conf)
I can confirm this is the only conf file in my sites-enabled folder:
server {
listen *:80;
server_name _;
access_log /var/log/myapp.access.log;
error_log /var/log/myapp.error.log;
# Django media
location /static/ {
alias /root/proj/myapp/adminstatic/; #your Django project's static files - amend as required
}
# Finally, send all non-media requests to the Django server.
location / {
uwsgi_pass unix:/tmp/myapp.sock;
include /etc/nginx/uwsgi_params; # the uwsgi_params file you installed
}
}
After this, I ran the collectstatic command:
python manage.py collectstatic
It successfully collected adminstatic …
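One thing worth checking in a setup like this is that the directory collectstatic writes to (STATIC_ROOT) is exactly the one the nginx alias points at. A small sketch that recomputes STATIC_ROOT the way settings.py does and compares it with the alias; the location of settings.py is a hypothetical assumption, so adjust it to the real project layout:

```python
import posixpath


def static_root_for(settings_file):
    """Recompute STATIC_ROOT as settings.py does: BASE_DIR is two levels up."""
    base_dir = posixpath.dirname(posixpath.dirname(settings_file))
    return posixpath.join(base_dir, "adminstatic")


# Hypothetical location of settings.py; replace with the real path.
settings_file = "/root/proj/myapp/myapp/settings.py"
static_root = static_root_for(settings_file)

# Value of the alias directive in mynginx.conf (trailing slash ignored).
nginx_alias = "/root/proj/myapp/adminstatic/"
matches = static_root.rstrip("/") == nginx_alias.rstrip("/")

print(static_root, matches)
```

With this assumed layout it prints /root/proj/myapp/adminstatic True; if settings.py sits at a different depth, BASE_DIR shifts and the two paths no longer agree.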
In [1]: a = [4,5,6]
In [2]: reduce(lambda x,y:x,a)
Out[2]: 4
In [3]: reduce(lambda x,y:x+1,a)
Out[3]: 6
In [4]: reduce(lambda x,y:x+2,a)
Out[4]: 8
In [5]: reduce(lambda x,y:x+3,a)
Out[5]: 10
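I understand the first reduce call, but I am confused by the other three. For the second one, reduce(lambda x,y:x+1,a), shouldn't the output be 5? I have read the documentation at https://docs.python.org/2/library/functions.html#reduce, but I still don't get it.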
In [6]: reduce(lambda x,y:x+y,a)
Out[6]: 15
This one is fine! No question about it.
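Without an initializer, reduce starts the accumulator x at the first element (4) and then calls the function once for each remaining element, feeding each return value back in as the new x. A sketch that re-implements this and records every intermediate accumulator (traced_reduce is a hypothetical helper, checked against functools.reduce):

```python
from functools import reduce


def traced_reduce(func, items):
    """Reduce without an initializer, recording the accumulator at each step."""
    acc = items[0]          # accumulator starts as the first element
    steps = [acc]
    for item in items[1:]:  # func runs once per *remaining* element
        acc = func(acc, item)
        steps.append(acc)
    return acc, steps


a = [4, 5, 6]
for k in (1, 2, 3):
    step = lambda x, y: x + k   # ignores y, like the lambdas in the question
    result, steps = traced_reduce(step, a)
    assert result == reduce(step, a)
    print(k, steps, result)
```

This prints 1 [4, 5, 6] 6, then 2 [4, 6, 8] 8, then 3 [4, 7, 10] 10. Because the lambda ignores y, the constant is added once for each of the two remaining elements, so the result is 4 + 2*k rather than 4 + k. With x+y, the steps are [4, 9, 15], which gives 15.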