I have set up HTTPS to terminate external HTTPS connections on my AWS ELB. I am now trying to secure the connection between the ELB and my backend NGINX server using HTTPS with a self-signed certificate. I have followed the documentation, but accessing the server over HTTPS results in an HTTP 408 timeout, and I can't seem to get any debugging information to pinpoint where it fails.
Is there any way to get additional diagnostic information to test this? (See the sketch after the configuration below.)
Here is my ELB configuration:
$ aws elb describe-load-balancers --load-balancer-name <MY-ELB-NAME>
{
    "LoadBalancerDescriptions": [
        {
            "Subnets": [
                "<REDACTED>",
                "<REDACTED>",
                "<REDACTED>"
            ],
            "CanonicalHostedZoneNameID": "<REDACTED>",
            "VPCId": "<REDACTED>",
            "ListenerDescriptions": [
                {
                    "Listener": {
                        "InstancePort": 80,
                        "LoadBalancerPort": 80,
                        "Protocol": "HTTP",
                        "InstanceProtocol": "HTTP"
                    },
                    "PolicyNames": []
                },
                {
                    "Listener": {
                        "InstancePort": 443,
                        "SSLCertificateId": "<REDACTED>",
                        "LoadBalancerPort": 443,
                        "Protocol": "HTTPS",
                        "InstanceProtocol": "HTTPS"
                    },
                    "PolicyNames": [
                        "ELBSecurityPolicy-2015-05"
                    ]
                }
            ],
            "HealthCheck": {
                "HealthyThreshold": 2,
                "Interval": 30,
                "Target": …
Is there a way to use ez_setup.py to install Python's easy_install on a corporate network that sits behind a proxy server? Currently I get a connection timeout (a proxy-configuration sketch follows the traceback):
Downloading http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg
Traceback (most recent call last):
File "C:\jsears\python\ez_setup.py", line 278, in <module>
main(sys.argv[1:])
File "C:\jsears\python\ez_setup.py", line 210, in main
egg = download_setuptools(version, delay=0)
File "C:\jsears\python\ez_setup.py", line 158, in download_setuptools
src = urllib2.urlopen(url)
File "C:\jsears\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\jsears\Python27\lib\urllib2.py", line 400, in open
response = self._open(req, data)
File "C:\jsears\Python27\lib\urllib2.py", line 418, in _open
'_open', req)
File "C:\jsears\Python27\lib\urllib2.py", line 378, in _call_chain
result = func(*args)
File "C:\jsears\Python27\lib\urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req) …
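For reference, urllib2 (which ez_setup.py uses for the download, per the traceback above) can be routed through a proxy either via the standard HTTP_PROXY/HTTPS_PROXY environment variables or by installing a ProxyHandler before the download runs. A minimal Python 2 sketch with a hypothetical proxy URL and credentials:

import urllib2

# Hypothetical proxy address and credentials; substitute your corporate proxy.
proxy = urllib2.ProxyHandler({
    "http":  "http://user:password@proxy.example.com:8080",
    "https": "http://user:password@proxy.example.com:8080",
})
urllib2.install_opener(urllib2.build_opener(proxy))

# Every subsequent urlopen() in this process now goes through the proxy,
# including ez_setup's download if this runs first in the same interpreter.
print urllib2.urlopen("http://pypi.python.org/").getcode()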
The Apache Spark pyspark.RDD API documentation notes that groupByKey() is inefficient. Instead, it recommends using reduceByKey(), aggregateByKey(), combineByKey(), or foldByKey(). These perform part of the aggregation on each worker before the shuffle, reducing the amount of data shuffled across workers.
Given the dataset and groupByKey() expression below, what is an equivalent and efficient implementation (with reduced cross-worker shuffling) that does not use groupByKey() but produces the same result? (One possible sketch follows the output below.)
dataset = [("a", 7), ("b", 3), ("a", 8)]
rdd = (sc.parallelize(dataset)
       .groupByKey())
print sorted(rdd.mapValues(list).collect())
Output:
[('a', [7, 8]), ('b', [3])]
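A minimal sketch of one equivalent using aggregateByKey(), which combines values partition-locally before the shuffle (it assumes sc is an existing SparkContext; note that because the result keeps every value in a list, the shuffle savings over groupByKey() are modest for this particular output):

dataset = [("a", 7), ("b", 3), ("a", 8)]
rdd = (sc.parallelize(dataset)
       .aggregateByKey([],                        # zero value: start each key with an empty list
                       lambda acc, v: acc + [v],  # fold one value into the partition-local list
                       lambda a, b: a + b))       # merge lists coming from different partitions
print sorted(rdd.collect())
# [('a', [7, 8]), ('b', [3])]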
Tags: amazon-ec2, amazon-elb, apache-spark, easy-install, https, nginx, proxy, pyspark, python, rdd