Posts by Apr*_*ril

How to connect to a remote Hive server from Spark

I am running Spark locally and want to access Hive tables that live on a remote Hadoop cluster.

I can reach the Hive tables through beeline under SPARK_HOME:

[ml@master spark-2.0.0]$./bin/beeline 
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://remote_hive:10000
Connecting to jdbc:hive2://remote_hive:10000
Enter username for jdbc:hive2://remote_hive:10000: root
Enter password for jdbc:hive2://remote_hive:10000: ******
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ml/spark/spark-2.0.0/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/10/12 19:06:39 INFO jdbc.Utils: Supplied authorities: remote_hive:10000
16/10/12 19:06:39 INFO jdbc.Utils: Resolved authority: remote_hive:10000
16/10/12 19:06:39 INFO jdbc.HiveConnection: Will try to open client transport …
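Note that Spark does not normally go through the HiveServer2 JDBC endpoint (port 10000) that beeline uses; Spark SQL talks directly to the Hive metastore service instead. Below is a minimal PySpark sketch, assuming the remote cluster exposes its metastore Thrift service at remote_hive:9083 (9083 is the default metastore port; adjust host and port to your cluster):

from pyspark.sql import SparkSession

# Point the local Spark session at the remote Hive metastore
# (thrift://remote_hive:9083 is an assumed address; 9083 is only
# the default metastore port) and enable Hive support so Hive
# tables become visible to Spark SQL.
spark = (SparkSession.builder
         .appName("remote-hive-test")
         .config("hive.metastore.uris", "thrift://remote_hive:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW TABLES").show()

The same hive.metastore.uris setting can alternatively be supplied through a hive-site.xml copied into Spark's conf/ directory.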

hive apache-spark apache-spark-sql spark-thriftserver

14 votes · 1 answer · 40k views

How can I prevent twisted.internet.error.ConnectionLost errors when using Scrapy?

I am scraping some pages with Scrapy and getting the following error:

twisted.internet.error.ConnectionLost

My command-line output:

2015-05-04 18:40:32+0800 [cnproxy] INFO: Spider opened
2015-05-04 18:40:32+0800 [cnproxy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-05-04 18:40:32+0800 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-05-04 18:40:32+0800 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-05-04 18:40:32+0800 [cnproxy] DEBUG: Retrying <GET http://www.cnproxy.com/proxy1.html> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionLost'>>]
2015-05-04 18:40:32+0800 [cnproxy] DEBUG: Retrying <GET http://www.cnproxy.com/proxy1.html> (failed 2 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionLost'>>]
2015-05-04 18:40:32+0800 [cnproxy] DEBUG: Gave up retrying <GET http://www.cnproxy.com/proxy1.html> …
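ConnectionLost means the remote server closed the TCP connection mid-request, which sites like this often do when the client looks like a bot or crawls too aggressively. A hedged sketch of settings.py tweaks that commonly mitigate it; the User-Agent string and limits below are illustrative values, not requirements:

# settings.py

# Present a browser-like User-Agent instead of Scrapy's default bot string
# (this exact string is just an example).
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0 Safari/537.36")

# Crawl more gently so the server is less likely to drop connections.
CONCURRENT_REQUESTS = 2
DOWNLOAD_DELAY = 1.0

# Retry dropped connections a few more times before giving up.
RETRY_ENABLED = True
RETRY_TIMES = 5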

twisted scrapy web-scraping scrapy-spider

12 votes · 1 answer · 5,520 views

SparkException: Values to assemble cannot be null

I want to use StandardScaler to normalize the features.

Here is my code:

val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3))

val vectorAssembler = new VectorAssembler()
  .setInputCols(inputCols)
  .setOutputCol("features")
  .transform(trainingData)

val stdscaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithStd(true)
  .setWithMean(false)
  .fit(vectorAssembler)

But when I tried to use the StandardScaler, it threw an exception:

[Stage 151:==>                                                    (9 + 2) / 200]16/12/28 20:13:57 WARN scheduler.TaskSetManager: Lost task 31.0 in stage 151.0 (TID 8922, slave1.hadoop.ml): org.apache.spark.SparkException: Values to assemble cannot be null.
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:159)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:142)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at org.apache.spark.ml.feature.VectorAssembler$.assemble(VectorAssembler.scala:142)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:98)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:97)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) …
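VectorAssembler rejects null values by design, so the exception points at nulls in one of the inputCols rather than at the StandardScaler itself. A PySpark sketch of the usual workaround, dropping rows that contain nulls in the feature columns before assembling (dataset and inputCols are the names from the question; the Scala Dataset API offers the same na.drop call):

from pyspark.ml.feature import StandardScaler, VectorAssembler

# Drop rows with nulls in any feature column; VectorAssembler cannot assemble them.
clean = dataset.na.drop(subset=inputCols)
trainingData, testData = clean.randomSplit([0.7, 0.3])

assembled = (VectorAssembler(inputCols=inputCols, outputCol="features")
             .transform(trainingData))

scaler_model = (StandardScaler(inputCol="features", outputCol="scaledFeatures",
                               withStd=True, withMean=False)
                .fit(assembled))

If dropping rows loses too much data, imputing the missing values first (for example with pyspark.ml.feature.Imputer, available from Spark 2.2) is an alternative, and on Spark 2.4+ VectorAssembler also accepts setHandleInvalid("skip").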

apache-spark apache-spark-sql apache-spark-ml

10 votes · 1 answer · 9,644 views