Phi*_*ien 9 hadoop apache-spark
I have Spark and Hadoop installed on OS X. I successfully worked through an example where Hadoop ran locally, with files stored in HDFS, and I ran Spark with
spark-shell --master yarn-client
and worked with HDFS from within the shell. However, I've run into problems trying to get Spark to run without HDFS, just locally on my machine. I looked at this answer, but it doesn't feel right to be messing around with environment variables when the Spark documentation says
It's easy to run locally on one machine - all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
If I run the basic SparkPi example I get the correct output.
If I try to run the sample Java application they provide, I again get output, but this time with a connection refused error relating to port 9000, which sounds like it's trying to connect to Hadoop, but I don't know why, because I'm not specifying that.
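For reference, the application I'm submitting is essentially the SimpleApp from the Spark quick-start guide; a rough sketch is below (the log-file path is just a placeholder, not my actual path):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class SimpleApp {
        public static void main(String[] args) {
            // Note: the path is passed without a scheme (no file:, hdfs:, ...)
            String logFile = "README.md"; // placeholder path
            SparkConf conf = new SparkConf().setAppName("Simple Application");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> logData = sc.textFile(logFile).cache();
            long numAs = logData.filter(new Function<String, Boolean>() {
                public Boolean call(String s) { return s.contains("a"); }
            }).count();
            long numBs = logData.filter(new Function<String, Boolean>() {
                public Boolean call(String s) { return s.contains("b"); }
            }).count();

            System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
            sc.stop();
        }
    }

Submitting it as follows produces the connection error: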
    $SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] ~/study/scala/sampleJavaApp/target/simple-project-1.0.jar
    Exception in thread "main" java.net.ConnectException: Call From 37-2-37-10.tssg.org/10.37.2.37 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
...
...
...
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
        at org.apache.hadoop.ipc.Client$Connection.access(Client.java:367)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
        at org.apache.hadoop.ipc.Client.call(Client.java:1381)
        ... 51 more
    15/07/31 11:05:06 INFO spark.SparkContext: Invoking stop() from shutdown hook
    15/07/31 11:05:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
...
...
...
    15/07/31 11:05:06 INFO ui.SparkUI: Stopped Spark web UI at http://10.37.2.37:4040
    15/07/31 11:05:06 INFO scheduler.DAGScheduler: Stopping DAGScheduler
    15/07/31 11:05:06 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    15/07/31 11:05:06 INFO util.Utils: path = /private/var/folders/cg/vkq1ghks37lbflpdg0grq7f80000gn/T/spark-c6ba18f5-17a5-4da9-864c-509ec855cadf/blockmgr-b66cc31e-7371-472f-9886-4cd33d5ba4b1, already present as root for deletion.
    15/07/31 11:05:06 INFO storage.MemoryStore: MemoryStore cleared
    15/07/31 11:05:06 INFO storage.BlockManager: BlockManager stopped
    15/07/31 11:05:06 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
    15/07/31 11:05:06 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    15/07/31 11:05:06 INFO spark.SparkContext: Successfully stopped SparkContext
    15/07/31 11:05:06 INFO util.Utils: Shutdown hook called
    15/07/31 11:05:06 INFO util.Utils: Deleting directory /private/var/folders/cg/vkq1ghks37lbflpdg0grq7f80000gn/T/spark-c6ba18f5-17a5-4da9-864c-509ec855cadf
Any pointers/explanations as to where I'm going wrong would be greatly appreciated!
It seems that the fact that I have the environment variable HADOOP_CONF_DIR set is causing some problems. Under that directory I have core-site.xml, which contains the following:
<property>
     <name>fs.default.name</name>                                     
     <value>hdfs://localhost:9000</value>                             
</property> 
If I change the value, e.g. to <value>hdfs://localhost:9100</value>, then when I try to run the Spark job the connection refused error refers to this changed port:
Exception in thread "main" java.net.ConnectException: Call From 37-2-37-10.tssg.org/10.37.2.37 to localhost:9100 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused 
So for some reason, despite being told to run locally, it is trying to connect to HDFS. If I remove the HADOOP_CONF_DIR environment variable, the job works fine.
Dan*_*bos 12
Apache Spark uses the Hadoop client libraries for file access when you use sc.textFile. This makes it possible to use hdfs:// or s3n:// paths, for example. You can also use local paths as file:/home/robocode/foo.txt.
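For example, assuming an existing JavaSparkContext sc (the paths below are only illustrative):

JavaRDD<String> fromHdfs  = sc.textFile("hdfs://localhost:9000/user/robocode/foo.txt"); // explicit HDFS
JavaRDD<String> fromS3    = sc.textFile("s3n://my-bucket/foo.txt");                     // explicit S3
JavaRDD<String> fromLocal = sc.textFile("file:/home/robocode/foo.txt");                 // explicit local filesystem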
If you specify a file name without a scheme, fs.default.name is used. It defaults to file:, but you explicitly override it to hdfs://localhost:9000 in your core-site.xml. So if you don't specify a scheme, it tries to read from HDFS.
The simplest solution is to specify the scheme:
JavaRDD<String> logData = sc.textFile("file:/home/robocode/foo.txt").cache();
Ger*_*án 6
I had the same error. HADOOP_CONF_DIR was defined, so I just unset the environment variable:
unset HADOOP_CONF_DIR
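If unsetting the variable is not an option, it should also be possible to override the default filesystem on the Hadoop configuration that Spark uses - a sketch only, assuming an existing JavaSparkContext sc:

sc.hadoopConfiguration().set("fs.defaultFS", "file:///"); // force scheme-less paths back to the local filesystem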