vkr*_*ris 8 python streaming hadoop mapreduce
I am trying to run a hadoop-streaming Python job.
bin/hadoop jar contrib/streaming/hadoop-0.20.1-streaming.jar
-D stream.non.zero.exit.is.failure=true
-input /ixml
-output /oxml
-mapper scripts/mapper.py
-file scripts/mapper.py
-inputreader "StreamXmlRecordReader,begin=channel,end=/channel"
-jobconf mapred.reduce.tasks=0
I made sure mapper.py has all the permissions. The job fails with this error:
Caused by: java.io.IOException: Cannot run program "mapper.py":
error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
... 19 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
at java.lang.ProcessImpl.start(ProcessImpl.java:91)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
I also tried copying mapper.py to HDFS and passing the link hdfs://localhost/mapper.py instead; that does not work either! Any ideas on how to fix this error?
Looking at the example on the HadoopStreaming wiki page, it seems that you should change
-mapper scripts/mapper.py
-file scripts/mapper.py
to
-mapper mapper.py
-file scripts/mapper.py
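because "shipped files go to the working directory": the file given with -file is copied into the task's working directory under its base name, so the scripts/ prefix no longer exists there. You might also need to specify the Python interpreter directly: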
-mapper "/path/to/python mapper.py"
-file scripts/mapper.py
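As an alternative to naming the interpreter on the command line, the script itself can carry a shebang line, so that the task can execute the shipped mapper.py directly when the file is executable. The following is only a minimal sketch, not the asker's actual mapper: it assumes that python is on the PATH of every task node, and the function name map_lines and the emitted key/value format are hypothetical, chosen purely for illustration.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming mapper sketch (the mapping logic is hypothetical)."""
import sys


def map_lines(lines):
    """Yield one tab-separated key/value string per non-empty input line."""
    for line in lines:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        # Hypothetical output format: the record as key, the count 1 as value.
        yield "%s\t1" % record


if __name__ == "__main__":
    # Streaming feeds records on stdin and collects results from stdout.
    for out in map_lines(sys.stdin):
        sys.stdout.write(out + "\n")
```

A quick local check before submitting the job is to pipe a sample file through the script (for example, cat sample.xml | ./mapper.py, where sample.xml is a placeholder name). If that fails locally with a "No such file or directory" message, the shebang line or the file's line endings are a likely place to look.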