As far as I can tell, the file runs the same with or without that line.
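I'm trying to run a Map-Reduce job on Hadoop Streaming using Python scripts. I get the same error as the one described in "Hadoop Streaming Job failed error in python", but the solutions there didn't work for me.
My scripts work fine when I run "cat sample.txt | ./p1mapper.py | sort | ./p1reducer.py".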
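The local pipeline above imitates the Hadoop Streaming contract. The mapper reads raw lines from stdin and writes key<TAB>value lines. The framework (here `sort`) brings identical keys together. The reducer then sees each key's values as consecutive lines. Below is a minimal Python 3 simulation of that contract. The word-count logic and the function names `mapper`, `reducer`, and `run_pipeline` are hypothetical stand-ins, not the contents of p1mapper.py or p1reducer.py.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Hypothetical word-count mapper: emit "word\t1" for every token.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Assumes input is grouped by key, as Hadoop's shuffle (or `sort`) provides.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_pipeline(lines: Iterable[str]) -> List[str]:
    # In-process equivalent of: cat input | mapper | sort | reducer
    shuffled = sorted(mapper(lines), key=lambda kv: kv.split("\t", 1)[0])
    return list(reducer(shuffled))


if __name__ == "__main__":
    for out in run_pipeline(sys.stdin):
        print(out)
```

A script that passes this kind of local test can still fail under Hadoop. The cause is usually the environment the task runs in (interpreter, permissions, line endings), not the map/reduce logic itself.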
But when I run the following:
./bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
-input "p1input/*" \
-output p1output \
-mapper "python p1mapper.py" \
-reducer "python p1reducer.py" \
-file /Users/Tish/Desktop/HW1/p1mapper.py \
-file /Users/Tish/Desktop/HW1/p1reducer.py
(Note: the result is the same if I remove "python" or pass the full path names to -mapper and -reducer.)
This is the output I get:
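When -mapper or -reducer is given the script name without "python", the task executes the shipped file directly. In that case the file needs a shebang line and the execute bit, and its line endings matter. Below is a small pre-flight checker for those common causes of streaming failures. It is a sketch under that assumption, not a diagnosis of this particular job. The function names `diagnose` and `check_streaming_script` are hypothetical.

```python
import os
import sys
from typing import List


def diagnose(data: bytes, executable: bool) -> List[str]:
    """Return likely problems with a script meant to run as a streaming command."""
    problems = []
    first_line = data.split(b"\n", 1)[0]
    # Without "python" in the command, the shebang selects the interpreter.
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang line (e.g. #!/usr/bin/env python)")
    # CRLF endings leave a '\r' on the shebang, so the interpreter lookup fails.
    if b"\r\n" in data:
        problems.append("Windows (CRLF) line endings")
    if not executable:
        problems.append("file is not executable (try chmod +x)")
    return problems


def check_streaming_script(path: str) -> List[str]:
    # Read raw bytes so CRLF endings are not normalized away.
    with open(path, "rb") as fh:
        data = fh.read()
    return diagnose(data, os.access(path, os.X_OK))


if __name__ == "__main__":
    for script in sys.argv[1:]:
        issues = check_streaming_script(script)
        print(f"{script}: {'OK' if not issues else '; '.join(issues)}")
```

For example, `python check.py p1mapper.py p1reducer.py` (where check.py is a hypothetical file name for the sketch) prints one line per script.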
packageJobJar: [/Users/Tish/Desktop/HW1/p1mapper.py, /Users/Tish/Desktop/CS246/HW1/p1reducer.py, /Users/Tish/Documents/workspace/hadoop-0.20.2/tmp/hadoop-unjar4363616744311424878/] [] /var/folders/Mk/MkDxFxURFZmLg+gkCGdO9U+++TM/-Tmp-/streamjob3714058030803466665.jar tmpDir=null
11/01/18 03:02:52 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/18 03:02:52 INFO streaming.StreamJob: getLocalDirs(): [tmp/mapred/local]
11/01/18 03:02:52 INFO streaming.StreamJob: Running job: job_201101180237_0005
11/01/18 03:02:52 INFO streaming.StreamJob: To kill this …