Hadoop Streaming job failed (not successful) in Python

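I'm trying to run a Map-Reduce job on Hadoop Streaming with Python scripts, and I'm getting the same errors as in "Hadoop Streaming Job failed error in python", but the solutions given there didn't work for me.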

My scripts work fine when I run "cat sample.txt | ./p1mapper.py | sort | ./p1reducer.py".
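For readers who want to reproduce that local check in a single process, here is a minimal sketch. The actual p1mapper.py and p1reducer.py are not shown in the question, so the word-count logic below is purely an assumption, and `mapper`, `reducer`, and `run_local` are hypothetical names. It only illustrates the map, sort, reduce contract that Hadoop Streaming relies on: tab-separated key/value lines, with the reducer receiving records sorted by key.

```python
#!/usr/bin/env python
"""Hypothetical word-count mapper/reducer with a local pipeline run."""
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; input must already be sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Mimic `cat input | mapper | sort | reducer` inside one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["a b a", "b c"]):
        print(record)
```

Passing this kind of local test shows the script logic is sound, but it does not exercise how the task nodes locate and launch the scripts, which is where a streaming job can still fail.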

But when I run the following:

./bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
    -input "p1input/*" \
    -output p1output \
    -mapper "python p1mapper.py" \
    -reducer "python p1reducer.py" \
    -file /Users/Tish/Desktop/HW1/p1mapper.py \
    -file /Users/Tish/Desktop/HW1/p1reducer.py

(Note: even if I remove "python" or type in the full pathname for -mapper and -reducer, the result is the same.)
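When "python" is dropped from -mapper and -reducer, the scripts are launched directly, which generally requires a shebang line and the executable bit. The sketch below is one possible pre-flight check, not a diagnosis of this particular failure; `check_streaming_script` is a hypothetical helper name, and the suggested shebang is only an example.

```python
import os


def check_streaming_script(path):
    """Return a list of problems that would stop `./script` from running directly."""
    if not os.path.isfile(path):
        return [f"{path}: file not found"]

    problems = []
    with open(path, "rb") as handle:
        first_line = handle.readline()

    # Without a shebang the kernel does not know which interpreter to use.
    if not first_line.startswith(b"#!"):
        problems.append(f"{path}: no shebang line (e.g. #!/usr/bin/env python)")
    # A Windows line ending makes the interpreter name end in a stray '\r'.
    elif b"\r" in first_line:
        problems.append(f"{path}: shebang line has a Windows (CRLF) line ending")

    # The executable bit is needed when the script is invoked by name.
    if not os.access(path, os.X_OK):
        problems.append(f"{path}: not executable (try chmod +x)")
    return problems


if __name__ == "__main__":
    for script in ("p1mapper.py", "p1reducer.py"):
        for problem in check_streaming_script(script):
            print(problem)
```

An empty result only means the scripts can be started by name on the machine where the check ran; the task nodes may still have a different Python installation or PATH.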

This is the output I get:

packageJobJar: [/Users/Tish/Desktop/HW1/p1mapper.py, /Users/Tish/Desktop/CS246/HW1/p1reducer.py, /Users/Tish/Documents/workspace/hadoop-0.20.2/tmp/hadoop-unjar4363616744311424878/] [] /var/folders/Mk/MkDxFxURFZmLg+gkCGdO9U+++TM/-Tmp-/streamjob3714058030803466665.jar tmpDir=null
11/01/18 03:02:52 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/18 03:02:52 INFO streaming.StreamJob: getLocalDirs(): [tmp/mapred/local]
11/01/18 03:02:52 INFO streaming.StreamJob: Running job: job_201101180237_0005
11/01/18 03:02:52 INFO streaming.StreamJob: To kill this …

python streaming hadoop mapreduce

