MRJob: Displaying intermediate values in MapReduce

Rea*_*d Q 3 python hadoop mapreduce mrjob

When running a MapReduce program with the Python MRJob library, how can I display intermediate values (i.e. print variables or lists) in the terminal?

pac*_*erg 6

You can use sys.stderr.write() to send the values to standard error. Here is an example:

from mrjob.job import MRJob
import sys


class MRWordCounter(MRJob):
    def mapper(self, key, line):
        # Debug output goes to stderr, not into the job's results.
        sys.stderr.write("MAPPER INPUT: ({0},{1})\n".format(key, line))
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        # occurrences is a one-shot iterator; copy it to a list so it can
        # be both printed and summed.
        occurrences_list = list(occurrences)
        sys.stderr.write("REDUCER INPUT: ({0},{1})\n".format(word, occurrences_list))
        yield word, sum(occurrences_list)


if __name__ == '__main__':
    MRWordCounter.run()
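If you end up with many of these sys.stderr.write() calls, a small wrapper can keep them consistent. The sketch below is only an illustration: debug() is a hypothetical helper name, not part of mrjob, and it uses nothing beyond the standard library. It formats the values with repr() and flushes right away so the lines are less likely to get stuck in a buffer.

```python
import sys


def debug(label, *values):
    """Write one labelled debug line to stderr and flush it immediately.

    Hypothetical helper, not part of mrjob. It writes to stderr only,
    so stdout stays free for the job's own key/value output.
    """
    text = ", ".join(repr(v) for v in values)
    sys.stderr.write("{0}: ({1})\n".format(label, text))
    sys.stderr.flush()


if __name__ == "__main__":
    # Stand-ins for what a mapper and a reducer would receive.
    debug("MAPPER INPUT", None, "hello world hello")
    debug("REDUCER INPUT", "hello", [1, 1])
```

Inside a job you would call it the same way, e.g. debug("MAPPER INPUT", key, line) at the top of mapper().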

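  • For future reference: you can also inspect the temporary files the job creates. Both the mapper output and the combined, sorted output are stored in temporary files. MRJob prints the path to the temporary files. (2 upvotes)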
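To peek at those temporary files without opening each one by hand, something like the following might help. It is a rough sketch under assumptions: preview_files() is a hypothetical helper, the directory path is whatever MRJob printed for your run (passed on the command line here), and the directory must still exist when you run it, since it may be cleaned up when the job finishes depending on your settings.

```python
import os
import sys


def preview_files(root, max_lines=5):
    """Return {relative_path: first max_lines lines} for each file under root.

    Hypothetical helper; it makes no assumption about how mrjob names
    or lays out its temporary files, it just walks the directory.
    """
    previews = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            lines = []
            # errors="replace" avoids crashing on non-text files.
            with open(path, "r", errors="replace") as handle:
                for i, line in enumerate(handle):
                    if i >= max_lines:
                        break
                    lines.append(line.rstrip("\n"))
            previews[os.path.relpath(path, root)] = lines
    return previews


if __name__ == "__main__":
    # Usage: python preview.py <temp dir path printed by the job>
    if len(sys.argv) > 1:
        for rel_path, lines in sorted(preview_files(sys.argv[1]).items()):
            print("== " + rel_path)
            for line in lines:
                print("   " + line)
```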