I am experimenting with Hadoop and have created a very simple map and reduce job. The input is a 30-line text file and the output is only 3 lines (it is an excerpt of a log file, where the map extracts the page name and execution time, and the reduce computes the min, max and avg execution time).

This simple job takes about 36 seconds to execute on Hadoop in pseudo-distributed mode (fs.default.name=hdfs://localhost, dfs.replication=1, mapred.job.tracker=localhost:8021). This is on a 2.93 GHz Nehalem with 8 GB RAM and an X25-E SSD, running Ubuntu 10.04.

I added debug output for every invocation in the mapper and in the reducer, and it clearly shows that every invocation of the mapper and of the reducer happens within the same second. In other words, the overhead is before and after Hadoop actually calls my code.
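For reference, the map and reduce are roughly of this shape (a minimal sketch only; the class names and the exact log-line layout below are placeholders for illustration, not my real code):

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class ExecTimeSummary {

        // Map: emit (pageName, executionTimeMillis) for every log line.
        // Hypothetical log format assumed here: "<pageName> <execTimeMillis> ..."
        public static class PageTimeMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\s+");
                String page = fields[0];
                long millis = Long.parseLong(fields[1]);
                // Per-invocation debug output, as mentioned above.
                System.err.println(System.currentTimeMillis() + " map: " + page);
                context.write(new Text(page), new LongWritable(millis));
            }
        }

        // Reduce: compute min, max and avg execution time per page name.
        public static class MinMaxAvgReducer
                extends Reducer<Text, LongWritable, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long min = Long.MAX_VALUE, max = Long.MIN_VALUE, sum = 0, count = 0;
                for (LongWritable v : values) {
                    long t = v.get();
                    min = Math.min(min, t);
                    max = Math.max(max, t);
                    sum += t;
                    count++;
                }
                System.err.println(System.currentTimeMillis() + " reduce: " + key);
                context.write(key, new Text("min=" + min + " max=" + max
                        + " avg=" + (sum / (double) count)));
            }
        }
    }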
Here is the final output:
18:16:59 INFO input.FileInputFormat: Total input paths to process : 1
18:16:59 INFO mapred.JobClient: Running job: job_201108251550_0023
18:17:00 INFO mapred.JobClient: map 0% reduce 0%
18:17:15 INFO mapred.JobClient: map 100% reduce 0%
18:17:30 INFO mapred.JobClient: map 100% reduce 100%
18:17:35 INFO mapred.JobClient: Job complete: job_201108251550_0023
18:17:35 INFO mapred.JobClient: Counters: 25
18:17:35 INFO mapred.JobClient: Job Counters
18:17:35 INFO mapred.JobClient: Launched reduce tasks=1
18:17:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11305
18:17:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18:17:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18:17:35 INFO mapred.JobClient: Launched map tasks=1
18:17:35 INFO mapred.JobClient: Data-local map tasks=1
18:17:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=12762
18:17:35 INFO mapred.JobClient: File Output Format Counters
18:17:35 INFO mapred.JobClient: Bytes Written=140
18:17:35 INFO mapred.JobClient: FileSystemCounters
18:17:35 INFO mapred.JobClient: FILE_BYTES_READ=142
18:17:35 INFO mapred.JobClient: HDFS_BYTES_READ=7979
18:17:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=41899
18:17:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=140
18:17:35 INFO mapred.JobClient: File Input Format Counters
18:17:35 INFO mapred.JobClient: Bytes Read=7877
18:17:35 INFO mapred.JobClient: Map-Reduce Framework
18:17:35 INFO mapred.JobClient: Reduce input groups=3
18:17:35 INFO mapred.JobClient: Map output materialized bytes=142
18:17:35 INFO mapred.JobClient: Combine output records=0
18:17:35 INFO mapred.JobClient: Map input records=30
18:17:35 INFO mapred.JobClient: Reduce shuffle bytes=0
18:17:35 INFO mapred.JobClient: Reduce output records=3
18:17:35 INFO mapred.JobClient: Spilled Records=12
18:17:35 INFO mapred.JobClient: Map output bytes=124
18:17:35 INFO mapred.JobClient: Combine input records=0
18:17:35 INFO mapred.JobClient: Map output records=6
18:17:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=102
18:17:35 INFO mapred.JobClient: Reduce input records=6
As can be seen, it takes 16 seconds until the map is done, then another 15 seconds until the reduce is done, and then still 6 more seconds until the job is declared complete.
Here is the syslog of the reduce task:
18:17:14,894 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
18:17:15,007 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
18:17:15,088 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=130514944, MaxSingleShuffleLimit=32628736
18:17:15,093 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Thread started: Thread for merging on-disk files
18:17:15,093 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Thread started: Thread for merging in memory files
18:17:15,093 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Thread waiting: Thread for merging on-disk files
18:17:15,094 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Need another 1 map output(s) where 0 is already in progress
18:17:15,094 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Thread started: Thread for polling Map Completion Events
18:17:15,095 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
18:17:20,095 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201108251550_0023_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
18:17:21,103 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
18:17:21,104 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
18:17:21,104 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
18:17:21,104 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
18:17:21,104 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 1 files left.
18:17:21,110 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
18:17:21,110 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 138 bytes
18:17:21,113 INFO org.apache.hadoop.mapred.ReduceTask: Merged 1 segments, 138 bytes to disk to satisfy reduce memory limit
18:17:21,114 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 142 bytes from disk
18:17:21,114 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
18:17:21,114 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
18:17:21,117 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 138 bytes
18:17:21,163 INFO org.apache.hadoop.mapred.Task: Task:attempt_201108251550_0023_r_000000_0 is done. And is in the process of commiting
18:17:24,170 INFO org.apache.hadoop.mapred.Task: Task attempt_201108251550_0023_r_000000_0 is allowed to commit now
18:17:24,180 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_201108251550_0023_r_000000_0' to output.txt
18:17:27,088 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201108251550_0023_r_000000_0' done.
18:17:27,091 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
18:17:27,110 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
18:17:27,110 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName akira for UID 1000 from the native implementation
My actual reducer code is invoked at 18:17:21.

What gives? Is this normal?