我有一个日志文件如下
Begin ... 12-07-2008 02:00:05 ----> record1
incidentID: inc001
description: blah blah blah
owner: abc
status: resolved
end .... 13-07-2008 02:00:05
Begin ... 12-07-2008 03:00:05 ----> record2
incidentID: inc002
description: blah blah blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah
owner: abc
status: resolved
end .... 13-07-2008 03:00:05
Run Code Online (Sandbox Code Playgroud)
我想用mapreduce来处理这个.我想提取事件ID,状态以及事件所需的时间
如何处理两个记录,因为它们具有可变记录长度以及在记录结束之前输入分割发生的情况.