Gee*_*eek 8 grep sed awk text-processing
我有多个条目描述了一个非常大的日志文件中的事件,比如A.log。我想对日志文件中的事件条目做两件事:
一个典型的事件条目如下所示,它们之间会有其他文本。所以在下面的例子中有两个事件条目,第一个包含两个DataChangeEntry 有效载荷,第二个包含一个DataChangeEntry 有效载荷。
Data control raising event :DataControl@263c015d[[
#### DataChangeEvent #### on [DataControl name=PatternMatch_LegendTimeAxis, binding=.dynamicRegion1. beam_project_PatternMatch_dashboard_LegendTimeAxis_taskflow_LegendTimeAxis_beamDashboardLegendTimeAxisPageDef_beam_project_PatternMatch_dashboard_LegendTimeAxis_taskflow_LegendTimeAxis_beamDashboardLegendTimeAxis_xml_ps_taskflowid.dynamicRegion58. beam_project_PatternMatch_view_LegendTimeAxis_taskflow_LegendTimeAxis_beamVizLegendTimeAxisPageDef_beam_project_PatternMatch_view_LegendTimeAxis_taskflow_LegendTimeAxis_beamVizLegendTimeAxis_xml_ps_taskflowid.QueryIterator]
Filter/Collection Id : 0
Collection Level : 0
Sequence Id : 616
ViewSetId : PatternMatch.LegendTimeAxis_V1_0_SN49
==== DataChangeEntry (#1)
ChangeType : UPDATE
KeyPath : [2014-06-26 06:15:00.0, 0]
AttributeNames : [DATAOBJECT_CREATED, COUNTX, QueryName]
AttributeValues : [2014-06-26 06:15:00.0, 11, StrAvgCallWaitTimeGreaterThanThreshold]
AttributeTypes : [java.sql.Timestamp, java.lang.Integer, java.lang.String, ]
==== DataChangeEntry (#2)
ChangeType : UPDATE
KeyPath : [2014-06-26 06:15:00.0, 0]
AttributeNames : [DATAOBJECT_CREATED, COUNTX, QueryName]
AttributeValues : [2014-06-26 06:15:00.0, 9, AverageCallWaitingTimeGreateThanThreshold]
AttributeTypes : [java.sql.Timestamp, java.lang.Integer, java.lang.String, ]
]]
someother non useful text
spanning multiple lines
Data control raising event :DataControl@263c015d[[
#### DataChangeEvent #### on [DataControl name=PatternMatch_LegendTimeAxis, binding=.dynamicRegion1. beam_project_PatternMatch_dashboard_LegendTimeAxis_taskflow_LegendTimeAxis_beamDashboardLegendTimeAxisPageDef_beam_project_PatternMatch_dashboard_LegendTimeAxis_taskflow_LegendTimeAxis_beamDashboardLegendTimeAxis_xml_ps_taskflowid.dynamicRegion58. beam_project_PatternMatch_view_LegendTimeAxis_taskflow_LegendTimeAxis_beamVizLegendTimeAxisPageDef_beam_project_PatternMatch_view_LegendTimeAxis_taskflow_LegendTimeAxis_beamVizLegendTimeAxis_xml_ps_taskflowid.QueryIterator]
Filter/Collection Id : 0
Collection Level : 0
Sequence Id : 616
ViewSetId : PatternMatch.LegendTimeAxis_V1_0_SN49
==== DataChangeEntry (#1)
ChangeType : UPDATE
KeyPath : [2014-06-26 06:15:00.0, 0]
AttributeNames : [DATAOBJECT_CREATED, COUNTX, QueryName]
AttributeValues : [2014-06-26 06:15:00.0, 11, StrAvgCallWaitTimeGreaterThanThreshold]
AttributeTypes : [java.sql.Timestamp, java.lang.Integer, java.lang.String, ]
]]
Run Code Online (Sandbox Code Playgroud)
请注意,==== DataChangeEntry事件条目中的行数 可以是可变的。它也可以完全不存在,这表示事件负载为空,并且是一个错误条件,并且肯定也想捕获这种情况。
由于在这种情况下 entry 的输出跨越多行,因此我使用普通的 grep 并没有走多远。所以我正在寻求专家的建议。
PS:
我希望这能做到。事件进入events文件。消息会发送到标准输出。
将此文件保存到 myprogram.awk(例如):
#!/usr/bin/awk -f
BEGIN {
s=0; ### state. Active when parsing inside an event
nevent=0; ### Current event number
printf "" > "events"
}
# Start of event
/^ *Data control raising event/ {
s=1;
dentries=0;
print "*** Event number: " nevent >> "events"
nevent++
}
# Standard event line
s==1 {
print >> "events"
}
# DataChangeEntry line
/^ *==== DataChangeEntry/ {
dentries ++
}
# End of event
s==1 && /^ *\]\]/ {
s=0;
print "" >> "events"
if(dentries==0){
print "Warning: Event " nevent " has no Data Entries"
}
}
END {
print "Total event count: " nevent
}
Run Code Online (Sandbox Code Playgroud)
您可以通过不同的方式调用它:
myprogram.awk inputfile.txtawk -f myprogram.awk inputfile.txt示例输出:
Warning: Event 3 has no Data Entries
Total event count: 3
Run Code Online (Sandbox Code Playgroud)
events您可以在工作目录中调用的文件中一起检查所有事件。