按流格式,我有一个很大的(20GB)csv文件。
date,ip,dev_type,env,time,cpu_usage
2015-11-09,10.241.121.172,M2,production,11:01,8
2015-11-09,10.241.121.172,M2,production,11:02,9
2015-11-09,10.241.121.243,C1,preproduction,11:01,4
2015-11-09,10.241.121.243,C1,preproduction,11:02,8
2015-11-10,10.241.121.172,M2,production,11:01,3
2015-11-10,10.241.121.172,M2,production,11:02,9
2015-11-10,10.241.121.243,C1,preproduction,11:01,4
2015-11-10,10.241.121.243,C1,preproduction,11:02,8
Run Code Online (Sandbox Code Playgroud)
并以流动格式导入elasticheaseh
{
"_index": "cpuusage",
"_type": "logs",
"_id": "AVFOkMS7Q4jUWMFNfSrZ",
"_score": 1,
"_source": {
"date": "2015-11-10",
"ip": "10.241.121.172",
"dev_type": "M2",
"env": "production",
"time": "11:02",
"cpu_usage": "9"
},
"fields": {
"date": [
1447113600000
]
}
}
...
Run Code Online (Sandbox Code Playgroud)
所以当我发现每天每个IP的cpu_usage最大值时,如何输出所有字段(日期,ip,dev_type,env,cpu_usage)
curl -XGET localhost:9200/cpuusage/_search?pretty -d '{
"size": 0,
"aggs": {
"by_date": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs" : {
"genders" : {
"terms" : {
"field" : "ip",
"size": 100000,
"order" : { …Run Code Online (Sandbox Code Playgroud)