Nin*_*nad 1 java performance mapreduce mongodb aggregation-framework
我有一系列事件,其结构如下:
{
"_id" : ObjectId("537b3ff288f4ca2f471afcae"),
"Name" : "PREMISES MAP DELETED",
"ScreenName" : "AccessPointActivity",
"Timestamp" : NumberLong("1392113758000"),
"EventParams" : "null",
"TracInfo" : {
"ApplicationId" : "fa41f204bfc711e3b9f9c8cbb8c502c4",
"DeviceId" : "2_1VafJVPu4yfdbMWO1XGROjK6iQZhq4hAVCQL837W",
"UserId" : "pawan",
"SessionId" : "a8UHE16mowNwNGyuLXbW",
"WiFiAP" : "null",
"WiFiStrength" : 0,
"BluetoothID" : "null",
"BluetoothStrength" : 0,
"NetworkType" : "null",
"NetworkSubType" : "null",
"NetworkCarrier" : "Idea",
"Age" : 43,
"Gender" : "Female",
"OSVersion" : "16",
"Manufacturer" : "samsung",
"Resolution" : "600*976",
"Platform" : "Android",
"Latitude" : 40.42,
"Longitude" : -74,
"City" : "Monmouth County",
"CityLowerCase" : "monmouth county",
"Country" : "United States",
"CountryLowerCase" : "united states",
"Region" : "New Jersey",
"RegionLowerCase" : "new jersey",
"Time_zone" : "null",
"PinCode" : "07732",
"Locale" : ", Paradise Trailer Park",
"Accuracy" : 0,
"Timestamp" : NumberLong("1392113758000")
}
}
Run Code Online (Sandbox Code Playgroud)
他们在不同的屏幕上有很多活动.
我的预期产量如下:
{
ApplicationId:"fa41f204bfc711e3b9f9c8cbb8c502c4",
EventName:"PREMISES MAP DELETED",
Eventcount:300,
ScreenviewCount:20,
DeviceCount:10,
UserCount:3
}
Run Code Online (Sandbox Code Playgroud)
EventCount:它是EventName的计数
ScreenviewCount:每个会话的不同screenName的计数
DeviceCount:它是不同deviceId的计数
UserCount:它是不同userCount的计数
它们将在多个屏幕上显示多个事件(ScreenName).
目前我正在使用以下方法:
使用聚合来获取每个事件名称并计算例如:
{
_id:
{
ApplicationId:"fa41f204bfc711e3b9f9c8cbb8c502c4",
EventName:"PREMISES MAP DELETED"
}
EventCount:300
Run Code Online (Sandbox Code Playgroud)
}
对于上面聚合结果中的每个事件名称,我在while循环中调用以下查询,直到聚合输出包含文档:
a)使用来自聚合输出的eventName进行屏幕视图计数(在事件收集时)的不同查询.
b)来自设备计数的聚合输出的不同查询eventName(在事件集合上).
c)来自用户计数的聚合输出的不同查询eventName(在事件集合上).
问题是它很慢,因为它对聚合输出的每个结果有3个不同的查询.
他们是否可以通过单个聚合调用或其他方式执行此操作.
先感谢您!!!
您似乎错过的一般情况是,要在"事件"总计下获取文档中各个字段的"不同"值,您可以使用$addToSet运算符.
根据定义,"set"的所有值都是"唯一/不同",因此您只想将所有可能的值保存在分组级别的"set"中,然后获得所生成数组的"大小",即正是$size运营商在MongoDB 2.6中引入的内容.
db.collection.aggregate([
{ "$group": {
"_id": {
"ApplicationId": "$TracInfo.ApplicationId",
"EventName": "$Name",
},
"oScreenViewCount": {
"$addToSet": {
"ScreenName": "$ScreenName",
"SessionId": "$TracInfo.SessionId",
}
},
"oDeviceCount": { "$addToSet": "$TracInfo.DeviceId" },
"oUserCount": { "$addToSet": "$TracInfo.UserId" },
"oEventcount": { "$sum": 1 }
}},
{ "$project": {
"_id": 0,
"ApplicationId": "$_id.ApplicationId",
"EventName": "$_id.EventName",
"EventCount": "$oEventCount",
"ScreenViewCount": { "$size": "$oScreenViewCount" },
"DeviceCount": { "$size": "$oDeviceCount" },
"UserCount": { "$size": "$oUserCount" }
}}
])
Run Code Online (Sandbox Code Playgroud)
MongoDB 2.6之前的版本需要更多的工作,使用$unwind和$group计算数组:
db.collection.aggregate([
{ "$group": {
"_id": {
"ApplicationId": "$TracInfo.ApplicationId",
"EventName": "$Name",
},
"oScreenviewCount": {
"$addToSet": {
"ScreenName": "$ScreenName",
"SessionId": "$TracInfo.SessionId",
}
},
"oDeviceCount": { "$addToSet": "$TracInfo.DeviceId" },
"oUserCount": { "$addToSet": "$TracInfo.UserId" },
"oEventcount": { "$sum": 1 }
}},
{ "$unwind": "$oScreeenviewCount" },
{ "$group": {
"_id": "$_id",
"oScreenviewCount": { "$sum": 1 },
"oDeviceCount": { "$first": "$oDeviceCount" },
"oUserCount": { "$first": "$oUserCount" },
"oEventcount": { "$first": "$oEventCount" }
}},
{ "$unwind": "$oDeviceCount" },
{ "$group": {
"_id": "$_id",
"oScreenviewCount": { "$first": "$oScreenViewCount" },
"oDeviceCount": { "$sum": "$oDeviceCount" },
"oUserCount": { "$first": "$oUserCount" },
"oEventcount": { "$first": "$oEventCount" }
}},
{ "$unwind": "$oUserCount" },
{ "$group": {
"_id": "$_id",
"oScreenviewCount": { "$first": "$oScreenViewCount" },
"oDeviceCount": { "$first": "$oDeviceCount" },
"oUserCount": { "$sum": "$oUserCount" },
"oEventcount": { "$first": "$oEventCount" }
}},
{ "$project": {
"_id": 0,
"ApplicationId": "$_id.ApplicationId",
"EventName": "$_id.EventName",
"EventCount": "$oEventCount",
"ScreenViewCount": "$oScreenViewCount",
"DeviceCount": "$oDeviceCount",
"UserCount": "$oUserCount"
}}
])
Run Code Online (Sandbox Code Playgroud)
$project第二个列表中的最终用法以及"o"前缀名称的所有常规用法实际上只是为了在结尾处弄清楚结果并确保输出字段顺序与示例结果中的相同.
作为一般免责声明,您的问题缺乏确定用于这些总计的确切字段或组合的信息,但原则和方法是合理的,并且应该足够接近相同的实现.
所以基本上,你通过使用$addToSet任何字段或组合来获得"组"中的"不同"值,然后你通过任何可用的方式确定那些"集合"的"计数".
比发出许多查询并在客户端代码中合并结果要好得多.