Entity Framework 6 - Group by then Order by the First() 耗时太长

Alo*_*zo2 4 c# mysql entity-framework group-by sql-order-by

我真的需要这方面的帮助,经过数小时的搜索后找不到相关的答案。

MySQL,Entity Framework 6,数百万条记录的数据库,记录如下:

Indexint(11) NOT NULL  
TaskIDint(11) NOT NULL  
DeviceIDbigint(20) NOT NULL  
Commentslongtext NULL  
ExtendedResultslongtext NULL  
RunResultint(11) NOT NULL  
JobResultint(11) NOT NULL  
JobResultValuedouble NOT NULL  
ReporterIDbigint(20) NOT NULL  
FieldIDbigint(20) NOT NULL  
TimeOfRundatetime NOT NULL  
Run Code Online (Sandbox Code Playgroud)

我需要的是获取特定 taskID 的所有记录,然后按 DeviceID 分组并按 TimeOfRun 排序,以便获取特定 taskID 中每个 deviceID 的最新数据。

这是我的代码:

List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).
                Select(x => x.OrderByDescending(y => y.TimeOfRun).FirstOrDefault()).ToList();
Run Code Online (Sandbox Code Playgroud)

但这是生成的查询:

{SELECT
`Apply1`.`Index`, 
`Apply1`.`TaskID`, 
`Apply1`.`DEVICEID1` AS `DeviceID`, 
`Apply1`.`RunResult`, 
`Apply1`.`JobResult`, 
`Apply1`.`JobResultValue`, 
`Apply1`.`ExtendedResults`, 
`Apply1`.`Comments`, 
`Apply1`.`ReporterID`, 
`Apply1`.`FieldID`, 
`Apply1`.`TimeOfRun`
FROM (SELECT
`Project2`.`p__linq__0`, 
`Project2`.`DeviceID`, 
(SELECT
`Project3`.`Index`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Index`, 
(SELECT
`Project3`.`TaskID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TaskID`, 
(SELECT
`Project3`.`DeviceID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `DEVICEID1`, 
(SELECT
`Project3`.`RunResult`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `RunResult`, 
(SELECT
`Project3`.`JobResult`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResult`, 
(SELECT
`Project3`.`JobResultValue`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResultValue`, 
(SELECT
`Project3`.`ExtendedResults`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ExtendedResults`, 
(SELECT
`Project3`.`Comments`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Comments`, 
(SELECT
`Project3`.`ReporterID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ReporterID`, 
(SELECT
`Project3`.`FieldID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `FieldID`, 
(SELECT
`Project3`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TimeOfRun`
FROM (SELECT
@p__linq__0 AS `p__linq__0`, 
`Distinct1`.`DeviceID`
FROM (SELECT DISTINCT 
`Extent1`.`DeviceID`
FROM `JobsRecordHistories` AS `Extent1`
 WHERE `Extent1`.`TaskID` = @p__linq__0) AS `Distinct1`) AS `Project2`) AS `Apply1`}
Run Code Online (Sandbox Code Playgroud)

这需要太长时间。
我承认我不太了解 SQL,但是如果在 WHERE 语句之后插入 ToList(),那么我会更快地得到结果,尽管这仍然不是正确的做法,因为有很多不需要的数据在这种情况下,数据库传递给我的应用程序,并且对于 40k 记录,它仍然很慢 = 30 秒。

我也试过这个:

Dictionary<long, DateTime> DeviceIDAndTime = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID)
                .Select(g => new DeviceIDaAndTime { deviceID = g.Key, timeOfRun = g.Max(gi => gi.TimeOfRun) }).ToDictionary(x => x.deviceID, x => x.timeOfRun);
Run Code Online (Sandbox Code Playgroud)

为了以这种方式使用字典:

                List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => DeviceIDAndTime.Keys.Contains(x.DeviceID) && x.TimeOfRun == DeviceIDAndTime[x.DeviceID]).ToList();
Run Code Online (Sandbox Code Playgroud)

但我收到此错误:

Additional information: LINQ to Entities does not recognize the method 'System.DateTime get_Item(Int64)' method, and this method cannot be translated into a store expression.
Run Code Online (Sandbox Code Playgroud)

根据我的理解,这是有道理的,当将 timeOfRun 与字典值进行比较时,LINQ 在编写查询时需要一个特定的值而不是一个集合。

我很奇怪我没有找到任何相关的帖子,其他人也没有遇到这个问题。我想我错过了什么。

感谢任何帮助,谢谢

Alo*_*zo2 5

终于想通了,并提高了性能。
我需要一个查询和一个子查询,我需要 MAX 函数而不是 ORDER,因为我不关心结果的顺序,我只关心最大的 (timeOfRun)。
此外,一旦我注意到更大的索引列(我的 PK,自动增量)意味着更多的最新数据,事情就变得简单了,所以我不需要 MAX(timeOfRun),而是使用了 MAX(Index),尽管我很确定它会以同样的方式工作。

这是我的 LINQ:

var historyQuery = db.JobsRecordHistories.AsNoTracking().Where(y => y.TaskID == taskID &&
                                    db.JobsRecordHistories.Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).Select(g => g.Max(i => i.Index)).Contains<int>(y.Index));
Run Code Online (Sandbox Code Playgroud)

这是生成的 SQL:

{SELECT
`Extent1`.`Index`, 
`Extent1`.`TaskID`, 
`Extent1`.`DeviceID`, 
`Extent1`.`RunResult`, 
`Extent1`.`JobResult`, 
`Extent1`.`JobResultValue`, 
`Extent1`.`ExtendedResults`, 
`Extent1`.`Comments`, 
`Extent1`.`ReporterID`, 
`Extent1`.`FieldID`, 
`Extent1`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Extent1`
 WHERE (`Extent1`.`TaskID` = @p__linq__0) AND (EXISTS(SELECT
1 AS `C1`
FROM (SELECT
`Extent2`.`DeviceID` AS `K1`, 
MAX(`Extent2`.`Index`) AS `A1`
FROM `JobsRecordHistories` AS `Extent2`
 WHERE `Extent2`.`TaskID` = @p__linq__1
 GROUP BY 
`Extent2`.`DeviceID`) AS `GroupBy1`
 WHERE `GroupBy1`.`A1` = `Extent1`.`Index`))}
Run Code Online (Sandbox Code Playgroud)

我希望这会对某人有所帮助,因为我花了 1.5 天的谷歌搜索,查看 SQL 查询、LINQ、调试和优化