这个 CP-SAT 模型会更快吗？

Question

这个 CP-SAT 模型会更快吗？

Eli*_*Eli 1 python operations-research or-tools cp-sat

我的团队正在构建一个 CP-SAT 求解器，该求解器可以在一段时间内安排作业（例如家庭作业），并且可用性可变（可用于完成作业的时间）。我们正在努力加速我们的模型。

我们尝试了 num_search_workers 和其他参数调整，但想检查其他速度的提高。目标是在 5-10 秒内解决约 100 天的问题和最多 2000 个作业（以 M1 mac 为基准）。有任何想法吗？

问题描述：按照这些要求在 d 天内布置作业

一天的分配时间不得超过当天的可用时间
应尊重赋值依赖性（如果 A 需要 B，则 B 不应出现在 A 之后）
作业可以分开（以便更好地适应时间很少的日子）
针对一天中任务类型的多样性进行优化

# 天和 # 个作业后，解决问题的速度显着减慢。这是预料之中的，但我们想知道您是否可以提出可能的加速建议

这是一个单元测试示例。希望能够展示拆分、排序和时间限制。

days = [{"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}]
assignments = [
    {"id": 1, "resourceType": "Type0", "seconds": 2400, "deps": [], "instances": 2},
    {"id": 2, "resourceType": "Type0", "seconds": 1200, "deps": [1], "instances": 1},
    {"id": 3, "resourceType": "Type0", "seconds": 1200, "deps": [1, 2], "instances": 1},
    ]
result = cp_sat.CP_SAT_FAST.schedule(days, assignments, options=solver_options)
# expect a list of lists where each inner list is a day with the included assignments
expected = shared.SolverOutput(feasible=True, solution=[
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 2, "resourceType": "Type0", "time": 1200, "instances": 1}],
    [{"id": 3, "resourceType": "Type0", "time": 1200, "instances": 1}],
    ])
self.assertEqual(result, expected)

Run Code Online (Sandbox Code Playgroud)

这是求解器：

import math
from typing import List, Dict

from ortools.sat.python import cp_model
import numpy as np

import planner.solvers as solvers
from planner.shared import SolverOutput, SolverOptions


class CP_SAT_FAST(solvers.Solver):
    """
    CP_SAT_FAST is a CP_SAT solver with speed optimizations and a time limit (passed in through options).
    """

    @staticmethod
    def schedule(days: List[Dict], assignments: List[Dict], options: SolverOptions) -> SolverOutput:
        """
        Schedules a list of assignments on a studyplan of days

        Arguments:
        days: list of dicts containing available time for that day
        assignments: list of assignments to place on schedule
        """

        model = cp_model.CpModel()

        num_assignments = len(assignments)
        num_days = len(days)

        # x[d, a] shows is assignment a is on day d
        x = np.zeros((num_days, num_assignments), cp_model.IntVar) 

        # used for resource diversity optimization
        total_resource_types = 4
        unique_today = []

        # upper and lower bounds used for dependency ordering (if a needs b then b must be before or on the day of a)
        day_ub = {}
        day_lb = {}

        # track assignment splitting
        instances = {}
        assignment_times = {}

        id_to_assignment = {}
        for a, asm in enumerate(assignments):

            # track upper and lower bounds
            day_ub[a] = model.NewIntVar(0, num_days, "day_ub")
            day_lb[a] = model.NewIntVar(0, num_days, "day_lb")
            asm["ub"] = day_ub[a]
            asm["lb"] = day_lb[a]
            id_to_assignment[asm["id"]] = asm

            max_instances = min(num_days, asm.get("instances", num_days))
            
            # each assignment must occur at least once
            instances[a] = model.NewIntVar(1, max_instances, f"instances_{a}")
            model.AddHint(instances[a], max_instances)

            # when split keep a decision variable of assignment time
            assignment_times[a] = model.NewIntVar(asm.get("seconds") // max_instances, asm.get("seconds"), f"assignment_time_{a}")
            model.AddDivisionEquality(assignment_times[a], asm.get("seconds"), instances[a])  

        for d in range(num_days):

            time_available = days[d].get("secondsAvailable", 0)
            if time_available <= 0:
                # no assignments on zero-time days
                model.Add(sum(x[d]) == 0)

            else:
                
                # track resource diversity on this day
                type0_today = model.NewBoolVar(f"type0_on_{d}")
                type1_today = model.NewBoolVar(f"type1_on_{d}")
                type2_today = model.NewBoolVar(f"type2_on_{d}")
                type3_today = model.NewBoolVar(f"type3_on_{d}")
                types_today = model.NewIntVar(0, total_resource_types, f"unique_on_{d}")
                
                task_times = []

                for a, asm in enumerate(assignments):

                    # x[d, a] = True if assignment a is on day d
                    x[d, a] = model.NewBoolVar(f"x[{d},{a}]")
                    
                    # set assignment upper and lower bounds for ordering
                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])
                    
                    # track if a resource type is on a day for resource diversity optimization
                    resourceType = asm.get("resourceType")
                    if resourceType == "Type0":
                        model.AddImplication(x[d, a], type0_today)
                    elif resourceType == "Type1":
                        model.AddImplication(x[d, a], type1_today)
                    elif resourceType == "Type2":
                        model.AddImplication(x[d, a], type2_today)
                    elif resourceType == "Type3":
                        model.AddImplication(x[d, a], type3_today)
                    else:
                        raise RuntimeError(f"Unknown resource type {asm.get('resourceType')}")

                    # track of task time (considering splitting), for workload requirements
                    task_times.append(model.NewIntVar(0, asm.get("seconds"), f"time_{a}_on_{d}"))
                    model.Add(task_times[a] == assignment_times[a]).OnlyEnforceIf(x[d, a])

                # time assigned to day d cannot exceed the day's available time
                model.Add(time_available >= sum(task_times))

                # sum the unique resource types on this day for later optimization
                model.Add(sum([type0_today, type1_today, type2_today, type3_today]) == types_today)
                unique_today.append(types_today)


        """
        Resource Diversity:

        Keeps track of what instances of a resource type appear on each day
        and the minimum number of unique resource types on any day. (done above ^)
        
        Then the model objective is set to maximize that minimum
        """
        total_diversity = model.NewIntVar(0, num_days * total_resource_types, "total_diversity")
        model.Add(sum(unique_today) == total_diversity)

        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)

        # Set objective
        model.Maximize(avg_diversity)


        # Assignment Occurance/Splitting and Dependencies
        for a, asm in enumerate(assignments):
            
            # track how many times an assignment occurs (since we can split)
            model.Add(instances[a] == sum(x[d, a] for d in range(num_days))) 

            # Dependencies 
            for needed_asm in asm.get("deps", []):
                needed_ub = id_to_assignment[needed_asm]["ub"]
                
                # this asm's lower bound must be greater than or equal to the upper bound of the dependency
                model.Add(num_days - asm["lb"] >= needed_ub)

        # Solve
        solver = cp_model.CpSolver()

        # set time limit
        solver.parameters.max_time_in_seconds = float(options.time_limit)
        solver.parameters.preferred_variable_order = 1
        solver.parameters.initial_polarity = 0
        # solver.parameters.stop_after_first_solution = True
        # solver.parameters.num_search_workers = 8

        intermediate_printer = SolutionPrinter()
        status = solver.Solve(model, intermediate_printer)


        print("\nStats")
        print(f"  - conflicts       : {solver.NumConflicts()}")
        print(f"  - branches        : {solver.NumBranches()}")
        print(f"  - wall time       : {solver.WallTime()}s")
        print()

        if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
            sp = []

            for i, d in enumerate(days):
                day_time = 0
                days_tasks = []
                for a, asm in enumerate(assignments):
                    if solver.Value(x[i, a]) >= 1:
                        asm_time = math.ceil(asm.get("seconds") / solver.Value(instances[a]))
                        day_time += asm_time

                        days_tasks.append({"id": asm["id"], "resourceType": asm.get("resourceType"), "time": asm_time, "instances": solver.Value(instances[a])})
                
                sp.append(days_tasks)

            return SolverOutput(feasible=True, solution=sp)
            
        else:
            return SolverOutput(feasible=False, solution=[])


class SolutionPrinter(cp_model.CpSolverSolutionCallback):

    def __init__(self):
        cp_model.CpSolverSolutionCallback.__init__(self)
        self.__solution_count = 0

    def on_solution_callback(self):
        print(f"Solution {self.__solution_count} objective value = {self.ObjectiveValue()}")
        self.__solution_count += 1

Run Code Online (Sandbox Code Playgroud)

Answer 1

Chr*_*ins 5

在回答您的实际问题之前，我想指出您的模型中的一些我怀疑没有按您的预期工作的内容。

给定日期对作业类型的限制

model.AddImplication(x[d, a], type0_today)

Run Code Online (Sandbox Code Playgroud)

type0_today == 1等等，如果当天有该类型的作业，请强制执行。type0_today == 0但是，如果当天没有该类型的作业，则不会强制执行此操作。求解器仍然可以自由选择type0_today == 1，并且它会这样做，因为这满足了这个约束并且也直接增加了目标函数。您可能会发现，在您给出的测试用例的最佳解决方案中，所有type0_todaytotype3_today变量均为 1 avg_diversity == 4，并且在最佳解决方案中，即使输入数据中没有除 0 之外的任何类型的赋值。在建模的早期阶段，检查模型中所有变量值的合理性总是一个好主意。

由于我没有安装 Python，因此我将您的模型转换为 C# 以便能够进行一些实验。抱歉，您必须将其翻译成等效的 Python 代码。我重新制定了对变量的约束type0_today，以使用数组type_today[d, t]（对于 dayd和 type t）并使用AddMaxEquality约束，对于布尔变量来说，该约束相当于所有参与变量的逻辑或：

    // For each day...
    for (int d = 0; d < num_days; d++)
        {
            // ... make a list for each assignment type of all x[d, a] where a has that type.
            List<IntVar>[] assignmentsByType = new List<IntVar>[total_resource_types];
            for (int t = 0; t < total_resource_types; t++)
            {
                assignmentsByType[t] = new List<IntVar>();
            }
            for (int a = 0; a < num_assignments; a++)
            {
                int t = getType(assignments[a].resourceType);
                assignmentsByType[t].Add(x[d, a]);
            }
            // Constrain the types present on the day to be the logical OR of assignments with that type on that day
            for (int t = 0; t < total_resource_types; t++)
            {
                if (assignmentsByType[t].Count > 0)
                {
                    model.AddMaxEquality(type_today[d, t], assignmentsByType[t]); 
                }
                else
                {
                    model.Add(type_today[d, t] == 0);
                }

            }
        }

Run Code Online (Sandbox Code Playgroud)

您将平均多样性计算为

        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)

Run Code Online (Sandbox Code Playgroud)

由于求解器仅适用于整数变量，avg_diversity因此将恰好是值 0、1、2、3 或 4 之一，没有小数部分。该约束AddDivisionEquality还将确保是total_diversity和的精确整数倍。这是对解决方案的非常严格的限制，并且在许多情况下会导致不可行，我认为这是您不希望的。avg_diversitynum_days

例如，avg_diversity == 3和num_days == 20是total_diversity == 60一种允许的解决方案，但total_diversity == 63不会被允许，尽管该解决方案中有三天的多样性比的解决方案更高total_diversity == 60。

相反，我建议您消除变量avg_diversity及其约束，并简单地用作total_diversity目标函数。由于求解过程中天数是固定常数，因此最大化总多样性将是等效的，而不会引入人为的不可行性。

也就是说，这是我的答案。

一般约束满足问题是一般的 NP 问题，不应期望能够很好地扩展。尽管许多具体的问题公式实际上可以快速解决，但输入数据或公式的微小变化可能会将问题推入指数黑洞。除了尝试各种方法来看看哪种方法最适合您的具体问题之外，确实没有其他方法。

虽然听起来很矛盾，但求解器更容易找到强约束问题的最佳解决方案，而不是弱约束问题的最佳解决方案（假设它们是可行的！）。强约束问题中的搜索空间比弱约束问题中的搜索空间小，因此求解器在优化实验方面的选择较少，因此可以更快地完成工作。

第一个建议

在您的问题中，每个作业都有变量day_ub和。day_lb它们的范围是从 0 到num_days. 对他们的限制

                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])

Run Code Online (Sandbox Code Playgroud)

允许求解器自由选择 0 和最大d对应值之间的任何值。最大(num_days - d)（含）。在优化过程中，求解器可能会花时间尝试这些变量的不同值，但很少发现它会带来改进；仅当依赖分配的位置发生更改时才会发生这种情况。

您可以消除变量day_ub及其day_lb约束，而是直接用x变量制定依赖关系。

在我的 C# 模型中，我重新表述了赋值依赖约束，如下所示：

for (int a = 0; a < num_assignments; a++)
            {
                Assignment assignment = assignments[a];
                foreach (int predecessorIndex in getPredecessorAssignmentIndicesFor(assignment))
                {
                    for (int d1 = 0; d1 < num_days; d1++)
                    {
                        for (int d2 = 0; d2 < d1; d2++)
                        {
                            model.AddImplication(x[d1, predecessorIndex], x[d2, a].Not());
                        }
                    }
                }
            }

Run Code Online (Sandbox Code Playgroud)

换句话说：如果赋值predecessorIndexA ( ) 所a依赖的赋值 B ( ) 被放置在 day 上d1，那么所有的都x[0..d1, a]必须为假。这使用变量直接关联依赖关系，x而不是引入具有额外自由度的帮助变量，从而使求解器陷入困境。这种变化减少了问题中的变量数量并增加了约束数量，这两者都有助于求解器。

在我进行的 25 天、35 项作业的实验中，检查模型统计数据显示

原来的：

#Variables: 2020
#kIntDiv:   35
#kIntMax:   100
#kLinear1:  1750
#kLinear2:  914
#kLinearN:  86
Total constraints   2885

Run Code Online (Sandbox Code Playgroud)

新配方：

#Variables: 1950
#kBoolOr:   11700
#kIntDiv:   35
#kIntMax:   100
#kLinear2:  875
#kLinearN:  86
Total constraints   12796

Run Code Online (Sandbox Code Playgroud)

因此，新的公式的变量更少，但限制更多。

实验中的求解时间得到了改进，求解器仅用了 2.6 秒即可实现，total_diversity == 68 而不是超过 90 秒。

原创配方

Time    Objective
0,21    56
0,53    59
0,6 60
0,73    61
0,75    62
0,77    63
2,9 64
3,82    65
3,84    66
91,01   67
91,03   68
91,05   69

Run Code Online (Sandbox Code Playgroud)

新配方

Time    Objective
0,2347  41
0,3066  42
0,4252  43
0,4602  44
0,5014  49
0,6437  50
0,6777  51
0,6948  52
0,7108  53
0,9593  54
1,0178  55
1,1535  56
1,2023  57
1,2351  58
1,2595  59
1,2874  60
1,3097  61
1,3325  62
1,388   63
1,5698  64
2,4948  65
2,5993  66
2,6198  67
2,6431  68
32,5665 69

Run Code Online (Sandbox Code Playgroud)

当然，您获得的求解时间将很大程度上取决于输入数据。

第二个建议

在我的实验中，我观察到当作业有很多依赖性时，找到解决方案的速度要快得多。这与约束程度更高的模型更容易求解是一致的。

如果您经常有相同类型和持续时间的分配（例如测试数据中的数字 2 和 3），并且它们都具有实例 == 1` 并且没有依赖项或相同的依赖项，那么交换它们在解决方案中的位置将不会改善目标。

在预处理步骤中，您可以查找此类重复项并使其中一个依赖于另一个。这本质上是一个对称破缺约束。这将防止求解器浪费时间尝试交换位置是否会改善目标。

第三个建议

该解决方案需要确定解决方案中将出现每个分配的多少个实例。这需要每个分配两个变量instances[a]并assignment_times[a]具有关联的约束。

instances[a]您可以不这样做，而是摆脱变量，而是在预处理步骤中将assignment_times[a]赋值拆分为多个赋值。instances > 1例如，在您的测试数据中，作业 1 将分为两个作业 1_1 和 1_2，每个作业都有instances == 1和seconds = 1200。对于这个测试用例instances == 2，对于分配 1，这不会对最终解决方案产生任何影响——求解器可能会将 1_1 和 1_2 安排在同一天，也许不会，但最终结果相当于拆分或不拆分，但不会不需要额外的变量。

在预处理步骤中，当拆分分配时，您应该添加对称性破坏约束，使 1_2 依赖于 1_1 等，原因如上所述。

当一个作业有时instances > 2，在运行之前将其拆分为多个作业实际上是对模型的更改。例如，如果instances == 3您seconds = 2400无法找到将作业分为两天、每次 1200 秒的解决方案；求解器将始终安排 3 个分配，每个分配 800 秒。

因此，这个建议实际上是对模型的更改，您必须确定这是否可以接受。

通过放置更多的分配实例通常会有助于总体多样性，因此这种变化可能不会产生很大的实际后果。它还允许在一天安排 2/3 的作业，在另一天安排剩余的 1/3，因此它甚至增加了一些灵活性。

但就您的总体要求而言，这可能会也可能不会被接受。

在所有情况下，您都必须使用确切的数据来测试更改，看看它们是否真的会带来改进。

我希望这会有所帮助（并且这是一个现实世界的问题，而不是家庭作业，因为我确实花了几个小时进行调查......）。

归档时间：	4 年，2 月前
查看次数：	1721 次
最近记录：	2 年前