是否有一种高性能算法可以将 Traces 中不必要的大循环减少到最低限度？

Dia*_*ana 6 python algorithm performance trace

目前，我正在研究一种算法，以将跟踪中的循环迭代次数减少到所需的最低限度。

更准确地说，是这样的：

trace = ["a", "b", "b", "b", "c"]

Run Code Online (Sandbox Code Playgroud)

应减少为：

new_trace = ["a", "b", "b", "c"]

Run Code Online (Sandbox Code Playgroud)

这样“b”后面的“b”的信息就不会丢失。

另一个例子是：

trace = ["a", "b", "c", "d", "b", "c", "d", "b", "c", "d", "e"]

Run Code Online (Sandbox Code Playgroud)

应减少为：

new_trace = ["a", "b", "c", "d", "b", "c", "d", "e"]

Run Code Online (Sandbox Code Playgroud)

我还在 Python 中实现了一个算法，它的作用正是如此：

def reduce_cycles(trace_variant: List[str], loop_retain_factor: int) -> List[str]:
    original_edges = list(zip(trace_variant, trace_variant[1:]))
    trace_follows_graph = Counter(original_edges)

    trace_graph = nx.DiGraph()
    trace_graph.add_edges_from(original_edges)
    trace_cycles: Generator[List[str], None, None] = nx.simple_cycles(trace_graph)

    edges_per_cycle = map(
        lambda cycle: list(zip(cycle, cycle[1:] + cycle[0:1])), trace_cycles
    )

    # reduce number of edges for cycles in trace
    for cycle_edges in edges_per_cycle:
        reduce_cycle_weight = min(itemgetter(*cycle_edges)(trace_follows_graph))

        for cycle_edge in cycle_edges:
            trace_follows_graph[cycle_edge] -= reduce_cycle_weight - loop_retain_factor

    # does not exactly work
    # especially not for examples such as trace_variant = ["a", "b", "c", "b", "c", "d", "c", "b", "e"]
    def replay_edge(acc: List[Edge], new: Edge):
        if trace_follows_graph[new] > 0:
            trace_follows_graph[new] -= 1
            return acc + [new]
        return acc

    new_trace_edges = reduce(replay_edge, original_edges, [])
    new_trace_variant = list(map(itemgetter(0), new_trace_edges))

    # add final event
    new_trace_variant.append(new_trace_edges[-1][-1])
    return new_trace_variant

Run Code Online (Sandbox Code Playgroud)

但正如您可以明显看到的，该算法并不完全有效。特别是因为通常存在大量跟踪，这意味着将为每个跟踪调用此函数。所以我想知道是否已经存在一种可以更有效地解决这个问题的算法，或者是否有其他人知道如何提高这个实现性能。

我将不胜感激你的帮助！

编辑：bcbcbc dcbe 应该变成bc dcbe