Dia*_*ana 6 python algorithm performance trace
Currently, I am working on an algorithm that reduces the number of loop iterations in a trace to the minimum required.
More precisely, a trace like this:
trace = ["a", "b", "b", "b", "c"]
should be reduced to:
new_trace = ["a", "b", "b", "c"]
so that the information that a "b" can follow a "b" is not lost.
Another example:
trace = ["a", "b", "c", "d", "b", "c", "d", "b", "c", "d", "e"]
should be reduced to:
new_trace = ["a", "b", "c", "d", "b", "c", "d", "e"]
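The invariant behind both examples is that the reduced trace must keep every "directly follows" relation of the original, i.e. the edge sets of the two trace graphs must be equal. That can be checked directly (the helper name `trace_edges` is introduced here for illustration, it is not part of the question's code):

```python
def trace_edges(trace):
    """Return the set of directed 'directly follows' edges of a trace."""
    return set(zip(trace, trace[1:]))

trace = ["a", "b", "c", "d", "b", "c", "d", "b", "c", "d", "e"]
new_trace = ["a", "b", "c", "d", "b", "c", "d", "e"]

# The reduction is lossless with respect to the edge set:
assert trace_edges(new_trace) == trace_edges(trace)
```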
I have also implemented an algorithm in Python that does exactly this:
from collections import Counter
from functools import reduce
from operator import itemgetter
from typing import Generator, List, Tuple

import networkx as nx

Edge = Tuple[str, str]


def reduce_cycles(trace_variant: List[str], loop_retain_factor: int) -> List[str]:
    original_edges = list(zip(trace_variant, trace_variant[1:]))
    trace_follows_graph = Counter(original_edges)
    trace_graph = nx.DiGraph()
    trace_graph.add_edges_from(original_edges)
    trace_cycles: Generator[List[str], None, None] = nx.simple_cycles(trace_graph)
    edges_per_cycle = map(
        lambda cycle: list(zip(cycle, cycle[1:] + cycle[0:1])), trace_cycles
    )
    # reduce the edge multiplicities for every cycle found in the trace
    for cycle_edges in edges_per_cycle:
        weights = itemgetter(*cycle_edges)(trace_follows_graph)
        if len(cycle_edges) == 1:
            # itemgetter returns a bare value (not a tuple) for a single
            # key, e.g. for a self-loop like ("b", "b")
            weights = (weights,)
        reduce_cycle_weight = min(weights)
        for cycle_edge in cycle_edges:
            trace_follows_graph[cycle_edge] -= reduce_cycle_weight - loop_retain_factor

    # does not exactly work,
    # especially not for traces such as ["a", "b", "c", "b", "c", "d", "c", "b", "e"]
    def replay_edge(acc: List[Edge], new: Edge):
        if trace_follows_graph[new] > 0:
            trace_follows_graph[new] -= 1
            return acc + [new]
        return acc

    new_trace_edges = reduce(replay_edge, original_edges, [])
    new_trace_variant = list(map(itemgetter(0), new_trace_edges))
    # append the final event, which the edge list would otherwise drop
    new_trace_variant.append(new_trace_edges[-1][-1])
    return new_trace_variant
But as you can see, this algorithm does not quite work correctly. On top of that, there are usually a large number of traces, and this function is called once per trace. So I am wondering whether an algorithm already exists that solves this problem more efficiently, or whether anyone has ideas for improving the performance of this implementation.
I would appreciate your help!
Alternatively, consider the directed graph constructed from the following trace: abcdbcdbcdbcdbcde

Edit 2: I have updated the code, the description, and one of the diagrams, since @kcsquared's comment made me re-evaluate the problem.
In summary, what I want is: when a trace is passed to the function/algorithm, it should remove all repetitions, unless removing a repetition would delete an edge from the corresponding directed graph.
Again, thanks for any help.
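One way to read that requirement: any run of three or more consecutive copies of the same block can be collapsed down to exactly two copies, because the second copy already witnesses the back edge that closes the loop (e.g. ("d", "b") in a repeated "b c d"), while the third and later copies add no new edges. A minimal sketch of that idea, assuming un-nested loops (the function name `collapse_tandem_repeats` is mine, not from the question, and nested cases like ["a", "b", "c", "b", "c", "d", "c", "b", "e"] would need more care):

```python
def collapse_tandem_repeats(trace):
    """Collapse every run of three or more consecutive copies of a block
    down to two copies, so that every edge of the trace graph, including
    the back edge closing the loop, still occurs at least once."""
    trace = list(trace)  # do not mutate the caller's list
    changed = True
    while changed:
        changed = False
        for k in range(1, len(trace) // 3 + 1):  # candidate block length
            i = 0
            while i + 3 * k <= len(trace):
                block = trace[i:i + k]
                if trace[i + k:i + 2 * k] == block and trace[i + 2 * k:i + 3 * k] == block:
                    del trace[i + 2 * k:i + 3 * k]  # drop the third copy
                    changed = True
                else:
                    i += 1
    return trace

print(collapse_tandem_repeats(["a", "b", "b", "b", "c"]))
# ['a', 'b', 'b', 'c']
print(collapse_tandem_repeats(["a", "b", "c", "d", "b", "c", "d", "b", "c", "d", "e"]))
# ['a', 'b', 'c', 'd', 'b', 'c', 'd', 'e']
```

This runs in roughly O(n²) per trace in the worst case, but it avoids building a graph and enumerating simple cycles, which is the expensive part of the original approach.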