How do I efficiently compute prefix sums of character frequencies in a string?

Question

Say I have the string

s = 'AAABBBCAB'

How can I efficiently compute the prefix sums of the frequency of each character in the string, i.e.:

psum = [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, {'A': 4, 'B': 4, 'C': 1}]

Answer 1

Eug*_*ash 20

You can do it in one line using itertools.accumulate and collections.Counter:

from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))

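This gives you a list of Counter objects. Now, to get the frequencies for any substring of s in time that does not depend on the substring's length (O(1) for a fixed alphabet), you can simply subtract the counters, e.g.: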

>>> psum[6] - psum[1]  # get frequencies for s[2:7]
Counter({'B': 3, 'A': 1, 'C': 1})

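The indexing in that subtraction is easy to get off by one, and the list holds no prefix for the empty string, so a substring starting at index 0 needs a special case. A minimal sketch of a wrapper, assuming the `psum` list built above and valid indices with 0 <= start <= stop <= len(s); the name `substring_counts` is hypothetical and not part of the answer:

```python
from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))


def substring_counts(psum, start, stop):
    """Return the character counts for s[start:stop] (hypothetical helper).

    psum[i] holds the counts for s[:i + 1], so the counts for s[:stop]
    are psum[stop - 1] and the counts for s[:start] are psum[start - 1].
    """
    if stop <= start:
        return Counter()
    upper = psum[stop - 1]
    # No prefix is stored for the empty string, so use an empty Counter.
    lower = psum[start - 1] if start > 0 else Counter()
    return upper - lower


print(substring_counts(psum, 2, 7))  # same query as psum[6] - psum[1]
print(substring_counts(psum, 0, 3))  # substring that starts at index 0
```

Each subtraction visits every distinct character in the two counters, so the cost grows with the alphabet size rather than with the length of the substring.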

Answer 2

hir*_*ist 19

Here is one option:

from collections import Counter

c = Counter()
s = 'AAABBBCAB'

psum = []
for char in s:
    c.update(char)
    psum.append(dict(c))

# [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, 
#  {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
#  {'A': 4, 'B': 4, 'C': 1}]

I use a collections.Counter to keep a "running sum" and append a copy of it to the list psum after each character. This way I iterate over the string s only once.
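The copy matters: appending the running Counter itself would store the same object at every position, and every entry would end up showing the final counts. A small sketch of the difference (the list names `aliased` and `copied` are made up for this illustration):

```python
from collections import Counter

s = 'AAABBBCAB'

c = Counter()
aliased = []  # receives the same Counter object on every iteration
copied = []   # receives an independent snapshot on every iteration
for char in s:
    c.update(char)
    aliased.append(c)
    copied.append(dict(c))

# All entries of `aliased` are one and the same object.
print(aliased[0] is aliased[-1])
# The snapshots keep the intermediate counts.
print(copied[0], copied[-1])
```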

If you prefer to have collections.Counter objects in the result, you can change the last line to

psum.append(c.copy())

in order to get

[Counter({'A': 1}), Counter({'A': 2}), ...
 Counter({'A': 4, 'B': 4, 'C': 1})]

The same result can also be achieved as follows (using accumulate was first proposed in Eugene Yarmash's answer; I just replace map with a generator expression):

from itertools import accumulate
from collections import Counter

s = "AAABBBCAB"
psum = list(accumulate(Counter(char) for char in s))

Just for completeness (as there is no "pure dict" answer here yet): if you do not want to use Counter or defaultdict, you can use this as well:

c = {}
s = 'AAABBBCAB'

psum = []
for char in s:
    c[char] = c.get(char, 0) + 1
    psum.append(c.copy())

Note, though, that defaultdict is usually faster than dict.get(key, default).
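How `defaultdict` and `dict.get` compare in speed depends on the Python version and the input, so it is worth measuring on your own data. A rough sketch using `timeit`; the function names and the test string are arbitrary choices for illustration, and the snapshot step is left out to isolate the cost of the update itself:

```python
from collections import defaultdict
from timeit import timeit


def count_with_get(s):
    # Plain dict: look up the current count with a default of 0.
    c = {}
    for char in s:
        c[char] = c.get(char, 0) + 1
    return c


def count_with_defaultdict(s):
    # defaultdict(int): missing keys start at 0 automatically.
    c = defaultdict(int)
    for char in s:
        c[char] += 1
    return dict(c)


s = 'AAABBBCAB' * 1000  # arbitrary input, only for illustration
assert count_with_get(s) == count_with_defaultdict(s)

for fn in (count_with_get, count_with_defaultdict):
    print(fn.__name__, timeit(lambda: fn(s), number=20))
```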

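@DeveshKumarSingh: your answer was posted later than this one; it has exactly the same structure with a slightly different type and the same complexity, but more verbose output. You should not be advertising it here. (5 upvotes)
We don't even need a `Counter` here; a simple `defaultdict` will do, @hiro-protagonist. Check my answer below! (2 upvotes)
What makes you say `defaultdict` is simpler than `Counter`? Simpler in what way? (2 upvotes)
@DeveshKumarSingh they are both subclasses of dict; the data structure of a Counter is no more complex than that of a dict. Or am I missing something? (2 upvotes)
@DeveshKumarSingh, this remark is misplaced. I have pointed out the difference in time performance, but the OP should decide for themselves. (2 upvotes)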

Answer 3

Chr*_*per 6

The simplest way is to use the Counter object from collections.

from collections import Counter

s = 'AAABBBCAB'

[dict(Counter(s[:i])) for i in range(1, len(s) + 1)]

Output:

[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2},
 {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
 {'A': 4, 'B': 4, 'C': 1}]

This is an elegant one-liner, so +1, but it is quadratic rather than linear. I suspect hiro protagonist's similar solution is more efficient. (5 upvotes)
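To check the quadratic-versus-linear point empirically, here is a rough timing sketch. The function names and the test input are hypothetical choices for illustration; the slicing version re-counts every prefix from scratch, while the running version updates one Counter and takes a snapshot per character:

```python
from collections import Counter
from timeit import timeit


def prefix_counts_quadratic(s):
    # Re-counts each prefix from scratch: roughly len(s)**2 / 2 character visits.
    return [dict(Counter(s[:i])) for i in range(1, len(s) + 1)]


def prefix_counts_running(s):
    # Keeps one running Counter and snapshots it after each character.
    c = Counter()
    out = []
    for char in s:
        c[char] += 1
        out.append(dict(c))
    return out


s = 'AAABBBCAB' * 200  # arbitrary input, only for illustration
assert prefix_counts_quadratic(s) == prefix_counts_running(s)

for fn in (prefix_counts_quadratic, prefix_counts_running):
    print(fn.__name__, timeit(lambda: fn(s), number=5))
```

The running version still copies the dict at every step, so its cost is proportional to len(s) times the number of distinct characters, which is linear in len(s) only for a fixed alphabet.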

Answer 4

Dev*_*ngh 6

You actually don't even need a Counter; a plain defaultdict is enough!

from collections import defaultdict

c = defaultdict(int)
s = 'AAABBBCAB'

psum = []

# Iterate through the characters
for char in s:
    # Update the count for the current character
    c[char] += 1
    # Add a copy of the updated dictionary to the output list
    psum.append(dict(c))

print(psum)

The output looks like

[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, 
{'A': 3, 'B': 2}, {'A': 3, 'B': 3}, 
{'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, 
{'A': 4, 'B': 4, 'C': 1}]

Archived:	6 years, 6 months ago
Views:	1474
Last activity:	6 years, 1 month ago