用于查找具有其他字符串的所有字符的子字符串的最小长度的算法

ins*_*nce 6 algorithm

我有两个字符串:
string1 - hello how are you,
String2 - olo(包括空格字符)

输出:lo ho (HEL LO浩 w为你)

lo ho是包含string2所有字符的唯一子字符串.任何人都可以为此建议一个好的算法(我可以认为og只有Brute Force算法 - O(n ^ 2).

输出也应该是最小长度字符串(如果有多个选项).

Nik*_* B. 6

保留两个指针lr一个哈希表,M = character -> count用于string2中没有出现的字符s[l..r].

最初设置l = 0,r以便string1[l..r]包含所有字符string2(如果可能).您可以通过从M中删除字符直到它为空来完成此操作.

然后r在每个步骤中递增1,然后l尽可能地递增,同时仍保持M为空.所有r - l + 1(子串的长度s[l..r])的最小值是解决方案.

Pythonish伪代码:

n = len(string1)
M = {}   # let's say M is empty if it contains no positive values
for c in string2:
    M[c]++
l = 0
r = -1
while r + 1 < n and M not empty:
    r++
    M[string1[r]]--
if M not empty: 
    return "no solution"
answer_l, answer_r = l, r
while True:
    while M[string1[l]] < 0:
        M[string1[l]]++
        l++
    if r - l + 1 < answer_r - anwer_l + 1:
        answer_l, answer_r = l, r
    r++
    if r == n:
        break
    M[string1[r]]--
return s[answer_l..answer_r]
Run Code Online (Sandbox Code Playgroud)

如果在执行递增和递减操作时保持正条目数,则可以在O(1)中实现"空"检查.

n长度string1m长度string2.

注意,l并且r只是递增,因此最多有O(n)个增量,因此在最后一个外部循环中最多执行O(n)个指令.

如果M实现为数组(我假设字母表是常量大小),则获得运行时O(n + m),这是最佳的.如果字母表太大,您可以使用哈希表来获得预期的O(n + m).

执行示例:

string1 = "abbabcdbcb"
string2 = "cbb"

# after first loop
M = { 'a': 0, 'b': 2, 'c': 1, 'd': 0 }

# after second loop
l = 0
r = 5
M = { 'a': -2, 'b': -1, 'c': 0, 'd': 0 }

# increment l as much as possible:
l = 2
r = 5
M = { 'a': -1, 'b': 0, 'c': 0, 'd': 0 }

# increment r by one and then l as much as possible
l = 2
r = 6
M = { 'a': -1, 'b': 0, 'c': 0, 'd': -1 }

# increment r by one and then l as much as possible
l = 4
r = 7
M = { 'a': 0, 'b': 0, 'c': 0, 'd': -1 }

# increment r by one and then l as much as possible
l = 4
r = 8
M = { 'a': 0, 'b': 0, 'c': -1, 'd': -1 }

# increment r by one and then l as much as possible
l = 7
r = 9
M = { 'a': 0, 'b': 0, 'c': 0, 'd': 0 }
Run Code Online (Sandbox Code Playgroud)

最好的解决方案是s [7..9].