在Python中对字符串前缀执行二进制搜索

dln*_*385 4 python arrays search

我想搜索以给定子字符串开头的所有元素的排序字符串列表.

这是一个查找所有完全匹配的示例:

import bisect
names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
names.sort()
leftIndex = bisect.bisect_left(names, 'bob')
rightIndex = bisect.bisect_right(names, 'bob')
print(names[leftIndex:rightIndex])
Run Code Online (Sandbox Code Playgroud)

哪个打印['bob', 'bob', 'bob'].

相反,我想搜索所有以'bob' 开头的名字.我想要的输出是['bob', 'bob', 'bob', 'bobby', 'bobert'].如果我可以修改bisect搜索的比较方法,那么我可以用name.startswith('bob')它来做.

举个例子,在Java中它很容易.我会用:

Arrays.binarySearch(names, "bob", myCustomComparator);
Run Code Online (Sandbox Code Playgroud)

其中'myCustomComparator'是一个利用startswith方法(和一些额外的逻辑)的比较器.

我如何在Python中执行此操作?

Sin*_*ion 6

bisect 可以通过使用使用选择的自定义比较器的实例来使用自定义比较来欺骗:

>>> class PrefixCompares(object):
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         return self.value < other[0:len(self.value)]
... 
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> key = PrefixCompares('bob')
>>> leftIndex = bisect.bisect_left(names, key)
>>> rightIndex = bisect.bisect_right(names, key)
>>> print(names[leftIndex:rightIndex])
['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert']
>>> 
Run Code Online (Sandbox Code Playgroud)

DOH.正确的平分工作,但左边显然没有."adam"没有以"bob"为前缀!要修复它,你也必须调整顺序.

>>> class HasPrefix(object):
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         return self.value[0:len(other.value)] < other.value
... 
>>> class Prefix(object):
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         return self.value < other.value[0:len(self.value)]
... 
>>> class AdaptPrefix(object):
...     def __init__(self, seq):
...         self.seq = seq
...     def __getitem__(self, key):
...         return HasPrefix(self.seq[key])
...     def __len__(self):
...         return len(self.seq)
... 
>>> import bisect
>>> names = ['adam', 'bob', 'bob', 'bob', 'bobby', 'bobert', 'chris']
>>> names.sort()
>>> needle = Prefix('bob')
>>> haystack = AdaptPrefix(names)
>>> leftIndex = bisect.bisect_left(haystack, needle)
>>> rightIndex = bisect.bisect_right(haystack, needle)
>>> print(names[leftIndex:rightIndex])
['bob', 'bob', 'bob', 'bobby', 'bobert']
>>> 
Run Code Online (Sandbox Code Playgroud)