Nor*_*ide 54 python string algorithm dynamic-programming time-complexity
我想比较2个字符串并保持匹配,在比较失败的地方分开.
所以,如果我有2个字符串 -
string1 = apples
string2 = appleses
answer = apples
另一个例子,因为字符串可能有多个单词.
string1 = apple pie available
string2 = apple pies
answer = apple pie
我确信有一种简单的Python方法可以做到这一点,但我无法解决,任何帮助和解释都表示赞赏.
Ric*_*ren 136
为了完整起见,difflib在标准库中提供了大量序列比较实用程序.例如find_longest_match,当在字符串上使用时,它找到最长的公共子字符串.使用示例:
from difflib import SequenceMatcher
string1 = "apple pie available"
string2 = "come have some apple pies"
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(match)  # -> Match(a=0, b=15, size=9)
print(string1[match.a: match.a + match.size])  # -> apple pie
print(string2[match.b: match.b + match.size])  # -> apple pie
Eri*_*ric 34
def common_start(sa, sb):
    """ returns the longest common substring from the beginning of sa and sb """
    def _iter():
        for a, b in zip(sa, sb):
            if a == b:
                yield a
            else:
                return
    return ''.join(_iter())
>>> common_start("apple pie available", "apple pies")
'apple pie'
或者有点奇怪的方式:
def stop_iter():
    """An easy way to break out of a generator"""
    raise StopIteration
def common_start(sa, sb):
    return ''.join(a if a == b else stop_iter() for a, b in zip(sa, sb))
哪个可能更具可读性
def terminating(cond):
    """An easy way to break out of a generator"""
    if cond:
        return True
    raise StopIteration
def common_start(sa, sb):
    return ''.join(a for a, b in zip(sa, sb) if terminating(a == b))
the*_*eye 15
它被称为最长公共子串问题.在这里,我提出了一个简单易懂但效率低下的解决方案.由于该算法的复杂度为O(N ^ 2),因此需要很长时间才能为大字符串生成正确的输出.
def longestSubstringFinder(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
    return answer
print longestSubstringFinder("apple pie available", "apple pies")
print longestSubstringFinder("apples", "appleses")
print longestSubstringFinder("bapples", "cappleses")
产量
apple pie
apples
apples
jon*_*nas 12
人们也可以考虑os.path.commonprefix对字符起作用,因此可以用于任何字符串.
import os
common = os.path.commonprefix(['apple pie available', 'apple pies'])
assert common == 'apple pie'
与Evo相同,但要比较任意数量的字符串:
def common_start(*strings):
    """ Returns the longest common substring
        from the beginning of the `strings`
    """
    def _iter():
        for z in zip(*strings):
            if z.count(z[0]) == len(z):  # check all elements in `z` are the same
                yield z[0]
            else:
                return
    return ''.join(_iter())
小智 5
使用第一个答案修复错误:
def longestSubstringFinder(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        for j in range(len2):
            lcs_temp=0
            match=''
            while ((i+lcs_temp < len1) and (j+lcs_temp<len2) and string1[i+lcs_temp] == string2[j+lcs_temp]):
                match += string2[j+lcs_temp]
                lcs_temp+=1
            if (len(match) > len(answer)):
                answer = match
    return answer
print longestSubstringFinder("dd apple pie available", "apple pies")
print longestSubstringFinder("cov_basic_as_cov_x_gt_y_rna_genes_w1000000", "cov_rna15pcs_as_cov_x_gt_y_rna_genes_w1000000")
print longestSubstringFinder("bapples", "cappleses")
print longestSubstringFinder("apples", "apples")
我发现最快的方法是使用suffix_trees包:
from suffix_trees import STree
a = ["xxxabcxxx", "adsaabc"]
st = STree.STree(a)
print(st.lcs()) # "abc"
| 归档时间: | 
 | 
| 查看次数: | 67478 次 | 
| 最近记录: |