Nor*_*ide 54 python string algorithm dynamic-programming time-complexity
我想比较2个字符串并保持匹配,在比较失败的地方分开.
所以,如果我有2个字符串 -
string1 = apples
string2 = appleses
answer = apples
Run Code Online (Sandbox Code Playgroud)
另一个例子,因为字符串可能有多个单词.
string1 = apple pie available
string2 = apple pies
answer = apple pie
Run Code Online (Sandbox Code Playgroud)
我确信有一种简单的Python方法可以做到这一点,但我无法解决,任何帮助和解释都表示赞赏.
Ric*_*ren 136
为了完整起见,difflib
在标准库中提供了大量序列比较实用程序.例如find_longest_match
,当在字符串上使用时,它找到最长的公共子字符串.使用示例:
from difflib import SequenceMatcher
string1 = "apple pie available"
string2 = "come have some apple pies"
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(match) # -> Match(a=0, b=15, size=9)
print(string1[match.a: match.a + match.size]) # -> apple pie
print(string2[match.b: match.b + match.size]) # -> apple pie
Run Code Online (Sandbox Code Playgroud)
Eri*_*ric 34
def common_start(sa, sb):
""" returns the longest common substring from the beginning of sa and sb """
def _iter():
for a, b in zip(sa, sb):
if a == b:
yield a
else:
return
return ''.join(_iter())
Run Code Online (Sandbox Code Playgroud)
>>> common_start("apple pie available", "apple pies")
'apple pie'
Run Code Online (Sandbox Code Playgroud)
或者有点奇怪的方式:
def stop_iter():
"""An easy way to break out of a generator"""
raise StopIteration
def common_start(sa, sb):
return ''.join(a if a == b else stop_iter() for a, b in zip(sa, sb))
Run Code Online (Sandbox Code Playgroud)
哪个可能更具可读性
def terminating(cond):
"""An easy way to break out of a generator"""
if cond:
return True
raise StopIteration
def common_start(sa, sb):
return ''.join(a for a, b in zip(sa, sb) if terminating(a == b))
Run Code Online (Sandbox Code Playgroud)
the*_*eye 15
它被称为最长公共子串问题.在这里,我提出了一个简单易懂但效率低下的解决方案.由于该算法的复杂度为O(N ^ 2),因此需要很长时间才能为大字符串生成正确的输出.
def longestSubstringFinder(string1, string2):
answer = ""
len1, len2 = len(string1), len(string2)
for i in range(len1):
match = ""
for j in range(len2):
if (i + j < len1 and string1[i + j] == string2[j]):
match += string2[j]
else:
if (len(match) > len(answer)): answer = match
match = ""
return answer
print longestSubstringFinder("apple pie available", "apple pies")
print longestSubstringFinder("apples", "appleses")
print longestSubstringFinder("bapples", "cappleses")
Run Code Online (Sandbox Code Playgroud)
产量
apple pie
apples
apples
Run Code Online (Sandbox Code Playgroud)
jon*_*nas 12
人们也可以考虑os.path.commonprefix
对字符起作用,因此可以用于任何字符串.
import os
common = os.path.commonprefix(['apple pie available', 'apple pies'])
assert common == 'apple pie'
Run Code Online (Sandbox Code Playgroud)
与Evo相同,但要比较任意数量的字符串:
def common_start(*strings):
""" Returns the longest common substring
from the beginning of the `strings`
"""
def _iter():
for z in zip(*strings):
if z.count(z[0]) == len(z): # check all elements in `z` are the same
yield z[0]
else:
return
return ''.join(_iter())
Run Code Online (Sandbox Code Playgroud)
小智 5
使用第一个答案修复错误:
def longestSubstringFinder(string1, string2):
answer = ""
len1, len2 = len(string1), len(string2)
for i in range(len1):
for j in range(len2):
lcs_temp=0
match=''
while ((i+lcs_temp < len1) and (j+lcs_temp<len2) and string1[i+lcs_temp] == string2[j+lcs_temp]):
match += string2[j+lcs_temp]
lcs_temp+=1
if (len(match) > len(answer)):
answer = match
return answer
print longestSubstringFinder("dd apple pie available", "apple pies")
print longestSubstringFinder("cov_basic_as_cov_x_gt_y_rna_genes_w1000000", "cov_rna15pcs_as_cov_x_gt_y_rna_genes_w1000000")
print longestSubstringFinder("bapples", "cappleses")
print longestSubstringFinder("apples", "apples")
Run Code Online (Sandbox Code Playgroud)
我发现最快的方法是使用suffix_trees
包:
from suffix_trees import STree
a = ["xxxabcxxx", "adsaabc"]
st = STree.STree(a)
print(st.lcs()) # "abc"
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
67478 次 |
最近记录: |