How do I find the substrings that lie between two other substrings in a Python string?

vai*_*avk 1 python string bioinformatics

Let the string be "AAAGQWERTYUIOPAGCTHJKLAAAGZXCVBNMAGCT". I want to find the strings between AAAG and AGCT.

I want the output ["QWERTYUIOP", "ZXCVBNM"], i.e. a list of strings.

How can I do this with a regular expression or a similar technique?

I tried this:

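import math

def find_distances_between_motifs(positions1, positions2, motif_length1):
    # diff1: length of the sequence between a motif-1 hit and the next motif-2 hit
    # diff2: index into positions2 of the motif-2 hit that gives that distance
    diff1 = []
    diff2 = []
    pos1 = 0
    pos2 = 0
    # a while loop (instead of for/range) makes it possible to retry the same pos1
    while pos1 < len(positions1) and pos2 < len(positions2):
        gap = positions2[pos2] - positions1[pos1]
        if gap > 30:
            # motif 2 is too far away: no partner for this motif-1 hit
            diff1.append(math.nan)
            diff2.append(math.nan)
            pos1 += 1
        elif gap < 1:
            # motif 2 is not after motif 1: skip it and retry the same motif-1 hit
            pos2 += 1
            diff2.append(math.nan)
        elif pos1 == len(positions1) - 1 or positions1[pos1 + 1] > positions2[pos2]:
            # no later motif-1 hit lies in between, so this pair is valid
            diff1.append(gap - motif_length1)
            diff2.append(pos2)
            pos1 += 1
            pos2 += 1
        else:
            # a later motif-1 hit is closer to this motif-2 hit
            pos1 += 1
    return diff1, diff2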

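I want to return two arrays: the first holds the length of the sequence between the motifs, and the second holds the position of the second motif that produced each of those distances.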
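A minimal sketch of the two-array goal, assuming the motif positions are located with re.finditer rather than supplied as precomputed lists. The helper name motif_gaps is hypothetical, and this version leaves out the 30-base cutoff used in the code above. A zero-width lookahead such as (?=AAAG) matches at every position where the motif starts, so overlapping hits are also found.

```python
import re

def motif_gaps(seq, motif1, motif2):
    """Hypothetical helper: pair each motif1 hit with the first motif2 hit that
    starts at or after motif1's end. Returns (gap_lengths, motif2_indices)."""
    # Zero-width lookaheads report the start of every (possibly overlapping) hit.
    pos1 = [m.start() for m in re.finditer(f"(?={re.escape(motif1)})", seq)]
    pos2 = [m.start() for m in re.finditer(f"(?={re.escape(motif2)})", seq)]
    gaps, idx2 = [], []
    j = 0
    for p in pos1:
        end1 = p + len(motif1)
        # Skip motif2 hits that start before this motif1 hit has ended.
        while j < len(pos2) and pos2[j] < end1:
            j += 1
        if j == len(pos2):
            break  # no motif2 hit left for this or any later motif1 hit
        gaps.append(pos2[j] - end1)  # length of the sequence in between
        idx2.append(j)               # which motif2 hit gave that distance
    return gaps, idx2

gaps, idx2 = motif_gaps("AAAGQWERTYUIOPAGCTHJKLAAAGZXCVBNMAGCT", "AAAG", "AGCT")
print(gaps, idx2)  # [10, 7] [0, 1]
```

Because j is not advanced after a pair is recorded, two AAAG hits with no AGCT between them both pair with the same AGCT. Add j += 1 after recording a pair if each motif-2 hit should be used at most once.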

Rak*_*esh 7

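Use a regular expression: re.findall with a lookbehind and a lookahead. The lookbehind (?<=AAAG) and the lookahead (?=AGCT) require the motifs to surround the match without including them in it, and the lazy .*? stops at the nearest AGCT.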

For example:

import re
s = "AAAGQWERTYUIOPAGCTHJKLAAAGZXCVBNMAGCT"
print( re.findall(r"(?<=AAAG).*?(?=AGCT)", s))

Output:

['QWERTYUIOP', 'ZXCVBNM']
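For the "similar technique" part of the question, here is a sketch without regular expressions, using only str.find. The function name between is hypothetical. It gives the same result as the regex on the string in the question, but it is not guaranteed to match the lookbehind version in every edge case, for example when motifs overlap each other.

```python
def between(s, start, end):
    """Hypothetical helper: return every substring of s that lies between an
    occurrence of start and the next occurrence of end, scanning left to right."""
    results = []
    i = 0
    while True:
        a = s.find(start, i)
        if a == -1:
            break
        a += len(start)      # first character after the start motif
        b = s.find(end, a)   # nearest end motif after that point
        if b == -1:
            break
        results.append(s[a:b])
        i = b                # continue searching from the end motif
    return results

print(between("AAAGQWERTYUIOPAGCTHJKLAAAGZXCVBNMAGCT", "AAAG", "AGCT"))
# ['QWERTYUIOP', 'ZXCVBNM']
```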