If I have a list of letters, for example:
word = ['W','I','N','E']
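and I need every possible sequence of substrings in which each substring is at most 3 letters long, for example:
W I N E, WI N E, WI NE, W IN E, WIN E, and so on.
What is the most efficient way to do this?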
Right now I have:
word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)
This only gives me the following substrings, not the sequences:
W
I
W I
N
I N
W I N
E
N E
I N E
I just can't figure out how to build every possible sequence.
Try this recursive algorithm:
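def segment(word):
    def sub(w):
        # Base case: an empty remainder has exactly one segmentation, the empty one.
        if len(w) == 0:
            yield []
        # Take a first chunk of 1 to 3 letters, then segment the rest recursively.
        # (range replaces the Python 2-only xrange.)
        for i in range(1, min(4, len(w) + 1)):
            for s in sub(w[i:]):
                yield [''.join(w[:i])] + s
    return list(sub(word))

# And if you want a list of strings:
def str_segment(word):
    return [' '.join(w) for w in segment(word)]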
Output:
>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]
>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
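
Since the question asks about efficiency, here is a possible variant of the same recursion that caches the result for each suffix, so a suffix reached through several different prefixes is only segmented once. This is a sketch, not a benchmarked improvement: the names `segment_memo`, `splits_from` and the `max_len` parameter are made up for illustration, and because the number of segmentations itself grows exponentially with the word length, building the full list is still expensive for long inputs.

```python
from functools import lru_cache

def segment_memo(word, max_len=3):
    """Return every split of `word` into consecutive chunks of 1..max_len letters.

    Hypothetical helper: same results as a plain recursive split,
    but each suffix is processed only once thanks to the cache.
    """
    letters = tuple(word)

    @lru_cache(maxsize=None)
    def splits_from(start):
        # The empty remainder has exactly one split: no chunks at all.
        if start == len(letters):
            return ((),)
        results = []
        # Try a first chunk of 1..max_len letters, then reuse the cached
        # splits of whatever follows it.
        for size in range(1, min(max_len, len(letters) - start) + 1):
            chunk = ''.join(letters[start:start + size])
            for rest in splits_from(start + size):
                results.append((chunk,) + rest)
        return tuple(results)

    return [list(parts) for parts in splits_from(0)]

word = ['W', 'I', 'N', 'E']
for parts in segment_memo(word):
    print(' '.join(parts))
```

For `['W', 'I', 'N', 'E']` this yields the same 7 segmentations, in the same order, as the output shown above. Passing a different `max_len` changes the longest chunk allowed.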