Python: Split a string every three words

Lie*_*rta 2 python regex python-3.x

I've been searching for a while, but I can't seem to find an answer to this small problem.

I have this code, which is supposed to split a string after every third word:

import re

def splitTextToTriplet(Text):
    x = re.split(r'^((?:\S+\s+){2}\S+).*', Text)
    return x


print(splitTextToTriplet("Do you know how to sing"))

Currently the output is:

['', 'Do you know', '']

But I'm actually expecting this output:

['Do you know', 'how to sing'] 

And if I run print(splitTextToTriplet("Do you know how to")), it should output:

['Do you know', 'how to'] 

How can I change the regex to produce the expected output?

Oli*_*çon 6

I don't think re.split is the best tool here, because lookbehinds in Python's re module cannot use variable-length patterns.
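For reference, here is a short sketch of both points: why the question's pattern returns ['', 'Do you know', ''], and that re refuses the kind of variable-width lookbehind a "split after every third word" pattern would need. The lookbehind pattern below is only an illustrative assumption of what such a split pattern might look like.

```python
import re

text = "Do you know how to sing"

# The question's pattern is anchored with ^ and ends in .*, so it matches the
# whole string exactly once. re.split returns the text before the match (''),
# then the capturing group ('Do you know'), then the text after the match ('').
result = re.split(r'^((?:\S+\s+){2}\S+).*', text)
print(result)  # ['', 'Do you know', '']

# Splitting on the whitespace that follows three words would need a
# variable-width lookbehind; Python's re module raises re.error for it.
try:
    re.compile(r'(?<=(?:\S+\s+){2}\S+)\s+')
    lookbehind_rejected = False
except re.error as exc:
    lookbehind_rejected = True
    print("re.error:", exc)
```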

Instead, you can use str.split and then join the words back together in groups of three.

def splitTextToTriplet(string):
    words = string.split()
    grouped_words = [' '.join(words[i: i + 3]) for i in range(0, len(words), 3)]
    return grouped_words

splitTextToTriplet("Do you know how to sing")
# ['Do you know', 'how to sing']

splitTextToTriplet("Do you know how to")
# ['Do you know', 'how to'] 

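Although I'd recommend this solution, note that if some of your whitespace characters are newlines, that information is lost in the process: str.split() treats all whitespace alike, and ' '.join puts back single spaces.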
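If the newlines matter, one possible sketch is to match the groups with re.findall instead of splitting and rejoining, so the original whitespace inside each group survives. It assumes a "word" is any run of non-whitespace characters; split_every_n_words is a hypothetical name.

```python
import re

def split_every_n_words(text, n=3):
    """Group text into chunks of n words, keeping the original whitespace
    (spaces, tabs, newlines) between the words inside each chunk."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One word, then up to n-1 more runs of (whitespace, word).
    pattern = r'\S+(?:\s+\S+){0,%d}' % (n - 1)
    return re.findall(pattern, text)

print(split_every_n_words("Do you\nknow how to sing"))
# ['Do you\nknow', 'how to sing']
print(split_every_n_words("Do you know how to"))
# ['Do you know', 'how to']
```

Because each match starts with \S+, the whitespace between groups is simply skipped, while the whitespace within a group is returned unchanged.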


Tig*_*.ru 6

I used re.findall to get your expected output. To make the split function more generic, I replaced splitTextToTriplet with splitTextonWords, which takes a numberOfWords parameter:

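import re

def splitTextonWords(Text, numberOfWords=1):
    if numberOfWords > 1:
        text = Text.lstrip()
        # Up to numberOfWords-1 runs of "word + whitespace", then a final word.
        # Requiring whitespace after each word keeps words whole, and the
        # {0,...} lower bound lets a lone leftover word (even one letter) match.
        pattern = r'(?:\S+\s+){0,' + str(numberOfWords - 1) + r'}\S+'
        x = re.findall(pattern, text)
    elif numberOfWords == 1:
        x = Text.split()
    else:
        x = None
    return x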

print(splitTextonWords("Do you know how to sing", 3))
print(splitTextonWords("Do you know how to", 3))
print(splitTextonWords("Do you know how to sing how to dance how to", 3))
print(splitTextonWords("A sentence this code will fail at", 3))
print(splitTextonWords("A sentence this             code will fail at ", 3))
print(splitTextonWords("   A sentence this code will fail at s", 3))
print(splitTextonWords("   A sentence this code will fail at s", 4))
print(splitTextonWords("   A sentence this code will fail at s", 2))
print(splitTextonWords("   A sentence this code will fail at s", 1))
print(splitTextonWords("   A sentence this code will fail at s", 0))

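Output:

['Do you know', 'how to sing']
['Do you know', 'how to']
['Do you know', 'how to sing', 'how to dance', 'how to']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at s']
['A sentence this code', 'will fail at s']
['A sentence', 'this code', 'will fail', 'at s']
['A', 'sentence', 'this', 'code', 'will', 'fail', 'at', 's']
None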
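If you'd rather avoid regular expressions, the same numberOfWords generalization can be sketched with str.split and itertools.islice, pulling a fixed number of words at a time from one shared iterator. split_text_on_words is a hypothetical name. Unlike splitTextonWords, it is a generator and raises ValueError (once iterated) for values below 1 instead of returning None.

```python
from itertools import islice

def split_text_on_words(text, number_of_words=3):
    """Yield chunks of number_of_words words; the last chunk may be shorter."""
    if number_of_words < 1:
        raise ValueError("number_of_words must be at least 1")
    words = iter(text.split())
    # islice takes up to number_of_words items from the shared iterator on
    # each pass; an empty list means every word has been consumed.
    while chunk := list(islice(words, number_of_words)):
        yield ' '.join(chunk)

print(list(split_text_on_words("   A sentence this code will fail at s", 3)))
# ['A sentence this', 'code will fail', 'at s']
```

Like the str.split answer above, this collapses newlines into single spaces.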