Here is the simplest way to explain this. This is what I'm using:
re.split(r'\W', 'foo/bar spam\neggs')
-> ['foo', 'bar', 'spam', 'eggs']
This is what I want:
someMethod('\W', 'foo/bar spam\neggs')
-> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
The reason is that I want to split a string into tokens, manipulate them, and then put the string back together again.
Com*_*ger 253
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
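To illustrate the round trip the question asks for (split, modify, rejoin), here is a minimal sketch built on the capturing-group form above. The helper name `upper_words` is made up for this example and is not part of the `re` module.

```python
import re

def upper_words(text):
    # The capturing group makes re.split return the separators as well.
    tokens = re.split(r'(\W)', text)
    # Uppercase the word tokens; a token that is a single \W character
    # is a separator and is left untouched.
    changed = [t if re.fullmatch(r'\W', t) else t.upper() for t in tokens]
    return ''.join(changed)

original = 'foo/bar spam\neggs'
tokens = re.split(r'(\W)', original)
assert ''.join(tokens) == original  # nothing is lost by the split
print(repr(upper_words(original)))  # 'FOO/BAR SPAM\nEGGS'
```

Because every separator is kept as its own list element, joining the tokens with an empty string reproduces the input exactly.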
Mar*_*ato 23
If you are splitting on newlines, use splitlines(True).
>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']
(Not a general solution, but adding it here in case someone arrives without realizing this method exists.)
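As a small sketch of how this might be used, the lines can be processed one at a time and joined back without losing the original line endings. The variable names are only illustrative.

```python
text = 'line 1\nline 2\nline without newline'

# keepends=True is the keyword form of splitlines(True).
lines = text.splitlines(keepends=True)

# Prefix each line with its number; the line ending stays attached.
numbered = [f'{i}: {line}' for i, line in enumerate(lines, start=1)]
result = ''.join(numbered)
print(repr(result))  # '1: line 1\n2: line 2\n3: line without newline'
```

Note that splitlines also breaks on other line boundaries such as '\r\n' and '\r', so it is not a drop-in replacement for splitting on '\n' alone.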
oot*_*wch 10
Another non-regex solution that works well on Python 3:
# Split strings and keep separator
test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']
def split_and_keep(s, sep):
    if not s: return ['']  # consistent with string.split()
    # Find replacement character that is not used in string
    # i.e. just use the highest available character plus one
    # Note: This fails if ord(max(s)) = 0x10FFFF (ValueError)
    p = chr(ord(max(s)) + 1)
    return s.replace(sep, sep + p).split(p)
for s in test_strings:
    print(split_and_keep(s, '<'))
# If the unicode limit is reached it will fail explicitly
unicode_max_char = chr(1114111)
ridiculous_string = '<Hello>' + unicode_max_char + '<World>'
print(split_and_keep(ridiculous_string, '<'))
If you have only one separator, you can use a list comprehension:
text = 'foo,bar,baz,qux'
sep = ','
Appending/prepending the separator:
result = [x+sep for x in text.split(sep)]
#['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of trailing
result[-1] = result[-1].strip(sep)
#['foo,', 'bar,', 'baz,', 'qux']
result = [sep+x for x in text.split(sep)]
#[',foo', ',bar', ',baz', ',qux']
# to get rid of leading
result[0] = result[0].strip(sep)
#['foo', ',bar', ',baz', ',qux']
The separator as its own element:
result = [u for x in text.split(sep) for u in (x, sep)]
#['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
result = result[:-1] # to get rid of trailing
Another example: split on non-alphanumeric characters and keep the separators
import re
a = "foo,bar@candy*ice%cream"
re.split('([^a-zA-Z0-9])',a)
Output:
['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']
Explanation
re.split('([^a-zA-Z0-9])',a)
() <- capturing group: keep the separators in the result
[] <- character class: match any single character it describes
^a-zA-Z0-9 <- negation: anything except letters (upper/lower case) and digits
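One detail worth seeing in practice: because the pattern matches a single character, two adjacent separators produce an empty string between them. The sketch below shows that behaviour and two possible ways of handling it; the input string is invented for illustration.

```python
import re

a = "foo, bar"

# A single-character separator pattern yields '' between ',' and ' '.
tokens = re.split(r'([^a-zA-Z0-9])', a)
print(tokens)  # ['foo', ',', '', ' ', 'bar']

# Option 1: match runs of separators as one token with '+'.
grouped = re.split(r'([^a-zA-Z0-9]+)', a)
print(grouped)  # ['foo', ', ', 'bar']

# Option 2: keep single-character separators and drop the empty strings.
non_empty = [t for t in tokens if t]
print(non_empty)  # ['foo', ',', ' ', 'bar']
```

All three lists still join back to the original string, since removing empty strings does not change the concatenation.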
小智 7
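Suppose your regex pattern is split_pattern = r'(!|\?)'
First, append a marker string after each match to serve as a new separator, for example '[cut]':
new_string = re.sub(split_pattern, '\\1[cut]', your_string)
Then split on the new separator: new_string.split('[cut]').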
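Put together as a runnable sketch, the two steps look like this. The sample sentence and the name `marker` are assumptions made for the example, and the approach only works if the marker never occurs in the input.

```python
import re

split_pattern = r'(!|\?)'
your_string = 'Hi! How are you? Fine'
marker = '[cut]'  # assumed not to appear anywhere in your_string

# Step 1: re-emit each matched separator (\1) followed by the marker.
new_string = re.sub(split_pattern, r'\1' + marker, your_string)
print(new_string)  # Hi![cut] How are you?[cut] Fine

# Step 2: split on the marker; each separator stays attached to the
# text that precedes it.
parts = new_string.split(marker)
print(parts)  # ['Hi!', ' How are you?', ' Fine']
```

If the input ends with a separator, the final element of the list is an empty string, which may need to be filtered out.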
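Here is a simple .split solution that works without regex.
This is an answer to "Python split() without removing the delimiter", so it is not exactly what the original post asks for, but that other question was closed as a duplicate of this one.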
def splitkeep(s, delimiter):
    split = s.split(delimiter)
    return [substr + delimiter for substr in split[:-1]] + [split[-1]]
Random tests:
import random
CHARS = [".", "a", "b", "c"]
assert splitkeep("", "X") == [""]  # 0 length test
for delimiter in ('.', '..'):
    for _ in range(100000):
        length = random.randint(1, 50)
        s = "".join(random.choice(CHARS) for _ in range(length))
        assert "".join(splitkeep(s, delimiter)) == s
You can also split a string using an array of strings instead of a regex, like this:
def tokenizeString(aString, separators):
    # separators is an array of strings that are being used to split the string.
    # Sort separators by ascending length (in place). The loop below keeps
    # the last match it finds, so the longest matching separator wins.
    separators.sort(key=len)
    listToReturn = []
    i = 0
    while i < len(aString):
        theSeparator = ""
        for current in separators:
            if current == aString[i:i+len(current)]:
                theSeparator = current
        if theSeparator != "":
            # A separator starts here: emit it as its own token.
            listToReturn += [theSeparator]
            i = i + len(theSeparator)
        else:
            # Ordinary character: start a new token if needed, then append.
            if listToReturn == []:
                listToReturn = [""]
            if(listToReturn[-1] in separators):
                listToReturn += [""]
            listToReturn[-1] += aString[i]
            i += 1
    return listToReturn
print(tokenizeString(aString = "\"\"\"hi\"\"\" hello + world += (1*2+3/5) '''hi'''", separators = ["'''", '+=', '+', "/", "*", "\\'", '\\"', "-=", "-", " ", '"""', "(", ")"]))
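For comparison, a similar longest-match tokenization can be sketched with re.split by escaping each separator and joining them into one alternation. This is a different technique from the loop above, not a rewrite of it; the function name `tokenize_with_regex` is hypothetical, and the sketch assumes every separator is a non-empty string.

```python
import re

def tokenize_with_regex(text, separators):
    # Put longer separators first: regex alternation takes the first
    # alternative that matches, so '+=' must come before '+'.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape makes characters such as '+', '*' and '(' literal.
    pattern = '(' + '|'.join(re.escape(s) for s in ordered) + ')'
    # Drop the empty strings re.split emits between adjacent separators.
    return [tok for tok in re.split(pattern, text) if tok]

text = 'hello + world += (1*2+3/5)'
tokens = tokenize_with_regex(text, ['+=', '+', '/', '*', ' ', '(', ')'])
print(tokens)
assert ''.join(tokens) == text  # the tokens rebuild the input
```

Unlike the loop version, this sketch uses sorted() and so leaves the caller's list of separators unmodified.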