Ank*_*Ank 23 python regex string
I want to split a string only once, using one of several delimiters such as "and", "&" and "-", tried in that order of priority. Examples:
'121 34 adsfd' -> ['121 34 adsfd']
'dsfsd and adfd' -> ['dsfsd ', ' adfd']
'dsfsd & adfd' -> ['dsfsd ', ' adfd']
'dsfsd - adfd' -> ['dsfsd ', ' adfd']
'dsfsd and adfd and adsfa' -> ['dsfsd ', ' adfd and adsfa']
'dsfsd and adfd - adsfa' -> ['dsfsd ', ' adfd - adsfa']
'dsfsd - adfd and adsfa' -> ['dsfsd - adfd ', ' adsfa']
I tried the following code to achieve this:
import re
re.split('and|&|-', string, maxsplit=1)
It works for every case except the last one. Because it does not respect the priority order, it returns this for the last case:
'dsfsd - adfd and adsfa' -> ['dsfsd ', ' adfd and adsfa']
How can I achieve this?
Pra*_*adi 35
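This is impractical with a single regex. You could make it work with negative lookbehinds, but it gets very complicated with each additional delimiter. It is quite simple to do with plain old str.split() and a few lines of code. All you have to do is check whether splitting on the current delimiter gives you two elements. If it does, that is your answer. If not, move on to the next delimiter: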
def split_new(inp, delims):
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    return [inp]  # If nothing worked, return the input
To test this:
teststrs = ['121 34 adsfd', 'dsfsd and adfd', 'dsfsd & adfd', 'dsfsd - adfd', 'dsfsd and adfd and adsfa', 'dsfsd and adfd - adsfa', 'dsfsd - adfd and adsfa']
for t in teststrs:
    print(repr(t), '->', split_new(t, ['and', '&', '-']))
Output:
'121 34 adsfd' -> ['121 34 adsfd']
'dsfsd and adfd' -> ['dsfsd ', ' adfd']
'dsfsd & adfd' -> ['dsfsd ', ' adfd']
'dsfsd - adfd' -> ['dsfsd ', ' adfd']
'dsfsd and adfd and adsfa' -> ['dsfsd ', ' adfd and adsfa']
'dsfsd and adfd - adsfa' -> ['dsfsd ', ' adfd - adsfa']
'dsfsd - adfd and adsfa' -> ['dsfsd - adfd ', ' adsfa']
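The same loop can also be written with str.partition, which reports directly whether the delimiter was found. This is only a minimal sketch of a variant, not part of the answer above; the name split_once is made up for illustration.

```python
def split_once(inp, delims):
    """Split inp once on the first delimiter (in priority order) that occurs in it."""
    for d in delims:
        # partition returns (head, sep, tail); sep is '' when d is not found
        head, sep, tail = inp.partition(d)
        if sep:
            return [head, tail]
    return [inp]  # no delimiter matched, return the input unchanged


print(split_once('dsfsd - adfd and adsfa', ['and', '&', '-']))
```

Like str.split, this matches plain substrings, so an "and" inside a longer word would also count as a delimiter.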
And*_*ely 23
Try:
import re
tests = [
    ["121 34 adsfd", ["121 34 adsfd"]],
    ["dsfsd and adfd", ["dsfsd ", " adfd"]],
    ["dsfsd & adfd", ["dsfsd ", " adfd"]],
    ["dsfsd - adfd", ["dsfsd ", " adfd"]],
    ["dsfsd and adfd and adsfa", ["dsfsd ", " adfd and adsfa"]],
    ["dsfsd and adfd - adsfa", ["dsfsd ", " adfd - adsfa"]],
    ["dsfsd - adfd and adsfa", ["dsfsd - adfd ", " adsfa"]],
]
for s, result in tests:
    res = re.split(r"and|&(?!.*and)|-(?!.*and|.*&)", s, maxsplit=1)
    print(res)
    assert res == result
Prints:
['121 34 adsfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd and adsfa']
['dsfsd ', ' adfd - adsfa']
['dsfsd - adfd ', ' adsfa']
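Explanation:
The regex and|&(?!.*and)|-(?!.*and|.*&) uses 3 alternatives:
'and' always matches, or
'&' matches only if there is no 'and' ahead of it (using the negative lookahead (?! )), or
'-' matches only if there is no 'and' or '&' ahead of it.
We use this pattern in re.split with maxsplit=1, so the string is split only at the first match.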
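If the delimiter list grows, the lookaheads could be generated instead of written by hand. The following is a rough sketch under that assumption; build_priority_pattern is a hypothetical helper, not something from the answer, and it simply gives each delimiter a negative lookahead over all higher-priority delimiters.

```python
import re


def build_priority_pattern(delims):
    """Build a regex from delimiters listed from highest to lowest priority."""
    parts = []
    for i, d in enumerate(delims):
        # Delimiters that outrank d; d may only match if none of them lies ahead
        higher = "|".join(re.escape(h) for h in delims[:i])
        lookahead = f"(?!.*(?:{higher}))" if higher else ""
        parts.append(re.escape(d) + lookahead)
    # Note: '.' does not cross newlines here, same as in the hand-written regex
    return re.compile("|".join(parts))


pattern = build_priority_pattern(["and", "&", "-"])
print(pattern.split("dsfsd - adfd and adsfa", maxsplit=1))
```

A higher-priority delimiter that appears earlier in the string needs no special handling, because the regex engine reaches it first when scanning from left to right.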
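You can keep a list of the delimiters, ordered by priority. Then you can combine re.split with re.findall: use re.findall to collect the delimiters that actually occur in the string, and split only on the one that ranks first in the following list ops: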
import re
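def split_order(s):
    # Delimiters found in s; ops lists them from highest to lowest priority
    r, ops = re.findall(r'(?<=\s)and(?=\s)|&|-', s), ['and', '&', '-']
    # Best (lowest) priority index among the delimiters that were found
    m = -1 if not r else min([ops.index(i) for i in r])
    # Split on the best-ranked delimiter found (the walrus operator needs Python 3.8+)
    a, *b = re.split('|'.join(l:=[i for i in r if ops.index(i) == m]), s)
    # Rebuild the remainder from s so that later occurrences stay intact
    return [s] if not l else ([a] if not b else [a, s[len(a)+len(l[0]):]])

vals = ['121 34 adsfd', 'dsfsd and adfd', 'dsfsd & adfd', 'dsfsd - adfd', 'dsfsd and adfd and adsfa', 'dsfsd and adfd - adsfa', 'dsfsd - adfd and adsfa']
for i in vals:
    print(split_order(i))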
Output:
['121 34 adsfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd and adsfa']
['dsfsd ', ' adfd - adsfa']
['dsfsd - adfd ', ' adsfa']
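A more explicit way to express the same idea might be to try one regex per delimiter in priority order and cut the string at the first match. This is only a sketch: PATTERNS and split_first_by_priority are names invented here, and the whitespace lookarounds around "and" are an assumption carried over from the findall pattern above.

```python
import re

# One pattern per delimiter, from highest to lowest priority (assumed ranking)
PATTERNS = [r"(?<=\s)and(?=\s)", r"&", r"-"]


def split_first_by_priority(s, patterns=PATTERNS):
    """Split s once at the first match of the highest-priority pattern that occurs."""
    for p in patterns:
        m = re.search(p, s)
        if m:
            # Cut around the matched delimiter; everything after it stays intact
            return [s[:m.start()], s[m.end():]]
    return [s]  # no delimiter found


print(split_first_by_priority("dsfsd - adfd and adsfa"))
print(split_first_by_priority("sand - castle"))
```

Because the lookarounds require whitespace on both sides, the "and" inside "sand" is not treated as a delimiter, so the second call splits on "-" instead.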