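Is there a way to split a string without splitting on escaped characters? For example, I have a string that I want to split on ':' but not on '\:'.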
http\://www.example.url:ftp\://www.example.url
The result should be:
['http\://www.example.url' , 'ftp\://www.example.url']
小智 27
There is a much simpler way, using a regular expression with a negative lookbehind assertion:
re.split(r'(?<!\\):', str)
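As a minimal, self-contained sketch of the lookbehind approach, here it is applied to the string from the question. The names `UNESCAPED_COLON`, `s` and `parts` are illustrative and not part of the answer. Note that the lookbehind only inspects the single character before the colon, so a colon that follows an escaped backslash (`\\:`) is also left unsplit.

```python
import re

# Split on ':' only when it is not immediately preceded by a backslash.
UNESCAPED_COLON = re.compile(r'(?<!\\):')

s = r'http\://www.example.url:ftp\://www.example.url'
parts = UNESCAPED_COLON.split(s)
print(parts)
```

The escape characters stay in the resulting pieces; removing them would be a separate step.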
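As Ignacio says: yes, but not trivially in one step. The problem is that you need to look back to determine whether you are at an escaped delimiter, and the basic string.split does not provide that functionality.
If this is not inside a tight loop, so that performance is not a significant concern, you can do it by first splitting on the escaped delimiter, then splitting on the plain delimiter, and then merging the pieces. Ugly demo code follows: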
# Bear in mind this is not rigorously tested!
def escaped_split(s, delim):
    # split by escaped, then by not-escaped
    escaped_delim = '\\' + delim
    sections = [p.split(delim) for p in s.split(escaped_delim)]
    ret = []
    prev = None
    for parts in sections:  # for each list of "real" splits
        if prev is not None:
            # Rejoin the pending item with the first item of this section
            parts[0] = escaped_delim.join([prev, parts[0]])
        # Every item except the last one in the section is complete
        ret.extend(parts[:-1])
        # The last item may continue into the next section
        prev = parts[-1]
    # The final pending item has no continuation, so add it
    ret.append(prev)
    return ret

s = r'http\://www.example.url:ftp\://www.example.url'
print(escaped_split(s, ':'))
# >>> ['http\\://www.example.url', 'ftp\\://www.example.url']
Alternatively, the logic may be easier to follow if you simply split the string by hand.
def escaped_split(s, delim):
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == '\\':
            try:
                # skip the next character; it has been escaped!
                current.append('\\')
                current.append(next(itr))
            except StopIteration:
                pass
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret
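Note that the second version behaves slightly differently when it meets a double escape followed by a delimiter: it allows the escape character itself to be escaped, so escaped_split(r'a\\:b', ':') returns ['a\\\\', 'b'], because the first \ escapes the second one, which leaves the : to be interpreted as a real delimiter. That is worth keeping in mind.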
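To make that difference concrete, the following sketch runs both strategies on the same input. `split_by_lookbehind` and `split_by_scanning` are hypothetical names for condensed re-implementations written for this comparison; they are not the answer's exact code.

```python
import re

def split_by_lookbehind(s, delim=':'):
    # Treats any delimiter preceded by a backslash as escaped.
    return re.split(r'(?<!\\)' + re.escape(delim), s)

def split_by_scanning(s, delim=':'):
    # Consumes a backslash together with the character after it,
    # so an escaped backslash does not escape the delimiter.
    parts, current = [], []
    itr = iter(s)
    for ch in itr:
        if ch == '\\':
            current.append(ch)
            current.append(next(itr, ''))  # '' if the string ends here
        elif ch == delim:
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts

s = r'a\\:b'
print(split_by_lookbehind(s))  # ['a\\\\:b']
print(split_by_scanning(s))    # ['a\\\\', 'b']
```

Which behavior is right depends on whether the input format lets the escape character escape itself.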
An edited version of Henry's answer that is compatible with Python 3, tested, and with a few issues fixed:
def split_unescape(s, delim, escape='\\', unescape=True):
    """
    >>> split_unescape('foo,bar', ',')
    ['foo', 'bar']
    >>> split_unescape('foo$,bar', ',', '$')
    ['foo,bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=True)
    ['foo$', 'bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=False)
    ['foo$$', 'bar']
    >>> split_unescape('foo$', ',', '$', unescape=True)
    ['foo$']
    """
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == escape:
            try:
                # skip the next character; it has been escaped!
                if not unescape:
                    current.append(escape)
                current.append(next(itr))
            except StopIteration:
                if unescape:
                    current.append(escape)
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret
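An unescaping splitter is most useful when it has a matching encoder. The sketch below shows one possible inverse and checks that the round trip recovers the original fields. Both `join_escape` and `split_unescape_sketch` are hypothetical helpers written for this illustration (the second is a condensed splitter in the same spirit as the function above), and the sketch assumes the delimiter and the escape character are different single characters.

```python
def join_escape(fields, delim, escape='\\'):
    # Escape the escape character first, then the delimiter,
    # so that escapes added in the second step are not doubled.
    escaped = [
        f.replace(escape, escape + escape).replace(delim, escape + delim)
        for f in fields
    ]
    return delim.join(escaped)

def split_unescape_sketch(s, delim, escape='\\'):
    # Condensed unescaping splitter: an escape character makes the
    # next character literal; a trailing escape is kept as-is.
    ret, current = [], []
    itr = iter(s)
    for ch in itr:
        if ch == escape:
            current.append(next(itr, escape))
        elif ch == delim:
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret

fields = ['http://www.example.url', r'C:\temp', 'plain']
encoded = join_escape(fields, ':')
decoded = split_unescape_sketch(encoded, ':')
print(encoded)
print(decoded == fields)
```

Escaping the escape character before the delimiter is the design choice that keeps the encoding unambiguous.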
Based on @user629923's suggestion, but much simpler than the other answers:
import re
DBL_ESC = "!double escape!"
s = r"Hello:World\:Goodbye\\:Cruel\\\:World"
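# Hide escaped backslashes behind a placeholder, split on unescaped ':', then restore them.
parts = [x.replace(DBL_ESC, r'\\') for x in re.split(r'(?<!\\):', s.replace(r'\\', DBL_ESC))]
print(parts)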
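The placeholder trick assumes the placeholder text never occurs in the input. As an alternative sketch under different assumptions, the fields can be matched directly instead of splitting on the delimiter. The pattern name `FIELD` is illustrative. This variant drops empty fields and a lone trailing backslash, so it is not a drop-in replacement when those matter.

```python
import re

# One field is a run of escaped pairs (a backslash plus any character)
# or characters that are neither a backslash nor a colon.
FIELD = re.compile(r'(?:\\.|[^\\:])+', re.DOTALL)

s = r"Hello:World\:Goodbye\\:Cruel\\\:World"
fields = FIELD.findall(s)
print(fields)
```

Because each backslash is consumed together with the character after it, an escaped backslash before a colon does not stop the colon from acting as a delimiter.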
qap*_*hla -4
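Note that : does not appear to be a character that needs escaping.
The simplest way I can think of to accomplish this is to split on the character and then add it back wherever it was escaped.
Sample code (badly in need of some tidying):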
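def splitNoEscapes(string, char):
    sections = string.split(char)
    # Put the delimiter back on pieces that ended with a backslash.
    # endswith() is used so that empty pieces do not raise IndexError.
    sections = [i + (char if i.endswith("\\") else "") for i in sections]
    result = ["" for i in sections]
    j = 0
    for s in sections:
        result[j] += s
        # Only move to the next slot when this piece was not escaped
        j += (1 if not s.endswith(char) else 0)
    return [i for i in result if i != ""]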