Python: split a string without splitting on escaped characters

Dan*_*uen 20 python-2.7

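Is there a way to split a string without splitting on escaped characters? For example, I have a string that I want to split on ':' but not on '\:'.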

http\://www.example.url:ftp\://www.example.url

The result should be:

['http\://www.example.url' , 'ftp\://www.example.url']

小智 27

There is a much simpler way, using a regular expression with a negative lookbehind assertion:

import re
re.split(r'(?<!\\):', s)  # s is the string to split

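  • How do you escape the escape character? This doesn't work for 'Hello \\:world' (7 upvotes)
  • @abcdaa That would indeed be best, but sometimes you receive files from a third party you have no control over. (2 upvotes)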
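
To make the limitation raised in the first comment concrete, here is a minimal sketch. The names `split_lookbehind` and `split_pairs` are hypothetical and not part of the answer. The first function is the lookbehind split as posted. The second is one possible alternative, assuming a backslash always escapes exactly the next character: it collects fields with `re.findall` instead of splitting.

```python
import re

# The pattern from the answer: a colon not preceded by a backslash.
UNESCAPED_COLON = re.compile(r'(?<!\\):')

# Assumed alternative: a field is a run of ordinary characters
# or backslash-plus-one-character pairs.
FIELD = re.compile(r'(?:[^\\:]|\\.)+', re.DOTALL)

def split_lookbehind(s):
    """Split on every colon that has no backslash directly before it."""
    return UNESCAPED_COLON.split(s)

def split_pairs(s):
    """Collect fields, treating each backslash as escaping the next character."""
    return FIELD.findall(s)

print(split_lookbehind(r'http\://www.example.url:ftp\://www.example.url'))
# ['http\\://www.example.url', 'ftp\\://www.example.url']

# An escaped backslash followed by a real delimiter is not split:
print(split_lookbehind(r'Hello \\:world'))
# ['Hello \\\\:world']

print(split_pairs(r'Hello \\:world'))
# ['Hello \\\\', 'world']
```

Note that the `findall` variant is not a drop-in replacement: it silently drops empty fields and a lone trailing backslash, so it only fits inputs where neither matters.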

Hen*_*ter 7

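As Ignacio says, yes, but not trivially in one go. The problem is that you need to look back to determine whether you are at an escaped delimiter, and the basic string.split doesn't provide that functionality.

If this isn't inside a tight loop, so that performance isn't a significant issue, you can do it by first splitting on the escaped delimiter, then splitting on the plain delimiter, and then merging the pieces. Ugly demo code follows: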

# Bear in mind this is not rigorously tested!
def escaped_split(s, delim):
    # split by escaped, then by not-escaped
    escaped_delim = '\\' + delim
    sections = [p.split(delim) for p in s.split(escaped_delim)]
    ret = []
    prev = None  # trailing piece that may continue into the next section
    for parts in sections:  # for each list of "real" splits
        if prev is not None:
            # An escaped delimiter separated prev from parts[0]; glue them back together
            parts[0] = escaped_delim.join([prev, parts[0]])
        # Everything except the last piece of a section is complete
        ret.extend(parts[:-1])
        prev = parts[-1]
    # The last piece of the last section has nothing left to join to
    ret.append(prev)
    return ret

s = r'http\://www.example.url:ftp\://www.example.url'
print(escaped_split(s, ':'))
# >>> ['http\\://www.example.url', 'ftp\\://www.example.url']

Alternatively, the logic may be easier to follow if you just split the string by hand.

def escaped_split(s, delim):
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == '\\':
            try:
                # skip the next character; it has been escaped!
                current.append('\\')
                current.append(next(itr))
            except StopIteration:
                pass
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret

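Note that this second version behaves slightly differently when it encounters a double escape followed by a delimiter: this function allows escaped escape characters, so escaped_split(r'a\\:b', ':') returns ['a\\\\', 'b'], because the first \ escapes the second one, leaving the : to be interpreted as a real delimiter. So that's something to watch out for.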
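
The difference is easiest to see by counting backslashes before the delimiter. The sketch below is a condensed, index-based restatement of the hand-written splitter (an assumption for illustration, not the code as posted), written so that it runs on its own:

```python
def escaped_split(s, delim):
    """Split s on delim, treating a backslash as escaping the next character."""
    ret, current = [], []
    i = 0
    while i < len(s):
        if s[i] == '\\':
            # Keep the backslash and whatever it escapes together;
            # the slice also copes with a lone trailing backslash.
            current.append(s[i:i + 2])
            i += 2
        elif s[i] == delim:
            # Unescaped delimiter: close the current field
            ret.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(s[i])
            i += 1
    ret.append(''.join(current))
    return ret

print(escaped_split(r'a\:b', ':'))    # one backslash: delimiter is escaped
# ['a\\:b']
print(escaped_split(r'a\\:b', ':'))   # two backslashes: delimiter is real
# ['a\\\\', 'b']
print(escaped_split(r'a\\\:b', ':'))  # three backslashes: escaped again
# ['a\\\\\\:b']
```

An odd number of backslashes leaves the delimiter escaped, and an even number leaves it active.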


Tah*_*gir 5

An edited version of Henry's answer with Python 3 compatibility, tests, and fixes for some issues:

def split_unescape(s, delim, escape='\\', unescape=True):
    """
    >>> split_unescape('foo,bar', ',')
    ['foo', 'bar']
    >>> split_unescape('foo$,bar', ',', '$')
    ['foo,bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=True)
    ['foo$', 'bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=False)
    ['foo$$', 'bar']
    >>> split_unescape('foo$', ',', '$', unescape=True)
    ['foo$']
    """
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == escape:
            try:
                # skip the next character; it has been escaped!
                if not unescape:
                    current.append(escape)
                current.append(next(itr))
            except StopIteration:
                if unescape:
                    current.append(escape)
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret


cas*_*dcl 5

Building on @user629923's suggestion, but much simpler than the other answers:

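import re
DBL_ESC = "!double escape!"  # placeholder; must not occur in the input

s = r"Hello:World\:Goodbye\\:Cruel\\\:World"

# Hide escaped backslashes, split on unescaped colons, then restore them.
# A list comprehension is used so the result is a list on Python 3 as well.
parts = [x.replace(DBL_ESC, r'\\') for x in re.split(r'(?<!\\):', s.replace(r'\\', DBL_ESC))]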


qap*_*hla -4

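Note that : doesn't appear to be a character that needs escaping.

The simplest way I can think of to accomplish this is to split on the character, and then add it back in wherever it was escaped.

Sample code (badly in need of some tidying up):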

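def splitNoEscapes(string, char):
    sections = string.split(char)
    # Re-attach the delimiter to any section that ended with a backslash.
    # endswith() avoids an IndexError on empty sections (e.g. "a::b").
    sections = [i + (char if i.endswith("\\") else "") for i in sections]
    result = ["" for i in sections]
    j = 0
    for s in sections:
        result[j] += s
        # Move to the next slot unless this section ended in an escaped delimiter
        j += (0 if s.endswith(char) else 1)
    # Note: empty fields are dropped
    return [i for i in result if i != ""]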

  • Don't be fooled into thinking this answer is correct just because it is the accepted one; see http://stackoverflow.com/a/18092547/99834, which does seem to handle escaped colons. (3 upvotes)