How to split a comma-separated string in Python, except for commas inside quotes

Question

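I am trying to split a comma-separated string in Python. The tricky part for me is that some of the fields in the data contain a comma themselves, and those fields are enclosed in quotes (" or '). The resulting split strings should also have the quotes around the fields removed. In addition, some fields can be empty.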

Example:

hey,hello,,"hello,world",'hey,world'

This needs to be split into 5 parts, as shown below:

['hey', 'hello', '', 'hello,world', 'hey,world']

Any ideas, suggestions, or help on how to solve the above problem in Python would be much appreciated.

Thank you, Vish

Answer 1

Spl*_*iFF 9

It sounds like you want the csv module.
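
As a rough sketch of what that could look like (the helper name split_line is made up for illustration, it is not part of the csv module): csv.reader handles quoted fields and strips the quotes, but each reader takes a single quotechar, so a line that mixes " and ' quoting is only handled for one of the two at a time.

```python
import csv


def split_line(line, quotechar='"'):
    """Split one comma-separated line with the standard csv module.

    Hypothetical helper for illustration; only one quote character
    can be active per reader.
    """
    # csv.reader accepts any iterable of strings, so a one-element list works.
    return next(csv.reader([line], quotechar=quotechar))


line = 'hey,hello,,"hello,world",\'hey,world\''

double_quoted = split_line(line)                 # treats " as the quote character
single_quoted = split_line(line, quotechar="'")  # treats ' as the quote character

print(double_quoted)  # ['hey', 'hello', '', 'hello,world', "'hey", "world'"]
print(single_quoted)  # ['hey', 'hello', '', '"hello', 'world"', 'hey,world']
```

If the data only ever uses one kind of quote, this is probably all that is needed. For mixed quoting as in the question, some extra handling is required.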

Answer 2

Gle*_*ard 5

(Edit: The original answer had trouble with empty fields at the edges because of the way re.findall works, so I refactored it a bit and added tests.)

import re

def parse_fields(text):
    r"""
    >>> list(parse_fields('hey,hello,,"hello,world",\'hey,world\''))
    ['hey', 'hello', '', 'hello,world', 'hey,world']
    >>> list(parse_fields('hey,hello,,"hello,world",\'hey,world\','))
    ['hey', 'hello', '', 'hello,world', 'hey,world', '']
    >>> list(parse_fields(',hey,hello,,"hello,world",\'hey,world\','))
    ['', 'hey', 'hello', '', 'hello,world', 'hey,world', '']
    >>> list(parse_fields(''))
    ['']
    >>> list(parse_fields(','))
    ['', '']
    >>> list(parse_fields('testing,quotes not at "the" beginning \'of\' the,string'))
    ['testing', 'quotes not at "the" beginning \'of\' the', 'string']
    >>> list(parse_fields('testing,"unterminated quotes'))
    ['testing', '"unterminated quotes']
    """
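    pos = 0
    # Optional opening quote, lazy field body, the same quote again, then a separator.
    exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")
    while True:
        # Scan for the next field, starting where the previous match ended.
        m = exp.search(text, pos)
        result = m.group(2)     # field text with surrounding quotes removed
        separator = m.group(3)  # ',' between fields, '' at the end of the string

        yield result

        # An empty separator means the match stopped at the end of the input.
        if not separator:
            break

        pos = m.end(0)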

if __name__ == "__main__":
    import doctest
    doctest.testmod()

(['"]?) matches an optional single or double quote.

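(.*?) matches the string itself. This is a non-greedy match, so it matches only as much as necessary instead of eating the whole string. This is assigned to result, and it is what we actually yield.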

\1 is a backreference that matches the same single or double quote we matched earlier (if any).

(,|$) matches the comma separating each entry, or the end of the line. This is assigned to separator.
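
To see the three groups in isolation, here is a small sketch that applies the same pattern to the start of a few sample strings. The describe helper and the sample inputs are made up for illustration. Note the last case: when the opening quote has no matching closing quote, the regex backtracks to an empty first group and the quote character stays in the field.

```python
import re

# Same pattern as in parse_fields above.
exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")


def describe(text):
    """Return (quote, field, separator) for the first field in text."""
    return exp.match(text).groups()


print(describe('"hello,world",next'))  # ('"', 'hello,world', ',')
print(describe("'hey,world'"))         # ("'", 'hey,world', '')
print(describe('plain,next'))          # ('', 'plain', ',')
print(describe('"mismatched\',next'))  # ('', '"mismatched\'', ',')
```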

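If separator is false (i.e. empty), there was no separator, so we are at the end of the string and we are done. Otherwise, we update the start position to where the regex match finished (m.end(0)) and continue looping.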
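
The loop can also be traced step by step. The sketch below uses a hypothetical trace_fields helper (not part of the answer's code) that records the start position, field, and separator at each iteration. With a trailing comma, the last search starts at the very end of the string and produces the final empty field.

```python
import re

# Same pattern as in parse_fields above.
exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")


def trace_fields(text):
    """Return a (start, field, separator) tuple for each loop iteration."""
    steps = []
    pos = 0
    while True:
        m = exp.search(text, pos)
        steps.append((pos, m.group(2), m.group(3)))
        if not m.group(3):
            # Empty separator: the match ended at the end of the string.
            break
        pos = m.end(0)
    return steps


for step in trace_fields('a,"b,c",'):
    print(step)
# (0, 'a', ',')
# (2, 'b,c', ',')
# (8, '', '')
```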
