Extracting strings between quotes that span multiple lines in Python

Sub*_*Das 5 python regex string

I have a file containing multiple entries. Each entry has the following form:

"field1","field2","field3","field4","field5"

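It is guaranteed that no field contains a quotation mark, but fields may contain commas. The problem is that field4 can be split across multiple lines. So a sample file might look like this: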

"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"

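I want to extract the fields using Python. If no field were split across multiple lines, this would be simple: extract the strings between the quotes. But I can't seem to find a simple way to do it when multi-line fields are present.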

Edit: There are actually five fields. Sorry for any confusion. The question has been edited to reflect this.

Bir*_*rei 6

I think the csv module can solve this. It correctly handles line breaks inside quoted fields:

import csv

# newline='' lets the csv module handle line breaks inside quoted fields
with open('infile', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        for field in row:
            print('-- {}'.format(field))

It produces:

-- john
-- male US
-- done
-- Some sample text
across multiple lines. There
can be many lines of this
-- foo bar baz
-- jane
-- female UK
-- done
-- fields can have , in them
-- abc xyz
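For anyone who wants to try this without creating a file, here is a minimal self-contained sketch. It feeds the sample data from the question to csv.reader through io.StringIO, and, as a cross-check, also extracts the fields with a regular expression. The regex variant relies entirely on the question's guarantee that no field contains a quotation mark, so it is not a general CSV parser. The names SAMPLE, parse_with_csv and parse_with_regex are made up for illustration.

```python
import csv
import io
import re

# Sample data copied from the question (hypothetical constant name).
SAMPLE = (
    '"john","male US","done","Some sample text\n'
    'across multiple lines. There\n'
    'can be many lines of this","foo bar baz"\n'
    '"jane","female UK","done","fields can have , in them","abc xyz"\n'
)


def parse_with_csv(text):
    """Return a list of rows, each row being a list of field strings."""
    # newline='' disables newline translation, as the csv docs recommend.
    return list(csv.reader(io.StringIO(text, newline='')))


def parse_with_regex(text):
    """Return a flat list of fields.

    Assumes no field contains a quotation mark, so every pair of
    quotes delimits exactly one field. [^"] also matches newlines,
    which is why multi-line fields need no special handling.
    """
    return re.findall(r'"([^"]*)"', text)


rows = parse_with_csv(SAMPLE)
flat = parse_with_regex(SAMPLE)

for row in rows:
    print(len(row), row[0], repr(row[3]))
```

Both approaches return the same fields for this sample. The csv module groups them into rows, while the regex returns one flat list that would still need to be cut into groups of five. The csv module is likely the safer choice if the input format ever loosens, for example if quotes inside fields become possible.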