Sub*_*Das · 5 · tags: python, regex, string
I have a file containing multiple entries. Each entry has the following form:
"field1","field2","field3","field4","field5"
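No field is guaranteed to contain any quotes, but fields can contain commas. The problem is that field4 can span multiple lines, so a sample file might look like this: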
"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
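I want to extract the fields using Python. If no field were split across lines, this would be simple: extract the strings between the quotes. But I can't find a simple way to do it when multiline fields are present.
EDIT: There are actually five fields. Sorry for any confusion. The question has been edited to reflect this.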
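Because no field ever contains a quote, the "extract the strings between the quotes" idea does work for multiline fields as long as the regex runs over the whole file contents rather than one line at a time: the character class [^"] also matches newlines. The following is a minimal sketch, assuming every entry has exactly five fields; parse_entries and SAMPLE are hypothetical names chosen for illustration.

```python
import re

# Sample input in the format described above (field4 of the first entry spans lines).
SAMPLE = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

FIELDS_PER_ENTRY = 5  # assumption: every entry has exactly five fields


def parse_entries(text, n=FIELDS_PER_ENTRY):
    """Hypothetical helper: split quote-delimited text into lists of n fields."""
    # [^"]* matches any character except a quote, including newlines,
    # so multiline fields need no special regex flag.
    fields = re.findall(r'"([^"]*)"', text)
    if len(fields) % n:
        raise ValueError('field count is not a multiple of %d' % n)
    # Group the flat list of fields into entries of n fields each.
    return [fields[i:i + n] for i in range(0, len(fields), n)]


for entry in parse_entries(SAMPLE):
    print(entry)
```

For a real file, pass the full contents, e.g. parse_entries(open('infile').read()). This approach relies entirely on the no-quotes guarantee; the csv module additionally copes with quoted fields that contain doubled quotes.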
I think the csv module can handle this. It correctly splits fields that contain newlines:
import csv

# newline='' lets the csv module handle newlines embedded in quoted fields
with open('infile', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        for field in row:
            print('-- {}'.format(field))
It produces:
-- john
-- male US
-- done
-- Some sample text
across multiple lines. There
can be many lines of this
-- foo bar baz
-- jane
-- female UK
-- done
-- fields can have , in them
-- abc xyz
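To check this behavior without a file on disk, the same csv.reader call can be pointed at an in-memory io.StringIO buffer. Each row comes back as a list of five strings, with the newlines inside field4 preserved. A minimal sketch; SAMPLE and rows are illustrative names, and the sample text is copied from the question.

```python
import csv
import io

# Same sample as in the question; field4 of the first entry spans three lines.
SAMPLE = '''"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
'''

# newline='' disables newline translation, mirroring how the file is opened above.
buffer = io.StringIO(SAMPLE, newline='')
rows = list(csv.reader(buffer))

for row in rows:
    # Each row is a list of fields; the multiline field keeps its '\n' characters.
    print(len(row), row)
```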