msl*_*evi 4 python regex string split
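I have been putting together a function that can process strings the way I want. I have gone through a number of earlier questions (1, 2, 3, and others) while working this out. Here is the setup: I have well-structured but variable data that needs to be split from a string read from a file into an array of strings. Below are some examples of the data I am working with: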
('Vdfbr76','gsdf','gsfd','',NULL),
('Vkdfb23l','gsfd','gsfg','ggg@df.gf',NULL),
('4asg0124e','Lead Actor/SFX MUA/Prop designer','John Smith','jsmith@email.com',NULL),
('asdguIux','Director, Camera Operator, Editor, VFX','John Smith','',NULL),
...
(492,'E1asegaZ1ox','Nysdag_5YmD','145872325372620',1,'long, string, with, commas'),
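I want to split these strings on commas, but the strings occasionally contain commas themselves, which causes problems. On top of that, the number of items per line varies throughout the file, which makes re.split(regex, line) hard to use.
I have tried a few solutions so far.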
import re

def splitLine(text, fields, delimiter):
    return_line = []
    # One lazy group first, then one greedy group per remaining field
    regex_string = "(.*?),"
    for i in range(0, len(fields) - 1):
        regex_string += "(.*)"
        if i < len(fields) - 2:
            regex_string += delimiter
    return_line = re.split(regex_string, text)
    return return_line
This produces the following regex_string and return_line. The main problem is that it occasionally lumps two fields together, as happens with the third value in the array here:
(.*?),(.*),(.*),(.*),(.*),(.*)
['', '\t(222', "'Vy1asdfnuJkA','Ndfbyz3_YMD'", "'14541242640005471'", '2', "'Hello World!')", '', '\n']
The ideal result would be:
['', '\t(222', "'Vy1asdfnuJkA'", "'Ndfbyz3_YMD'", "'14541242640005471'", '2', "'Hello World!')", '', '\n']
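It is a small change, but it has a big effect on the result. I tried manipulating the regex string to better fit what I want, but every time I fixed one case, another one unfortunately broke.
Another approach I played with comes from user Aaron Cronin in this post (4), shown below: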
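One way around the greedy `(.*)` groups is to match the fields themselves instead of the separators. The following is a minimal sketch, not the asker's code: `FIELD_RE` and `split_fields` are hypothetical names, and it assumes that each row is wrapped in parentheses and that quoted values never contain an escaped single quote.

```python
import re

# Hypothetical pattern: either a single-quoted string (commas allowed inside)
# or a run of characters that contains no comma.
FIELD_RE = re.compile(r"'[^']*'|[^,]+")

def split_fields(line):
    # Drop surrounding whitespace, the trailing row comma, and the outer parentheses.
    body = line.strip().rstrip(",")
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    # findall returns every field in order; the separating commas match nothing.
    return FIELD_RE.findall(body)

print(split_fields("\t(222,'Vy1asdfnuJkA','Ndfbyz3_YMD','14541242640005471',2,'Hello World!'),\n"))
```

Because the pattern never depends on the number of fields, the same function should work for rows of five or six items alike. The quotes are kept on the returned values.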
def split_at(text, delimiter, opens='<([', closes='>)]', quotes='"\''):
    result = []
    buff = ""
    level = 0
    is_quoted = False
    for char in text:
        if char in delimiter and level == 0 and not is_quoted:
            result.append(buff)
            buff = ""
        else:
            buff += char
            if char in opens:
                level += 1
            if char in closes:
                level -= 1
            if char in quotes:
                is_quoted = not is_quoted
    if not buff == "":
        result.append(buff)
    return result
The result is as follows:
["\t('Vk3NIasef366l','gsdasdf','gsfasfd','',NULL),\n"]
The main problem is that it comes out as the same string, which puts me in a feedback loop.
The ideal result would be:
[\t('Vk3NIasef366l','gsdasdf','gsfasfd','',NULL),\n]
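The post does not show how split_at was called, so the cause of that output can only be guessed at. One thing that can be said with the delimiter set to ',': the row's opening parenthesis raises `level` to 1, so none of the commas inside the row count as split points. The sketch below repeats the function so that it runs on its own and adds a hypothetical wrapper, `split_row`, that removes the outer parentheses first. It assumes every row has the `(...),` shape shown in the samples.

```python
def split_at(text, delimiter, opens='<([', closes='>)]', quotes='"\''):
    # Same logic as the quoted function, repeated so the sketch is standalone.
    result, buff, level, is_quoted = [], "", 0, False
    for char in text:
        if char in delimiter and level == 0 and not is_quoted:
            result.append(buff)
            buff = ""
        else:
            buff += char
            if char in opens:
                level += 1
            if char in closes:
                level -= 1
            if char in quotes:
                is_quoted = not is_quoted
    if buff != "":
        result.append(buff)
    return result

def split_row(line, delimiter=","):
    # Hypothetical wrapper: strip whitespace, the trailing row comma,
    # and the outer parentheses so that fields sit at nesting level 0.
    body = line.strip().rstrip(",")
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    return split_at(body, delimiter)

row = "\t('Vk3NIasef366l','gsdasdf','gsfasfd','',NULL),\n"
print(split_at(row, ","))   # splits only at the comma after the closing parenthesis
print(split_row(row))       # one entry per field, quotes kept
```

Note that split_at updates `level` even for brackets that appear inside quotes, so a value such as 'a(b' would still confuse it.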
Any help is appreciated; I am not sure what the best approach is in this situation. I am happy to clarify anything that comes up. I have tried to be as complete as possible.
from ast import literal_eval

s = """('Vdfbr76','gsdf','gsfd','',NULL),
('Vkdfb23l','gsfd','gsfg','ggg@df.gf',NULL),
('4asg0124e','Lead Actor/SFX MUA/Prop designer','John Smith','jsmith@email.com',NULL),
('asdguIux','Director, Camera Operator, Editor, VFX','John Smith','',NULL),
(492,'E1asegaZ1ox','Nysdag_5YmD','145872325372620',1,'long, string, with, commas'),
"""

for line in s.split("\n"):
    # Drop the trailing row comma and turn SQL NULL into Python None
    line = line.strip().rstrip(",").replace("NULL", "None")
    if line:
        print(list(literal_eval(line)))  # list(..) is just an example
Output:
['Vdfbr76', 'gsdf', 'gsfd', '', None]
['Vkdfb23l', 'gsfd', 'gsfg', 'ggg@df.gf', None]
['4asg0124e', 'Lead Actor/SFX MUA/Prop designer', 'John Smith', 'jsmith@email.com', None]
['asdguIux', 'Director, Camera Operator, Editor, VFX', 'John Smith', '', None]
[492, 'E1asegaZ1ox', 'Nysdag_5YmD', '145872325372620', 1, 'long, string, with, commas']
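One caveat with the approach above is that replace("NULL", "None") also rewrites the text NULL when it appears inside a quoted value. As an alternative sketch, the standard csv module can do the splitting when told that the quote character is a single quote. `parse_row` is a hypothetical helper name, and it assumes the same `(...),` row shape as the sample data. Unlike literal_eval, it returns every value as a string and leaves NULL as the text 'NULL'.

```python
import csv

def parse_row(line):
    # Drop surrounding whitespace, the trailing row comma, and the outer parentheses.
    body = line.strip().rstrip(",")
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    # quotechar="'" makes csv treat commas inside single quotes as data.
    return next(csv.reader([body], quotechar="'"))

print(parse_row("(492,'E1asegaZ1ox','Nysdag_5YmD','145872325372620',1,'long, string, with, commas'),"))
print(parse_row("('asdguIux','Director, Camera Operator, Editor, VFX','John Smith','',NULL),"))
```

Converting 'NULL' to None or numeric strings to int can then be done per field, after the split.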