Python regex to remove matching braces in a file

tho*_*rmi 3 python regex

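I have a LaTeX file in which a lot of text is marked with \red{}, but the \red{} may itself contain braces, for example \red{here is \underline{underlined} text}. I want to remove the red color, and after some googling I wrote this Python script: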

import os, re, sys
#Start program in terminal with
#python RedRemover.py filename
#sys.argv[1] then has the value filename
ifn = sys.argv[1]
#Open file and read it
f = open(ifn, "r")
c = f.read() 
#The whole file content is now stored in the string c
#Remove occurrences of \red{...} in c
c=re.sub(r'\\red\{(?:[^\}|]*\|)?([^\}|]*)\}', r'\1', c)
#Write c into new file
Nf=open("RedRemoved_"+ifn,"w")
Nf.write(c)

f.close()
Nf.close()

But this converts

\red{here is \underline{underlined} text}

into

here is \underline{underlined text}

which is not what I want. I want

here is \underline{underlined} text

Cas*_*yte 6

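You can't match braces nested to an undetermined depth with the re module, because it doesn't support recursion. To solve this, you can use the third-party regex module instead: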

import regex

c = r'\red{here is \underline{underlined} text}'

c = regex.sub(r'\\red({((?>[^{}]+|(?1))*)})', r'\2', c)

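Here (?1) is a recursive call to capture group 1, which lets the pattern match braces nested to any depth. Group 2 captures the text between the outer braces, so the replacement r'\2' keeps that text and drops the \red{...} wrapper.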
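If installing the regex module is not an option, the same result can be had without regular expressions by counting brace depth by hand. The following is only a minimal sketch of that alternative, not part of the answer above: the function name `strip_macro` and its `macro` parameter are made up for illustration, and it assumes the braces are balanced and that escaped braces such as \{ do not occur inside \red{...}.

```python
def strip_macro(text, macro="\\red"):
    """Remove every `macro{...}` wrapper but keep its (possibly nested) body.

    Hypothetical helper: assumes balanced braces and no escaped braces
    like \\{ or \\} inside the wrapped text.
    """
    opener = macro + "{"
    out = []
    i = 0
    while True:
        start = text.find(opener, i)
        if start == -1:
            # No further wrapper: keep the rest of the text unchanged.
            out.append(text[i:])
            break
        out.append(text[i:start])

        # Walk forward until the brace opened by `macro{` is closed again.
        depth = 1
        j = start + len(opener)
        while j < len(text) and depth:
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
            j += 1

        if depth:
            # Unbalanced braces: leave the remainder untouched.
            out.append(text[start:])
            break

        body = text[start + len(opener):j - 1]
        # Recurse so that a wrapper nested inside the body is removed too.
        out.append(strip_macro(body, macro))
        i = j
    return "".join(out)


print(strip_macro(r"\red{here is \underline{underlined} text}"))
# prints: here is \underline{underlined} text
```

In the script from the question, this would take the place of the re.sub line, e.g. c = strip_macro(c).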