How do I replace a substring and insert a new one in Python?

Hom*_*mer 6 python regex substring title

This code works, but it is probably not an efficient way to replace a substring with another, previously modified substring.

Input strings:

text = ["part1 Pirates (2006)",
        "part2 Pirates (2006)"
]

Output strings:

 Pirates PT1 (2006)

 Pirates PT2 (2006)

It must replace substrings such as 'part1', 'part2', and so on with 'PT' followed by the number, and move the result between the title and the year. Code:

import re

#'''''''''''''''''''''''''
# are the parentheses balanced?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0


#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa):
        return stringa[stringa.find("(")+1:stringa.find(")")]


# Start
for title in text:

    # Does the year exist? Try to get it ---------> '2006'
    yearStr = getYear(title)

    # Get the integer next to the 'part' substring -> '1'
    intPartStr = re.findall(r'part(\d+)', title)

    # Delete the 'part' substring ----------------> ' Pirates (2006)'
    partStr = re.sub(r'part(\d+)', "", title)

    # Build a new string -------------------------> 'PT1 (2006)'
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"

    # Update the title with newStr ---------------> ' Pirates PT1 (2006)'
    result = re.sub(r'\(([0-9]+)\)', newStr, partStr)

    print(result)
# End

But when the list looks like this:

text = ["pt1 Pirates (2006)",
        "part 2 Pirates (2006)"
]

I don't know how to extract the integer next to 'part', 'pt', 'part 2', and so on.

Edit:

I thought this string was the same case, but it is not, sorry.

How can this be solved?

"part 2 the day sports stood still (2021)"

\w+ does not capture all the words.
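
A minimal sketch of what is probably happening, assuming the attempted pattern used `\w+` for the title (the exact pattern tried is not shown, so `single_word` below is a guess): `\w` does not match whitespace, so `\w+` can only cover one word. The names `single_word` and `multi_word` are illustrative only.

```python
import re

title = "part 2 the day sports stood still (2021)"

# Hypothetical attempt: \w+ is expected to capture the whole title.
single_word = r'part\s?(\d+)\s(\w+)\s\((\d+)\)'
# \w+ cannot cross the spaces between words, so nothing matches.
no_match = re.search(single_word, title)

# Allowing whitespace inside the character class covers multi-word titles.
multi_word = r'part\s?(\d+)\s([\w\s]+?)\s\((\d+)\)'
match = re.search(multi_word, title)
captured_title = match.group(2)
```

Here `no_match` is `None`, while `captured_title` holds the full multi-word title.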

Thi*_* B. 5

You can do all the replacements at once:

import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]

pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

for title in text:
    print(re.sub(pattern, substitute, title))

# if you want the result in a new list:
text_formatted = [re.sub(pattern, substitute, title) for title in text]

Regex explanation:

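  • (?:part|pt)\s?(\d+) skips the 'part' or 'pt' text and captures the number (group 1)
  • (\b[\w\s]+\b) captures the title (group 2)
  • \((\d+)\) captures the year inside the parentheses (group 3)
  • '\2 PT\1 (\3)' rebuilds the string from the group numbers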
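
The same idea can be sketched with named groups, which may be easier to read than numbered ones. This is only one possible variant: the names `PART_TITLE` and `move_part_tag` are made up for illustration, and `re.IGNORECASE` is an added assumption so that "Part 2" or "PT1" would match as well.

```python
import re

# Same structure as the pattern above, written verbosely with named groups.
# re.IGNORECASE is an assumption added here, not part of the original pattern.
PART_TITLE = re.compile(
    r"""
    (?:part|pt)\s?(?P<num>\d+)   # 'part' or 'pt', optional space, part number
    \s?(?P<title>\b[\w\s]+\b)    # title: word characters and whitespace only
    \s?\((?P<year>\d+)\)         # year in parentheses
    """,
    re.IGNORECASE | re.VERBOSE,
)


def move_part_tag(title: str) -> str:
    """Move the part marker between the title and the year.

    Titles that do not match the pattern are returned unchanged.
    """
    return PART_TITLE.sub(r"\g<title> PT\g<num> (\g<year>)", title)


titles = [
    "pt1 Pirates (2006)",
    "Part 2 the day sports stood still (2021)",
    "Pirates (2006)",
]
formatted = [move_part_tag(t) for t in titles]
```

Because the title group only allows `\w` and `\s`, titles containing punctuation such as apostrophes or colons would not match and would pass through unchanged; the character class would need to be widened for those.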