Hom*_*mer 6 python regex substring title
This code works, but it is probably not an efficient way to replace a substring with a modified version of itself.
Input strings:
text = ["part1 Pirates (2006)",
"part2 Pirates (2006)"
]
Output strings:
Pirates PT1 (2006)
Pirates PT2 (2006)
It has to replace substrings such as 'part1', 'part2' and so on with 'PT' plus the number, and move the result between the title and the year. Code:
import re

#'''''''''''''''''''''''''
# are the parentheses balanced?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0

#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa) is True:
        return stringa[stringa.find("(")+1:stringa.find(")")]

#Start
for title in text:
    #Does the year exist ? try to Get it ---------> '2006'
    yearStr = getYear(title)
    #Get integer next to 'part' substring -------> '1'
    intPartStr = re.findall(r'part(\d+)', title)
    #Delete 'part' Substring and the leftover leading space -> 'Pirates (2006)'
    partStr = re.sub(r'part(\d+)', "", title).strip()
    #Build a new string -------------------------> "PT1 (2006)"
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"
    #Update title with new String newStr --------> "Pirates PT1 (2006)"
    result = re.sub(r'\(([0-9]+)\)', newStr, partStr)
    #End
    print(result)
But when the list looks like this:
text = ["pt1 Pirates (2006)",
"part 2 Pirates (2006)"
]
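I don't know how to extract the integer next to 'part', 'pt', 'part 2' and so on.
Edit:
I thought this string would behave the same way, but it doesn't, sorry.
How can I handle it?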
"part 2 the day sports stood still (2021)"
\w+ does not capture all the words of the title.
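A small check may make the problem concrete. This is only a sketch: the two patterns below are illustrative and are not the ones from the code above. It shows that \w+ stops at the first space, while a character class that also allows whitespace can span the whole title.

```python
import re

title = "part 2 the day sports stood still (2021)"

# \w+ matches word characters only, so it stops at the first space
# and captures just the first word of the title.
single_word = re.search(r'part\s?(\d+)\s?(\w+)', title)
print(single_word.group(2))  # the

# Allowing whitespace inside the character class lets the group
# run up to the opening parenthesis of the year.
multi_word = re.search(r'part\s?(\d+)\s?([\w\s]+?)\s*\(', title)
print(multi_word.group(2))  # the day sports stood still
```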
You can do all the replacements in a single substitution:
import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]
pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

# print each reformatted title
for title in text:
    print(re.sub(pattern, substitute, title))

# if you want the result in a new list:
text_formatted = [re.sub(pattern, substitute, title) for title in text]
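Regex explanation:
(?:part|pt)\s?(\d+) : skips the 'part' or 'pt' text and captures the number (group 1)
(\b[\w\s]+\b) : captures the title (group 2)
\((\d+)\) : captures the year inside the parentheses (group 3)
'\2 PT\1 (\3)' : rebuilds the string from the group numbers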
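For readability, the same pattern can also be written in verbose mode, so that each group carries its own comment. The following is a minimal sketch, not part of the original answer: the names PART_PATTERN and move_part_tag are hypothetical, and the last sample title was added only to show that strings without a part marker pass through unchanged.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
PART_PATTERN = re.compile(r"""
    (?:part|pt)\s?(\d+)   # group 1: the number after 'part' or 'pt'
    \s?
    (\b[\w\s]+\b)         # group 2: the title, possibly several words
    \s?
    \((\d+)\)             # group 3: the year inside parentheses
""", re.VERBOSE)


def move_part_tag(title):
    """Hypothetical helper: place 'PT<n>' between the title and the year."""
    return PART_PATTERN.sub(r'\2 PT\1 (\3)', title)


titles = [
    "pt1 Pirates (2006)",
    "part 2 the day sports stood still (2021)",
    "Pirates (2006)",  # no part marker, so the pattern does not match
]
for t in titles:
    print(move_part_tag(t))
```

Compiling the pattern once avoids repeating the pattern string at every call site, and a title with no 'part' or 'pt' marker is returned as is because re.sub leaves non-matching input untouched.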