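This is a follow-up to, and a more complicated version of, this question: Extracting the contents of a string inside parentheses.
In that question, I had the following string: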
"Will Farrell (Nick Hasley), Rebecca Hall (Samantha)"
and I wanted to get a list of tuples of the form (actor, character):
[('Will Farrell', 'Nick Hasley'), ('Rebecca Hall', 'Samantha')]
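For that original, simpler string, a single `re.findall` is one way to get the pairs. The sketch below is minimal and assumes every actor is followed by exactly one parenthesized role. On the later strings it breaks down: because of the filler words and the role-less actor, it would lump "with Stephen Root and Laura Dern" together into one name.

```python
import re

credits = "Will Farrell (Nick Hasley), Rebecca Hall (Samantha)"

# Each credit: optional leading whitespace, a name made of anything except
# "," or "(" (matched lazily), optional spaces, then "(role)".
pairs = re.findall(r"\s*([^,(]+?)\s*\(([^)]*)\)", credits)
print(pairs)
# [('Will Farrell', 'Nick Hasley'), ('Rebecca Hall', 'Samantha')]
```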
To generalize the problem, I have a slightly more complicated string from which I need to extract the same information. My string is:
"Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary),
with Stephen Root and Laura Dern (Delilah)"
I need it formatted as follows:
[('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Glenn Howerton', 'Gary'),
('Stephen Root',''), ('Laura Dern', 'Delilah')]
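I know I can replace the filler words (with, and, &, etc.), but I can't quite figure out how to add the blank entry '' when an actor has no character name (in this case, Stephen Root). What is the best way to do this?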
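One way to get the blank entry, sketched here rather than offered as the definitive approach, has two steps. First, split on the separators, with "&" added to the filler words. Then match each piece against a pattern whose parenthesized part is optional. Python's `Match.groups(default="")` replaces a group that did not participate (here, the missing role) with `""` instead of `None`. The names `separator` and `credit` are my own, and this version assumes no commas appear inside the parentheses.

```python
import re

credits = """Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary),
with Stephen Root and Laura Dern (Delilah)"""

# Separators: a comma, "&", or the whole words "with"/"and", plus surrounding whitespace.
separator = re.compile(r"\s*(?:,|&|\bwith\b|\band\b)\s*")
# An actor name, optionally followed by a single "(role)".
credit = re.compile(r"([^(]+?)\s*(?:\(([^)]*)\))?")

pairs = []
for piece in separator.split(credits):
    piece = piece.strip()
    if not piece:  # ",\nwith" yields an empty piece between two separators
        continue
    m = credit.fullmatch(piece)
    if m:
        # groups(default="") turns the missing role group (None) into ""
        pairs.append(m.groups(default=""))
print(pairs)
```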
Finally, I need to account for an actor having multiple roles, and build one tuple for each role the actor plays. My final string is:
"Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary, Brad), with
Stephen Root and Laura Dern (Delilah, Stacy)"
and I need to build the following list of tuples:
[('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Glenn Howerton', 'Gary'),
('Glenn Howerton', 'Brad'), ('Stephen Root',''), ('Laura Dern', 'Delilah'), ('Laura Dern', 'Stacy')]
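Multiple roles add two requirements. First, a comma inside "(Gary, Brad)" must not be treated as a separator. Second, each role list has to be expanded into several tuples. The sketch below covers both on a shortened, hypothetical test string without the "with"/"and" filler words. It uses a negative lookahead, `,(?![^()]*\))`, which only splits on a comma that is not followed by a ")" before the next "(". The helper name `expand` is illustrative, not from any library.

```python
import re

# A comma is a separator only if it is NOT followed by ")" before any "(":
# the comma inside "(Gary, Brad)" is followed by " Brad)" and is kept.
outside_comma = re.compile(r",(?![^()]*\))")
credit = re.compile(r"([^(]+?)\s*(?:\(([^)]*)\))?")

def expand(piece):
    """Turn 'Actor (Role1, Role2)' into [('Actor', 'Role1'), ('Actor', 'Role2')]."""
    m = credit.fullmatch(piece.strip())
    if m is None:
        return []
    actor, roles = m.groups(default="")
    if not roles.strip():
        return [(actor, "")]  # no parentheses: blank role
    return [(actor, role.strip()) for role in roles.split(",")]

# Hypothetical test string (filler words removed for brevity).
text = "Glenn Howerton (Gary, Brad), Stephen Root, Laura Dern (Delilah, Stacy)"
pairs = [pair for piece in outside_comma.split(text) for pair in expand(piece)]
print(pairs)
```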
Thanks.
import re

credits = """Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary, Brad), with
Stephen Root and Laura Dern (Delilah, Stacy)"""

# split on commas (only if outside of parentheses), "with" or "and"
splitre = re.compile(r"\s*(?:,(?![^()]*\))|\bwith\b|\band\b)\s*")
# match the part before the parentheses (1) and what's inside the parens (2)
# (only if parentheses are present)
matchre = re.compile(r"([^(]*)(?:\(([^)]*)\))?")
# split the parts inside the parentheses on commas
splitparts = re.compile(r"\s*,\s*")

characters = splitre.split(credits)
pairs = []
for character in characters:
    # skip empty strings, e.g. the one between ", " and "with"
    if character:
        match = matchre.match(character)
        if match:
            actor = match.group(1).strip()
            if match.group(2):
                parts = splitparts.split(match.group(2))
                for part in parts:
                    pairs.append((actor, part))
            else:
                pairs.append((actor, ""))
print(pairs)
Output:
[('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'),
('Glenn Howerton', 'Gary'), ('Glenn Howerton', 'Brad'), ('Stephen Root', ''),
('Laura Dern', 'Delilah'), ('Laura Dern', 'Stacy')]
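The same three patterns can be wrapped in a reusable function. The sketch below also treats "&" as a separator, since the question lists it as a filler word. That "&" is used between credits the same way as "and" is an assumption. `parse_credits` is a hypothetical name, not part of the answer above.

```python
import re

# The answer's patterns, with "&" added as a separator (assumption: "&" joins credits like "and").
SPLIT_RE = re.compile(r"\s*(?:,(?![^()]*\))|&|\bwith\b|\band\b)\s*")
MATCH_RE = re.compile(r"([^(]*)(?:\(([^)]*)\))?")
PARTS_RE = re.compile(r"\s*,\s*")

def parse_credits(text):
    """Return a list of (actor, character) tuples; '' when no character is given."""
    pairs = []
    for entry in SPLIT_RE.split(text):
        if not entry:  # empty piece between two adjacent separators
            continue
        match = MATCH_RE.match(entry)  # always matches: both parts are optional
        actor = match.group(1).strip()
        if match.group(2):
            roles = PARTS_RE.split(match.group(2).strip())
            pairs.extend((actor, role) for role in roles)
        else:
            pairs.append((actor, ""))
    return pairs

print(parse_credits("Glenn Howerton (Gary, Brad) & Laura Dern (Delilah)"))
```

Keeping the patterns at module level means they are compiled once and reused across calls.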