Extracting a list from text with regular expressions in Python

Question

I want to extract a list of tuples from the following string:

text='''Consumer Price Index:
        +0.2% in Sep 2020

        Unemployment Rate:
        +7.9% in Sep 2020

        Producer Price Index:
        +0.4% in Sep 2020

        Employment Cost Index:
        +0.5% in 2nd Qtr of 2020

        Productivity:
        +10.1% in 2nd Qtr of 2020

        Import Price Index:
        +0.3% in Sep 2020

        Export Price Index:
        +0.6% in Sep 2020'''


I am using "import re" for this.

The output should look like: [('Consumer Price Index', '+0.2%', 'Sep 2020'), ...]

I want to use the re.findall function to produce the output above, and so far I have this:

re.findall(r"(:\Z)\s+(%\Z+)(\Ain )", text)


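My idea is to first match the characters before ":", then the characters before "%", and then the characters after "in".

I really don't know how to proceed. Any help would be appreciated. Thanks!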

Answer 1

Wik*_*żew 5

You can use

re.findall(r'(\S.*):\n\s*(\+?\d[\d.]*%)\s+in\s+(.*)', text)
# => [('Consumer Price Index', '+0.2%', 'Sep 2020'), ('Unemployment Rate', '+7.9%', 'Sep 2020'), ('Producer Price Index', '+0.4%', 'Sep 2020'), ('Employment Cost Index', '+0.5%', '2nd Qtr of 2020'), ('Productivity', '+10.1%', '2nd Qtr of 2020'), ('Import Price Index', '+0.3%', 'Sep 2020'), ('Export Price Index', '+0.6%', 'Sep 2020')]


See the regex demo and the Python demo.

Details

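(\S.*) - Group 1: a non-whitespace character followed by zero or more characters other than line breaks, as many as possible
: - a colon
\n - a newline
\s* - zero or more whitespace characters
(\+?\d[\d.]*%) - Group 2: an optional +, a digit, zero or more digits or dots, and a %
\s+in\s+ - the word in with one or more whitespace characters on each side
(.*) - Group 3: zero or more characters other than line breaks, as many as possible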
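
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming input shaped like the question's text (shortened here to two entries); the names `pattern` and `rows` are illustrative only. It uses re.VERBOSE so that each part of the same expression carries its own comment:

```python
import re

# Shortened sample in the same shape as the question's `text` (an assumption
# for illustration; the full string from the question works the same way).
text = '''Consumer Price Index:
        +0.2% in Sep 2020

        Employment Cost Index:
        +0.5% in 2nd Qtr of 2020'''

# Same expression as in the answer, split across lines with re.VERBOSE.
# In verbose mode, unescaped whitespace in the pattern is ignored and
# everything after "#" on a line is a comment.
pattern = re.compile(r'''
    (\S.*):           # group 1: label, from its first non-whitespace char up to the colon
    \n\s*             # newline, then any indentation on the next line
    (\+?\d[\d.]*%)    # group 2: optional plus sign, digits and dots, percent sign
    \s+in\s+          # the word "in" surrounded by whitespace
    (.*)              # group 3: the rest of the line (the period)
''', re.VERBOSE)

# With three capturing groups, findall returns one 3-tuple per match.
rows = pattern.findall(text)
print(rows)
# [('Consumer Price Index', '+0.2%', 'Sep 2020'), ('Employment Cost Index', '+0.5%', '2nd Qtr of 2020')]
```

Because the pattern contains three capturing groups, re.findall returns a list of 3-tuples, which is the shape asked for in the question.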
