Extracting dates with a regex in Python

Adi*_*rma 5 python regex dataframe pandas

I want to extract the years from my DataFrame column data3['CopyRight'].

CopyRight
2015 Sony Music Entertainment
2015 Ultra Records , LLC under exclusive license
2014 , 2015 Epic Records , a division of Sony Music Entertainment
Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment

I used the following code to extract the year:

data3['CopyRight_year'] = data3['CopyRight'].str.extract('([0-9]+)', expand=False).str.strip()

With this code I only get the first year that occurs in each row.

CopyRight_year
2015
2015
2014
2014
2014
2014

I want to extract all the years mentioned in the column.

Expected output

CopyRight_year
    2015
    2015
    2014,2015
    2014
    2014,2015
    2014,2015

jez*_*ael 1

Use findall with a regular expression to collect every 4-digit integer in each row into a list, then join each list with a separator:


Thanks to @Wiktor Stribiżew for the idea of adding word boundaries: r'\b\d{4}\b'

data3['CopyRight_year'] = data3['CopyRight'].str.findall(r'\b\d{4}\b').str.join(',')
print(data3)
                                           CopyRight CopyRight_year
0                      2015 Sony Music Entertainment           2015
1   2015 Ultra Records , LLC under exclusive license           2015
2  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
3  Compilation ( P ) 2014 Epic Records , a divisi...           2014
4  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
5  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
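For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds a small data3 from four of the rows shown in the question and contrasts str.extract (first match only) with str.findall plus str.join (every match). The extra "Catalog 20151234 released 2015" string is a made-up example, not from the question, added only to show what the word boundaries change.

```python
import pandas as pd

# Sample rows copied from the question; only a subset is used here.
data3 = pd.DataFrame({
    "CopyRight": [
        "2015 Sony Music Entertainment",
        "2015 Ultra Records , LLC under exclusive license",
        "2014 , 2015 Epic Records , a division of Sony Music Entertainment",
        "Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment",
    ]
})

# str.extract keeps only the first match in each row.
first_only = data3["CopyRight"].str.extract(r"(\d{4})", expand=False)

# str.findall returns a list of all matches per row;
# str.join then collapses each list into one comma-separated string.
data3["CopyRight_year"] = (
    data3["CopyRight"].str.findall(r"\b\d{4}\b").str.join(",")
)

print(first_only.tolist())               # ['2015', '2015', '2014', '2014']
print(data3["CopyRight_year"].tolist())  # ['2015', '2015', '2014,2015', '2014']

# Hypothetical row (not in the question) showing the effect of \b:
# without word boundaries, a longer digit run is split into 4-digit chunks.
probe = pd.Series(["Catalog 20151234 released 2015"])
with_boundaries = probe.str.findall(r"\b\d{4}\b")[0]
without_boundaries = probe.str.findall(r"\d{4}")[0]

print(with_boundaries)     # ['2015']
print(without_boundaries)  # ['2015', '1234', '2015']
```

Note that \b\d{4}\b matches any standalone 4-digit number, not only plausible years. If the column can contain other 4-digit values, a narrower pattern may be needed.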