Extracting dates with regex in Python

Question


Adi*_*rma 5 python regex dataframe pandas

I want to extract the years from my DataFrame column data3['CopyRight'].

CopyRight
2015 Sony Music Entertainment
2015 Ultra Records , LLC under exclusive license
2014 , 2015 Epic Records , a division of Sony Music Entertainment
Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment


I used the following code to extract the year:

data3['CopyRight_year'] = data3['CopyRight'].str.extract('([0-9]+)', expand=False).str.strip()


With this code I only get the first year that occurs in each row.

CopyRight_year
2015
2015
2014
2014
2014
2014


I want to extract all the years mentioned in the column.

Expected output

CopyRight_year
    2015
    2015
    2014,2015
    2014
    2014,2015
    2014,2015


Answer 1

jez*_*ael 1

Use findall with a regex to collect every 4-digit integer in each row as a list, then join the list items with a separator.
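A minimal sketch of that step, assuming the sample rows from the question and the plain pattern `\d{4}` (the pattern and the four-row frame here are illustrative assumptions, not necessarily the exact code originally posted):

```python
import pandas as pd

# Sample rows copied from the question; the frame name data3 follows the question.
data3 = pd.DataFrame({
    "CopyRight": [
        "2015 Sony Music Entertainment",
        "2015 Ultra Records , LLC under exclusive license",
        "2014 , 2015 Epic Records , a division of Sony Music Entertainment",
        "Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment",
    ]
})

# str.findall returns, for each row, a list of every non-overlapping match.
years = data3["CopyRight"].str.findall(r"\d{4}")

# str.join concatenates each row's list into a single comma-separated string.
data3["CopyRight_year"] = years.str.join(",")

print(data3["CopyRight_year"].tolist())
```

Unlike str.extract, which keeps only the first match per row, str.findall keeps all of them, which is why rows with two years produce "2014,2015".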


Thanks to @Wiktor Stribiżew for the idea of adding word boundaries, r'\b\d{4}\b':


data3['CopyRight_year'] = data3['CopyRight'].str.findall(r'\b\d{4}\b').str.join(',')
print(data3)
                                           CopyRight CopyRight_year
0                      2015 Sony Music Entertainment           2015
1   2015 Ultra Records , LLC under exclusive license           2015
2  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
3  Compilation ( P ) 2014 Epic Records , a divisi...           2014
4  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
5  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
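To see what the word boundaries change, here is a small sketch comparing the two patterns. The second row, with a 7-digit catalog number, is a hypothetical example and does not come from the question's data:

```python
import pandas as pd

s = pd.Series([
    "2014 , 2015 Epic Records",
    "Catalog 1234567 released 2015",  # hypothetical row with a longer number
])

# Without boundaries, the first four digits of a longer number also match.
loose = s.str.findall(r"\d{4}").str.join(",")

# With \b on both sides, only standalone 4-digit numbers match.
strict = s.str.findall(r"\b\d{4}\b").str.join(",")

print(loose.tolist())   # ['2014,2015', '1234,2015']
print(strict.tolist())  # ['2014,2015', '2015']
```

On the question's sample data both patterns give the same result; the boundaries only matter when a row contains digit runs longer than four characters.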

