Adi*_*rma 5 python regex dataframe pandas
I want to extract the years from my DataFrame column data3['CopyRight'].
CopyRight
2015 Sony Music Entertainment
2015 Ultra Records , LLC under exclusive license
2014 , 2015 Epic Records , a division of Sony Music Entertainment
Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment
I am using the following code to extract the year:
data3['CopyRight_year'] = data3['CopyRight'].str.extract('([0-9]+)', expand=False).str.strip()
With this code I only get the first year that occurs in each row.
CopyRight_year
2015
2015
2014
2014
2014
2014
I want to extract all the years mentioned in the column.
Expected output:
CopyRight_year
2015
2015
2014,2015
2014
2014,2015
2014,2015
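Use findall with a regular expression to collect every 4-digit integer in each row into a list, then join each list with a separator.
Thanks to @Wiktor Stribiżew for the idea of adding word boundaries, r'\b\d{4}\b':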
data3['CopyRight_year'] = data3['CopyRight'].str.findall(r'\b\d{4}\b').str.join(',')
print(data3)
                                           CopyRight  CopyRight_year
0                      2015 Sony Music Entertainment            2015
1   2015 Ultra Records , LLC under exclusive license            2015
2  2014 , 2015 Epic Records , a division of Sony ...       2014,2015
3  Compilation ( P ) 2014 Epic Records , a divisi...            2014
4  2014 , 2015 Epic Records , a division of Sony ...       2014,2015
5  2014 , 2015 Epic Records , a division of Sony ...       2014,2015
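To see what the word boundaries change, here is a small self-contained sketch. The first three rows are copied from the question; the last row is a made-up edge case (not from the original data) containing a 6-digit number, and the names with_boundary and without_boundary are only illustrative.

```python
import pandas as pd

# First three rows come from the question; the last one is a
# hypothetical row added only to show the effect of \b.
data3 = pd.DataFrame({
    "CopyRight": [
        "2015 Sony Music Entertainment",
        "2014 , 2015 Epic Records , a division of Sony Music Entertainment",
        "Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment",
        "Catalog 123456 released 2016",  # hypothetical edge case
    ]
})

# str.findall returns a list of all matches per row;
# str.join then collapses each list into one comma-separated string.
with_boundary = data3["CopyRight"].str.findall(r"\b\d{4}\b").str.join(",")
without_boundary = data3["CopyRight"].str.findall(r"\d{4}").str.join(",")

data3["CopyRight_year"] = with_boundary

print(data3["CopyRight_year"].tolist())
# ['2015', '2014,2015', '2014', '2016']
print(without_boundary.tolist())
# ['2015', '2014,2015', '2014', '1234,2016']
```

Without \b, the pattern picks up the first four digits of any longer number (1234 from 123456 here), so the boundaries are what keep the match restricted to standalone 4-digit tokens. Note that this still accepts any 4-digit number, not only plausible years.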