pandas ValueError: pattern contains no capture groups

Question

With a plain regular expression (the re module), I can extract the part of the URL I want:

import re
string = r'http://www.example.com/abc.html'
result = re.search('^.*com', string).group()

In pandas, I wrote:

df = pd.DataFrame(columns = ['index', 'url'])
df.loc[len(df), :] = [1, 'http://www.example.com/abc.html']
df.loc[len(df), :] = [2, 'http://www.hello.com/def.html']
df.str.extract('^.*com')

ValueError: pattern contains no capture groups

How can I fix this?

Thanks.

Answer 1

cs9*_*s95 7

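According to the documentation, str.extract requires a capture group (i.e., parentheses) around the part of the pattern you want to extract.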

Series.str.extract(pat, flags=0, expand=True)
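For each subject string in the Series, extract groups from the first match of regular expression pat.

Each capture group constitutes its own column in the output.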

df.url.str.extract(r'(.*\.com)')

                        0
0  http://www.example.com
1    http://www.hello.com

# If you need named capture groups,
df.url.str.extract(r'(?P<URL>.*\.com)')

                      URL
0  http://www.example.com
1    http://www.hello.com

Or, if you need a Series,

df.url.str.extract(r'(.*\.com)', expand=False)

0    http://www.example.com
1      http://www.hello.com
Name: url, dtype: object
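
To illustrate the quoted statement that each capture group becomes its own column, here is a minimal sketch using the same two URLs as the question. The pattern and the group names scheme and host are my own illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same sample URLs as in the question.
df = pd.DataFrame({
    "index": [1, 2],
    "url": [
        "http://www.example.com/abc.html",
        "http://www.hello.com/def.html",
    ],
})

# Two named capture groups produce two output columns,
# labelled with the group names ("scheme" and "host" are illustrative).
parts = df["url"].str.extract(r"^(?P<scheme>\w+)://(?P<host>[^/]+)")
print(parts)
```

With unnamed groups the columns would be numbered 0, 1, and so on, as in the first example above.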

Answer 2

jez*_*ael 6

You need to select the url column and wrap the pattern in () to create a capture group:

df['new'] = df['url'].str.extract(r'(^.*com)')
print(df)
  index                              url                     new
0     1  http://www.example.com/abc.html  http://www.example.com
1     2    http://www.hello.com/def.html    http://www.hello.com
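
If the patterns come from elsewhere and may or may not contain parentheses, one possible approach is to check the group count with the re module before calling str.extract. The helper name ensure_capture_group below is hypothetical, and the sketch assumes the pattern does not start with inline flags such as (?i).

```python
import re


def ensure_capture_group(pattern: str) -> str:
    """Wrap `pattern` in one capture group if it has none.

    Hypothetical helper for illustration; assumes the pattern has no
    leading inline flags, which could not be wrapped this way.
    """
    # A compiled pattern exposes the number of capture groups as `.groups`.
    if re.compile(pattern).groups == 0:
        return f"({pattern})"
    return pattern


# The question's pattern has no group, so it gets wrapped.
wrapped = ensure_capture_group("^.*com")
# A pattern that already has a group is returned unchanged.
unchanged = ensure_capture_group("(^.*com)")
print(wrapped, unchanged)
```

The result could then be passed on, for example df['url'].str.extract(ensure_capture_group(pat)).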

Answer 3

ank*_*_91 5

Try this Python standard library module, which is well suited to this purpose:

Using urllib.parse

from urllib.parse import urlparse
df['domain']=df.url.apply(lambda x:urlparse(x).netloc)
print(df)

  index                              url           domain
0     1  http://www.example.com/abc.html  www.example.com
1     2    http://www.hello.com/def.html    www.hello.com
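
Note that netloc drops the scheme, so the result differs from the regex answers, which keep the http:// prefix. If that prefix is wanted, a small sketch along these lines may help; the helper name origin_of is hypothetical.

```python
from urllib.parse import urlparse


def origin_of(url: str) -> str:
    """Return 'scheme://host' for `url` (hypothetical helper name)."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}"


# Same sample URLs as in the question.
urls = [
    "http://www.example.com/abc.html",
    "http://www.hello.com/def.html",
]
origins = [origin_of(u) for u in urls]
print(origins)
```

On the DataFrame, the same function could be applied with df.url.apply(origin_of).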
