Python re.findall behaves weirdly

The source string is:

# Python 3.4.3
import re

s = r'abc123d, hello 3.1415926, this is my book'

Here is my pattern:

pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

However, re.search gives me the correct result:

m = re.search(pattern, s)
print(m)  # output: <_sre.SRE_Match object; span=(3, 6), match='123'>

But re.findall just returns a list of empty strings:

L = re.findall(pattern, s)
print(L)  # output: ['', '', '']

Why can't re.findall give me the expected list:

['123', '3.1415926']
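
A minimal sketch of one likely explanation, using the string and pattern from the question. Two things seem to be at play: inside a raw string `\\.` means a literal backslash followed by any character (not a literal dot), and when a pattern contains a capturing group, re.findall returns the group's text instead of the whole match. The names `original`, `fixed`, `groups_only`, `numbers` and `whole_matches` are introduced here for illustration only.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'

# Pattern from the question: in a raw string, `\\.` is a literal backslash
# followed by any character, and the parentheses form a capturing group.
original = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

# Sketch of a fix: a single backslash so `\.` matches a literal dot, and
# `(?:...)` so the group does not capture.
fixed = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'

# With one capturing group, findall returns that group's text for each match
# (an empty string when the group did not participate).
groups_only = re.findall(original, s)

# With no capturing groups, findall returns the whole matches.
numbers = re.findall(fixed, s)

# finditer exposes the whole match via group(0), even when groups exist.
whole_matches = [m.group(0) for m in re.finditer(original, s)]

print(groups_only)    # ['', '', '']
print(numbers)        # ['123', '3.1415926']
print(whole_matches)  # ['123', '3', '1415926']
```

Note that fixing only the group (and keeping the double backslash) would still split `3.1415926` into `3` and `1415926`, as the `whole_matches` line shows, so both changes appear to be needed.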

python regex

18 votes · 2 answers · 2670 views

Python: UserWarning: This pattern has match groups. To actually get the groups, use str.extract

I have a dataframe and I am trying to get the rows where a column contains certain strings. The df looks like:

member_id,event_path,event_time,event_duration
30595,"2016-03-30 12:27:33",yandex.ru/,1
30595,"2016-03-30 12:31:42",yandex.ru/,0
30595,"2016-03-30 12:31:43",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:44",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:45",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:46",yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,0
30595,"2016-03-30 12:31:49",kinogo.co/,1
30595,"2016-03-30 12:32:11",kinogo.co/melodramy/,0

and another df with URL patterns:

url
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_bq_phoenix
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnyj_telefon_fly_
003\.ru\/sonyxperia
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony
003\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/mobilnye_telefony_smartfony\/brands5D5Bbr_23
1click\.ru\/sonyxperia
1click\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/chasy-motorola

I use

import pandas as pd

# Note: error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' is the newer equivalent.
urls = pd.read_csv('relevant_url1.csv', error_bad_lines=False)
substr = urls.url.values.tolist()
data = pd.read_csv('data_nts2.csv', error_bad_lines=False, chunksize=50000)
result = pd.DataFrame()
for i, df in enumerate(data):
    res = df[df['event_time'].str.contains('|'.join(substr), regex=True)]

but it gives me

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

How can I fix this?
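
A possible way to track this down, sketched with the standard library only. As far as I know, pandas emits this warning when the compiled pattern passed to str.contains reports at least one capturing group, so checking `re.compile(p).groups` for each pattern should point at the offending entries. The patterns shown in the question only have parentheses inside a character class, which are not groups, so the third pattern below is a hypothetical stand-in for whatever else is in relevant_url1.csv; the helper `capturing_patterns` is also made up for this sketch.

```python
import re

# Two patterns from the question, plus one hypothetical pattern with a
# capturing group (an assumption: the full relevant_url1.csv is not shown).
patterns = [
    r'003\.ru\/sonyxperia',
    r'1click\.ru\/[a-zA-Z0-9-_%$#?.:+=|()]+\/chasy-motorola',
    r'example\.ru\/(phones|tablets)\/',  # hypothetical
]

def capturing_patterns(pats):
    """Return the patterns that contain at least one capturing group."""
    return [p for p in pats if re.compile(p).groups > 0]

offenders = capturing_patterns(patterns)
print(offenders)

# Rewriting the group as non-capturing keeps the matching behaviour but
# leaves the combined pattern with zero capturing groups.
cleaned = [p.replace('(phones|tablets)', '(?:phones|tablets)') for p in patterns]
combined = '|'.join(cleaned)

print(re.compile(combined).groups)
print(bool(re.search(combined, 'example.ru/phones/x')))
```

A combined pattern with no capturing groups can then be passed to `str.contains(combined, regex=True)`. If the groups cannot be rewritten, the warning is harmless for a plain True/False filter and could instead be silenced with `warnings.filterwarnings('ignore', 'This pattern has match groups')`.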

python regex pandas

12 votes · 5 answers · about 10,000 views
