dec*_*rbo 21 python regex parsing url-parsing
I know this can be done easily with PHP's parse_url and parse_str functions:
$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);
But how can I achieve the same thing in Python? I can call urlparse, but what comes next?
Mik*_*kin 48
I wrote a YouTube ID parser that does not use regular expressions:
# Python 3 import; on Python 2 use: from urlparse import urlparse, parse_qs
from urllib.parse import urlparse, parse_qs

def video_id(value):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None
rob*_*ert 41
Python has a library for parsing URLs (the snippet below uses the Python 2 module name, urlparse).
import urlparse
url_data = urlparse.urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = urlparse.parse_qs(url_data.query)
video = query["v"][0]
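On Python 3 the same idea might look like the sketch below, using only the standard library. The module is urllib.parse there, and parse_qs returns a dict whose values are lists, which is roughly what PHP's parse_str fills in. The .get fallback is my own addition (not part of the answer above) so that a URL without a v parameter gives None rather than a KeyError.

```python
from urllib.parse import urlparse, parse_qs

subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"

# Split the URL into components, then parse only the query string.
url_data = urlparse(subject)
query = parse_qs(url_data.query)
print(query)  # {'v': ['z_AbfPXTKms'], 'NR': ['1']}

# Each value is a list because a parameter may repeat; take the first one.
video = query.get("v", [None])[0]
print(video)  # z_AbfPXTKms
```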
Eli*_*Eli 13
This is a Python 3 version of Mikhail Kashkin's solution, with a few more scenarios added.
from urllib.parse import urlparse, parse_qs

# noinspection PyTypeChecker
def extract_video_id(url):
    # Examples:
    # - http://youtu.be/SA2iWivDJiE
    # - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    # - http://www.youtube.com/embed/SA2iWivDJiE
    # - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com'}:
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        # '/watch/<id>'.split('/') is ['', 'watch', '<id>'], so the ID is at index 2
        if query.path[:7] == '/watch/': return query.path.split('/')[2]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
        # below is optional for playlists
        if query.path[:9] == '/playlist': return parse_qs(query.query)['list'][0]
    # returns None for invalid YouTube url
    return None
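The path-based branches depend on how str.split('/') indexes a path that starts with a slash, which is easy to get off by one. A quick sketch of what each index holds, using one of the example URLs from the comments:

```python
from urllib.parse import urlparse

path = urlparse("http://www.youtube.com/embed/SA2iWivDJiE").path

# The leading slash produces an empty first element.
parts = path.split("/")
print(parts)     # ['', 'embed', 'SA2iWivDJiE']
print(parts[2])  # SA2iWivDJiE
```

So index 1 is the route name (embed, v, watch) and index 2 is the ID.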
I use the excellent pytube package for this:
$ pip install pytube
#Examples
url1='http://youtu.be/SA2iWivDJiE'
url2='http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu'
url3='http://www.youtube.com/embed/SA2iWivDJiE'
url4='http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US'
url5='https://www.youtube.com/watch?v=rTHlyTphWP0&index=6&list=PLjeDyYvG6-40qawYNR4juzvSOg-ezZ2a6'
url6='youtube.com/watch?v=_lOT2p_FCvA'
url7='youtu.be/watch?v=_lOT2p_FCvA'
url8='https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo'
urls=[url1,url2,url3,url4,url5,url6,url7,url8]
#Get youtube id
from pytube import extract
for url in urls:
    id=extract.video_id(url)
    print(id)
Output:
SA2iWivDJiE
_oPAwA_Udwc
SA2iWivDJiE
SA2iWivDJiE
rTHlyTphWP0
_lOT2p_FCvA
_lOT2p_FCvA
n0g-Y0oo5Qs
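Note that url6 and url7 have no scheme. With plain urlparse such input ends up entirely in the path and hostname is None, so a hostname check never matches. If you want to stay with the standard library, one possible workaround is sketched below; with_netloc is a hypothetical helper name of mine, and prepending "//" is an assumption about what your inputs look like, not something pytube does.

```python
from urllib.parse import urlparse, parse_qs

def with_netloc(url):
    # Hypothetical helper: prefix scheme-less input with "//" so that
    # urlparse treats the first segment as the host instead of the path.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url)

bare = urlparse("youtube.com/watch?v=_lOT2p_FCvA")
print(bare.hostname)  # None

fixed = with_netloc("youtube.com/watch?v=_lOT2p_FCvA")
print(fixed.hostname, fixed.path)     # youtube.com /watch
print(parse_qs(fixed.query)["v"][0])  # _lOT2p_FCvA
```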