How do I extract the video ID from a YouTube link in Python?

dec*_*rbo 21 python regex parsing url-parsing

I know this can be done easily with PHP's parse_url and parse_str functions:

$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);

But how do I do this in Python? I can use urlparse, but what comes next?
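For comparison, the closest Python 3 counterpart of the PHP snippet uses `urllib.parse.urlparse` with `parse_qs`. One difference worth knowing: `parse_qs` maps each key to a list of values, while PHP's `parse_str` yields plain strings; `dict(parse_qsl(...))` gives the flat form (if a key repeats, the last value wins). A minimal sketch:

```python
from urllib.parse import urlparse, parse_qs, parse_qsl

subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"
url = urlparse(subject)

# parse_qs: every value is a list, e.g. {'v': ['z_AbfPXTKms'], 'NR': ['1']}
query_lists = parse_qs(url.query)

# parse_qsl returns (key, value) pairs; dict() flattens them like PHP's parse_str
query_flat = dict(parse_qsl(url.query))

print(query_lists)
print(query_flat)
```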

Mik*_*kin 48

I wrote a YouTube ID parser that doesn't use regular expressions:

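# Python 3 imports; on Python 2 use: from urlparse import urlparse, parse_qs
from urllib.parse import urlparse, parse_qs


def video_id(value):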
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None

  • This works great for parsing all the possible YouTube link formats. (2 upvotes)
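One caveat with the function above: `urlparse` only fills in `hostname` when the URL has a `//` authority part, so a scheme-less link such as `youtube.com/watch?v=...` gives `hostname` `None` and the function returns `None`. Likewise, a `/watch` URL with no `v` parameter raises `KeyError`. A minimal Python 3 sketch that guards against both, assuming it is acceptable to prepend `https://` to inputs that lack `//` (the name `video_id_safe` is hypothetical):

```python
from urllib.parse import urlparse, parse_qs


def video_id_safe(value):
    # Assumption: inputs without "//" are scheme-less links; prepend a scheme
    # so urlparse puts the host in netloc instead of in path.
    if "//" not in value:
        value = "https://" + value
    query = urlparse(value)
    if query.hostname == "youtu.be":
        return query.path[1:] or None
    if query.hostname in ("www.youtube.com", "youtube.com"):
        if query.path == "/watch":
            # .get avoids KeyError when the "v" parameter is missing
            return parse_qs(query.query).get("v", [None])[0]
        if query.path.startswith(("/embed/", "/v/")):
            return query.path.split("/")[2]
    return None


print(urlparse("youtube.com/watch?v=_lOT2p_FCvA").hostname)  # None
print(video_id_safe("youtube.com/watch?v=_lOT2p_FCvA"))      # _lOT2p_FCvA
print(video_id_safe("http://www.youtube.com/watch?feature=feedu"))  # None
```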

rob*_*ert 41

Python has a standard-library module for parsing URLs:

import urlparse
url_data = urlparse.urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = urlparse.parse_qs(url_data.query)
video = query["v"][0]

  • Note that in Python 3, `urlparse` was moved to `urllib.parse`; the same approach still works with: `import urllib.parse as urlparse` (5 upvotes)
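Following that comment, the answer's snippet runs on Python 3 once the module is aliased; only the import line changes:

```python
# Python 3: urlparse/parse_qs live in urllib.parse; aliasing the module
# keeps the rest of the original (Python 2) snippet unchanged.
import urllib.parse as urlparse

url_data = urlparse.urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = urlparse.parse_qs(url_data.query)
video = query["v"][0]
print(video)  # z_AbfPXTKms
```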

Eli*_*Eli 13

Here is a Python 3 version of Mikhail Kashkin's solution, with additional cases.

from urllib.parse import urlparse, parse_qs


# noinspection PyTypeChecker
def extract_video_id(url):
    # Examples:
    # - http://youtu.be/SA2iWivDJiE
    # - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    # - http://www.youtube.com/embed/SA2iWivDJiE
    # - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com'}:
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        if query.path[:7] == '/watch/': return query.path.split('/')[2]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
        # below is optional for playlists
        if query.path[:9] == '/playlist': return parse_qs(query.query)['list'][0]
    # returns None for an invalid YouTube URL
    return None
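On the path-based branches: splitting a path like `/embed/ID` on `/` yields an empty first element (the text before the leading slash), so the ID sits at index 2 for `/embed/`, `/v/` and `/watch/` alike, while index 1 holds the route name. A quick check:

```python
# Each path starts with "/", so split("/") puts "" at index 0,
# the route name at index 1, and the video ID at index 2.
paths = ["/embed/SA2iWivDJiE", "/v/SA2iWivDJiE", "/watch/SA2iWivDJiE"]
segments = [p.split("/") for p in paths]
print(segments[0])  # ['', 'embed', 'SA2iWivDJiE']

ids = [s[2] for s in segments]
print(ids)  # ['SA2iWivDJiE', 'SA2iWivDJiE', 'SA2iWivDJiE']
```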


Ale*_*lex 9

Here is a regex that covers these cases: `v/` (or `V/`), `be/` (as in `youtu.be/`), `?v=` or `&v=`, and `embed/`:

((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)
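In Python the pattern can be used with `re.search`. Each alternative in the first group is a fixed-width lookbehind (which Python's `re` requires), and lookbehinds consume no characters, so the whole match (`group(0)`) is just the ID. A minimal sketch (the function name is hypothetical):

```python
import re

# Lookbehinds: "v/" or "V/", "be/" (youtu.be), "?v=" or "&v=", "embed/"
YOUTUBE_ID_RE = re.compile(r"((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)")


def extract_id_regex(url):
    m = YOUTUBE_ID_RE.search(url)
    # Lookbehinds match zero-width, so group(0) contains only the ID
    return m.group(0) if m else None


print(extract_id_regex("http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu"))  # _oPAwA_Udwc
```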


iva*_*aul 6

I use this great package, pytube (`$ pip install pytube`):

from pytube import extract

# Examples
url1 = 'http://youtu.be/SA2iWivDJiE'
url2 = 'http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu'
url3 = 'http://www.youtube.com/embed/SA2iWivDJiE'
url4 = 'http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US'
url5 = 'https://www.youtube.com/watch?v=rTHlyTphWP0&index=6&list=PLjeDyYvG6-40qawYNR4juzvSOg-ezZ2a6'
url6 = 'youtube.com/watch?v=_lOT2p_FCvA'
url7 = 'youtu.be/watch?v=_lOT2p_FCvA'
url8 = 'https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo'

urls = [url1, url2, url3, url4, url5, url6, url7, url8]

# Get the YouTube ID of each URL
for url in urls:
    video_id = extract.video_id(url)
    print(video_id)

Output

SA2iWivDJiE
_oPAwA_Udwc
SA2iWivDJiE
SA2iWivDJiE
rTHlyTphWP0
_lOT2p_FCvA
_lOT2p_FCvA
n0g-Y0oo5Qs
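If you would rather not add a dependency, a similar result can be had with one standard-library regex that looks for an ID right after `v=` or `/`. This is a sketch, not pytube's code, and it assumes video IDs are exactly 11 characters from `[0-9A-Za-z_-]`; it prints the same eight IDs as the output listed for the pytube loop. Since it never checks the hostname, it can misfire on unrelated URLs that happen to contain an 11-character path segment, so validate the host first if inputs are untrusted.

```python
import re

# Assumption: IDs are 11 characters from [0-9A-Za-z_-], preceded by "v=" or "/".
# Stdlib-only approximation; not pytube's actual implementation.
ID_RE = re.compile(r"(?:v=|/)([0-9A-Za-z_-]{11})")


def video_id_11(url):
    m = ID_RE.search(url)
    return m.group(1) if m else None


urls = [
    "http://youtu.be/SA2iWivDJiE",
    "http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu",
    "http://www.youtube.com/embed/SA2iWivDJiE",
    "http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US",
    "https://www.youtube.com/watch?v=rTHlyTphWP0&index=6&list=PLjeDyYvG6-40qawYNR4juzvSOg-ezZ2a6",
    "youtube.com/watch?v=_lOT2p_FCvA",
    "youtu.be/watch?v=_lOT2p_FCvA",
    "https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo",
]
for url in urls:
    print(video_id_11(url))
```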