Parsing YouTube URLs

Pau*_*gan 8 ruby url ruby-on-rails

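I wrote a Ruby YouTube URL parser. It is designed to take a YouTube URL in one of the following structures (these are the YouTube URL structures I could find; maybe there are more?):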

http://youtu.be/sGE4HMvDe-Q
http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1

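The goal is to save only the ID of the clip or playlist so that it can be embedded: 'sGE4HMvDe-Q' if it is a clip, or 'p/A0C3C1D163BE880A' if it is a playlist.

The parser I wrote works for these URLs, but it seems a bit brittle and long-winded. Can anyone suggest a better Ruby approach to this problem?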

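def parse_youtube(url)
    # Drop the scheme, then split the rest into host and path segments.
    a = url.split('//').last.split('/')
    # Take the last segment and strip 'watch?v=' and any query parameters.
    b = a.last.split('watch?v=').last.split('?').first.split('&').first
    # Playlist URLs have 'p' as their first path segment.
    if a[1] == 'p'
        "p/#{b}"
    else
        b
    end
end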

dee*_*see 19

def parse_youtube url
   regex = /(?:.be\/|\/watch\?v=|\/(?=p\/))([\w\/\-]+)/
   url.match(regex)[1]
end

urls = %w[http://youtu.be/sGE4HMvDe-Q 
          http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
          http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1]

urls.each {|url| puts parse_youtube url }
# sGE4HMvDe-Q
# Lp7E973zozc
# p/A0C3C1D163BE880A

Depending on how you use it, you may want to validate more thoroughly that the URL really comes from YouTube.
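One way to do that check is sketched below with the stdlib URI module. The method name youtube_url? and the YOUTUBE_HOSTS list are made up for this example and are not part of the answer; adjust the list to the hosts you actually expect.

```ruby
require 'uri'

# Hypothetical allow-list of hosts; extend it as needed.
YOUTUBE_HOSTS = %w[youtu.be youtube.com www.youtube.com m.youtube.com].freeze

# Returns true only when the URL parses as http(s) and its host is on the list.
def youtube_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  uri.is_a?(URI::HTTP) && YOUTUBE_HOSTS.include?(uri.host.to_s.downcase)
rescue URI::InvalidURIError
  false
end
```

Calling it before parse_youtube means an unrelated URL that merely looks similar (for example one with a /watch?v= path on another host) is rejected instead of yielding an ID.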

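Update:

Coming back to this a few years later. It has always bugged me how sloppy the original answer was. Since the validity of the YouTube domain isn't verified anyway, I've removed some of the slop.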

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
    be                       'be'
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    watch                    'watch'
--------------------------------------------------------------------------------
    \?                       '?'
--------------------------------------------------------------------------------
    v=                       'v='
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      p                        'p'
--------------------------------------------------------------------------------
      \/                       '/'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\w\/\-]+                any character of: word characters (a-z,
                             A-Z, 0-9, _), '\/', '\-' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
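As the breakdown shows, the leading `.` matches any character, and `url.match(regex)[1]` raises NoMethodError when nothing matches. A possible variant, not part of the original answer, escapes the dot and returns nil for URLs that do not match. The names YOUTUBE_ID and parse_youtube_safe are invented for this sketch.

```ruby
# Same pattern as above, but with the dot escaped so it only matches a literal '.'.
YOUTUBE_ID = /(?:\.be\/|\/watch\?v=|\/(?=p\/))([\w\/\-]+)/

# Returns the captured ID, or nil when the URL matches none of the alternatives.
def parse_youtube_safe(url)
  # String#[] with a regexp and a group index returns that group or nil.
  url[YOUTUBE_ID, 1]
end
```

With the unescaped dot, a path such as /maybe/foo would match on "ybe/" and return "foo"; the escaped version returns nil for it.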

  • Try the https://github.com/reu/youtube_id gem. It handles many kinds of URLs. (2 upvotes)

d11*_*wtq 5

With the Addressable gem you can save yourself some work. There is also a URI module in the stdlib, but Addressable is more powerful.

require 'addressable/uri'

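# youtube_url is assumed to hold the URL string to parse.
uri = Addressable::URI.parse(youtube_url)
if uri.path == "/watch"
  # query_values returns the query string as a Hash, or nil if there is no query
  uri.query_values["v"] if uri.query_values
else
  # note: the path keeps its leading slash, e.g. "/sGE4HMvDe-Q"
  uri.path
end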

Edit | Removed the craziness. I hadn't noticed that Addressable already provides #query_values.
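For comparison, roughly the same approach can be sketched with only the stdlib URI module mentioned above. The method name youtube_id is made up for this example. It strips the leading slash so the result matches the IDs the question asks for, and it does not check that the host is actually YouTube.

```ruby
require 'uri'

# Returns the clip ID, "p/<id>" for a playlist, or nil if neither is found.
def youtube_id(url)
  uri = URI.parse(url)
  if uri.path == '/watch'
    # decode_www_form turns "v=abc&feature=x" into [["v", "abc"], ["feature", "x"]]
    URI.decode_www_form(uri.query.to_s).to_h['v']
  else
    # Drop the leading slash; the query string is not part of the path.
    id = uri.path.to_s.sub(%r{\A/}, '')
    id.empty? ? nil : id
  end
end
```

Because the ID is read from the parsed query or path rather than from string positions, extra parameters such as feature=relmfu or hl=en_US do not affect the result.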