A complete Java regex for YouTube

mos*_*aab 11 java regex youtube

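I need to parse several pages to get all of the YouTube IDs.

I found many regular expressions online, but the Java ones are incomplete: they either return garbage along with the ID or miss some IDs.

The one that seems complete is hosted here, but it is written in JavaScript and PHP. Unfortunately, I could not translate it to Java.

Can someone help me rewrite this PHP regex, or the JavaScript one that follows it, in Java?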

'~
    https?://         # Required scheme. Either http or https.
    (?:[0-9A-Z-]+\.)? # Optional subdomain.
    (?:               # Group host alternatives.
      youtu\.be/      # Either youtu.be,
    | youtube\.com    # or youtube.com followed by
      \S*             # Allow anything up to VIDEO_ID,
      [^\w\-\s]       # but char before ID is non-ID char.
    )                 # End host alternatives.
    ([\w\-]{11})      # $1: VIDEO_ID is exactly 11 chars.
    (?=[^\w\-]|$)     # Assert next char is non-ID or EOS.
    (?!               # Assert URL is not pre-linked.
      [?=&+%\w]*      # Allow URL (query) remainder.
      (?:             # Group pre-linked alternatives.
        [\'"][^<>]*>  # Either inside a start tag,
      | </a>          # or inside <a> element text contents.
      )               # End recognized pre-linked alts.
    )                 # End negative lookahead assertion.
    [?=&+%\w]*        # Consume any URL (query) remainder.
    ~ix'

/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:['"][^<>]*>|<\/a>))[?=&+%\w]*/ig;

Mar*_*cus 21

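First, you need to add an extra backslash \ for every backslash in the old regex. In a Java string literal a single backslash starts an escape sequence, so without the doubling Java assumes you are escaping some other special character in the string, which you are not.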

https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
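To make the doubling concrete, here is a minimal sketch that uses only the ID part of the expression. The class name EscapeDemo, the helper isVideoId and the sample ID are made up for illustration; they are not part of the original regex.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Written with doubled backslashes in Java source. The string itself
    // holds single backslashes, which is what the regex engine receives.
    static final String ID_REGEX = "[\\w\\-]{11}";

    // Hypothetical helper: true if s consists of exactly 11 ID characters.
    static boolean isVideoId(String s) {
        return Pattern.matches(ID_REGEX, s);
    }

    public static void main(String[] args) {
        System.out.println(ID_REGEX);                 // prints [\w\-]{11}
        System.out.println(isVideoId("abcdefghijk")); // prints true (made-up ID)
    }
}
```

The same rule applies to the double quote inside the character class ['"]: it has to be written as \" so that it does not end the Java string.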

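Next, add the CASE_INSENSITIVE flag when you compile the pattern; it takes the place of the i modifier in the PHP and JavaScript versions. Here is an example: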

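import java.util.regex.Matcher;
import java.util.regex.Pattern;

// link is the text to scan, for example the HTML of a page.
String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";

Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(link);
while (matcher.find()) {
    // group() is the whole matched URL; group(1) is the 11-character video ID.
    System.out.println(matcher.group());
}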
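If you only want the IDs, as the question asks, you can collect capture group 1 instead of printing the whole match. The following is a sketch under a few assumptions: the class name YoutubeIdExtractor, the method extractIds, and the sample URLs and IDs are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YoutubeIdExtractor {
    // Same expression as above, compiled once and reused.
    private static final Pattern YOUTUBE_URL = Pattern.compile(
        "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*",
        Pattern.CASE_INSENSITIVE);

    // Returns the video ID (capture group 1) of every match, in order.
    static List<String> extractIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = YOUTUBE_URL.matcher(text);
        while (matcher.find()) {   // the find() loop plays the role of the JavaScript g flag
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up URLs and IDs, for illustration only.
        String page = "Watch https://www.youtube.com/watch?v=abcdefghijk"
                    + " and http://youtu.be/ABC-123_xyz today";
        System.out.println(extractIds(page)); // prints [abcdefghijk, ABC-123_xyz]
    }
}
```

Because of the negative lookahead, a URL that is already inside an href attribute or inside link text is skipped, so extractIds returns an empty list for input such as <a href="http://youtu.be/ABC-123_xyz">video</a>.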