Extract URLs from @font-face by searching within @font-face for replacement

Question

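I have a web service that rewrites URLs in CSS files so that they can be served via a CDN.

The CSS files can contain URLs to images or fonts.

I currently have the following regex to match all URLs in a CSS file: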

(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))

However, I now want to add support for custom fonts, and need to target the URLs inside @font-face:

@font-face {
  font-family: 'FontAwesome';
  src: url("fonts/fontawesome-webfont.eot?v=4.0.3");
  src: url("fonts/fontawesome-webfont.eot?#iefix&v=4.0.3") format("embedded-opentype"), url("fonts/fontawesome-webfont.woff?v=4.0.3") format("woff"), url("fonts/fontawesome-webfont.ttf?v=4.0.3") format("truetype"), url("fonts/fontawesome-webfont.svg?v=4.0.3#fontawesomeregular") format("svg");
  font-weight: normal;
  font-style: normal;
}

I then came up with the following:

@font-face\s*\{.*(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))\s*\}

The problem is that this matches everything, not just the URLs inside. I thought I could use a lookbehind like this:

(?<=@font-face\s*\{.*)(url\(\s*([\'\"]?+))((?!(https?\:|data\:|\.\.\/|\/))\S+)((\2)\s*\))(?<=-\s*\})

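Unfortunately, PCRE (which PHP uses) does not support variable-length repetition inside a lookbehind, so I'm stuck.

I don't want to detect fonts by file extension, because some fonts use the .svg extension, which would conflict with images that also use the .svg extension.

In addition, I would like to modify my original regex so that it matches all the other URLs, the ones that are not inside an @font-face: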

.someclass {
  background: url('images/someimage.png') no-repeat;
}

Since I can't use lookbehinds, how can I extract the URLs that are inside an @font-face and those that are not inside an @font-face?

Answer 1

Ham*_*mZa 12

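Disclaimer: you are probably better off using a library, because this is harder than you might think. I also want to start this answer with how to match URLs that are not inside @font-face {}. Finally, I assume that the braces {} are balanced inside @font-face {}.

Note: I will use "~" as the delimiter instead of "/", which saves me from escaping slashes later in my expressions. Also note that the online demos for this answer were made on regex101.com using the g modifier; in PHP, drop the g modifier and use preg_match_all() instead.

Let's use some regex-fu!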

Part 1: Matching URLs that are not inside @font-face {}

1.1 Matching @font-face {}

Yes, this may sound odd, but you will see why later :)
We need a recursive regex:

@font-face\s*    # Match @font-face and some spaces
(                # Start group 1
   \{            # Match {
   (?:           # A non-capturing group
      [^{}]+     # Match anything except {} one or more times
      |          # Or
      (?1)       # Recurse/rerun the expression of group 1
   )*            # Repeat 0 or more times
   \}            # Match }
)                # End group 1
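
As a rough illustration of the recursive pattern above, here is a minimal PHP sketch that pulls whole @font-face blocks out of a stylesheet. The sample CSS and the variable names ($css, $fontfaceRegex, $matches) are made up for this example.

```php
<?php
// Hypothetical sample stylesheet; any CSS string would do.
$css = <<<'CSS'
.icon { background: url('images/icon.png'); }
@font-face {
  font-family: 'Demo';
  src: url("fonts/demo.woff") format("woff");
}
CSS;

// Same recursive pattern as above, written in extended (x) mode.
$fontfaceRegex = '~
@font-face\s*             # literal @font-face plus optional whitespace
(                         # group 1: one balanced block in braces
   \{
   (?: [^{}]+ | (?1) )*   # non-brace text, or a nested block via recursion
   \}
)
~x';

preg_match_all($fontfaceRegex, $css, $matches);

// $matches[0] holds each whole @font-face rule, $matches[1] only the braces part.
foreach ($matches[0] as $block) {
    echo $block, PHP_EOL;
}
```

The .icon rule is not reported, because the pattern only starts matching at a literal @font-face.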

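1.2 Escaping @font-face {}

We will put (*SKIP)(*FAIL) right after the previous regex so that whatever it matched gets skipped: (*FAIL) forces the match to fail, and (*SKIP) stops the engine from retrying at any position inside the text it has already consumed. See the linked answer for the details of how it works.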
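
To make the skipping behaviour concrete, here is a small PHP sketch. It assumes a simplified url(...) pattern on the right-hand side of the alternation (not the final one from this answer), and the one-line stylesheet is invented for the example.

```php
<?php
// Hypothetical input: one url() before @font-face, one inside it, one after it.
$css = '.a { background: url(a.png); } @font-face { src: url(f.woff); } .b { background: url(b.png); }';

// Left branch: consume a whole @font-face block, then throw it away with (*SKIP)(*FAIL).
// Right branch: what we actually want to find (here just a simplified url(...)).
$regex = '~@font-face\s*(\{(?:[^{}]+|(?1))*\})(*SKIP)(*FAIL)|url\([^)]*\)~';

preg_match_all($regex, $css, $m);

// $m[0] is ['url(a.png)', 'url(b.png)']; url(f.woff) sits in the skipped block.
print_r($m[0]);
```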

1.3 Matching url()

We will use something like this:

url\s*\(         # Match url, optionally some whitespaces and then (
\s*              # Match optionally some whitespaces
("|'|)           # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
(?!["']?(?:https?://|ftp://))  # Put your negative-rules here (do not match url's with http, https or ftp)
(?:[^\\]|\\.)*?  # Match anything except a backslash or backslash and a character zero or more times ungreedy
\2               # Match what was matched in group 2
\s*              # Match optionally some whitespaces
\)               # Match )

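Note that I'm using \2 because this pattern gets appended to the previous regex, which already contains group 1. On its own, the quote group would be group 1 and the backreference would be \1.
Here is another use of ("|')(?:[^\\]|\\.)*?\1.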
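
A minimal standalone sketch of this url() pattern in PHP follows. Because nothing precedes it here, the quote group is group 1 and the backreference is \1. The sample CSS and the example.com URL are placeholders.

```php
<?php
$urlRegex = '~
url\s*\(\s*
("|\'|)                          # group 1: opening quote, possibly empty
(?!["\']?(?:https?://|ftp://))   # reject absolute http, https and ftp URLs
(?:[^\\\\]|\\\\.)*?              # URL body, allowing backslash escapes
\1                               # the same quote that opened the value
\s*\)
~xs';

// Hypothetical input with a quoted, an unquoted and an absolute URL.
$css = 'a { background: url("img/a.png"); } '
     . 'b { background: url(img/b.png); } '
     . 'c { background: url(\'http://example.com/c.png\'); }';

preg_match_all($urlRegex, $css, $m);

// $m[0] is ['url("img/a.png")', 'url(img/b.png)']; the absolute URL is rejected.
print_r($m[0]);
```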

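1.4 Matching the value inside url()

As you may have guessed, we need some lookaround-fu. The problem is the lookbehind, since it has to be fixed-length. I have a workaround: let me introduce the \K escape sequence. It resets the starting point of the reported match to the current position, so anything matched before \K is left out of the result.
So let's drop \K somewhere in our expression and use a lookahead for the closing part. Our final regex will be: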

@font-face\s*    # Match @font-face and some spaces
(                # Start group 1
   \{            # Match {
   (?:           # A non-capturing group
      [^{}]+     # Match anything except {} one or more times
      |          # Or
      (?1)       # Recurse/rerun the expression of group 1
   )*            # Repeat 0 or more times
   \}            # Match }
)                # End group 1
(*SKIP)(*FAIL)   # Skip it
|                # Or
url\s*\(         # Match url, optionally some whitespaces and then (
\s*              # Match optionally some whitespaces
("|'|)           # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
\K               # Reset the match
(?!["']?(?:https?://|ftp://))  # Put your negative-rules here (do not match url's with http, https or ftp)
(?:[^\\]|\\.)*?  # Match anything except a backslash or backslash and a character zero or more times ungreedy
(?=              # Lookahead
   \2            # Match what was matched in group 2
   \s*           # Match optionally some whitespaces
   \)            # Match )
)
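
The effect of \K is easiest to see on a stripped-down pattern. The sketch below uses a deliberately simplified url() regex (an assumption for illustration, not the final pattern above) and compares the reported match with and without \K.

```php
<?php
$subject = 'src: url("fonts/a.woff")';

// Without \K, the url(" prefix is part of the reported match.
preg_match('~url\(\s*["\']?[^"\')]+~', $subject, $full);

// With \K, everything matched before it is dropped from the reported match.
preg_match('~url\(\s*["\']?\K[^"\')]+~', $subject, $valueOnly);

echo $full[0], PHP_EOL;       // url("fonts/a.woff
echo $valueOnly[0], PHP_EOL;  // fonts/a.woff
```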

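1.5 Using the pattern in PHP

We need to escape a few things such as quotes and backslashes (\\\\ in the PHP string becomes \\ in the pattern, which matches one literal \), and use the right function and the right modifiers: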

$regex = '~
@font-face\s*    # Match @font-face and some spaces
(                # Start group 1
   \{            # Match {
   (?:           # A non-capturing group
      [^{}]+     # Match anything except {} one or more times
      |          # Or
      (?1)       # Recurse/rerun the expression of group 1
   )*            # Repeat 0 or more times
   \}            # Match }
)                # End group 1
(*SKIP)(*FAIL)   # Skip it
|                # Or
url\s*\(         # Match url, optionally some whitespaces and then (
\s*              # Match optionally some whitespaces
("|\'|)          # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
\K               # Reset the match
(?!["\']?(?:https?://|ftp://))  # Put your negative-rules here (do not match URLs with http, https or ftp)
(?:[^\\\\]|\\\\.)*?  # Match anything except a backslash or backslash and a character zero or more times ungreedy
(?=              # Lookahead
   \2            # Match what was matched in group 2
   \s*           # Match optionally some whitespaces
   \)            # Match )
)
~xs';

$input = file_get_contents($css_file);
preg_match_all($regex, $input, $m);
echo '<pre>'. print_r($m[0], true) . '</pre>';
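
Since the goal in the question is to rewrite the URLs rather than just list them, the same pattern can be passed to preg_replace_callback(). Because of \K and the lookahead, only the URL value itself is replaced. This is a sketch only: the $cdnBase prefix, the sample CSS and the example.com hosts are assumptions, not something taken from the question.

```php
<?php
// Hypothetical CDN prefix; not part of the original question.
$cdnBase = 'https://cdn.example.com/assets/';

$regex = '~
@font-face\s*(\{(?:[^{}]+|(?1))*\})(*SKIP)(*FAIL)   # skip whole @font-face blocks
|
url\s*\(\s*("|\'|)\K                                # drop everything before the value
(?!["\']?(?:https?://|ftp://))                      # leave absolute URLs alone
(?:[^\\\\]|\\\\.)*?
(?=\2\s*\))
~xs';

$css = <<<'CSS'
.someclass { background: url('images/someimage.png') no-repeat; }
@font-face { font-family: 'Demo'; src: url("fonts/demo.woff") format("woff"); }
.other { background: url(http://example.com/x.png); }
CSS;

// $m[0] is only the URL value, so prefixing it leaves url( and the quotes intact.
$rewritten = preg_replace_callback($regex, function (array $m) use ($cdnBase) {
    return $cdnBase . $m[0];
}, $css);

echo $rewritten, PHP_EOL;
```

Only images/someimage.png gets the prefix; the font URL is skipped together with its @font-face block, and the absolute URL is rejected by the negative lookahead.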

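Part 2: Matching URLs inside @font-face {}

2.1 A different approach

I want to do this part with two regexes, because matching the URLs inside @font-face {} while also keeping track of the brace state {} in a recursive regex would be a pain.

Since we already have the pieces we need, we only have to apply them in some code:

Match all @font-face {} instances
Loop through these and match all url()s

2.2 Putting it into code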

$results = array(); // Just an empty array;
$fontface_regex = '~
@font-face\s*    # Match @font-face and some spaces
(                # Start group 1
   \{            # Match {
   (?:           # A non-capturing group
      [^{}]+     # Match anything except {} one or more times
      |          # Or
      (?1)       # Recurse/rerun the expression of group 1
   )*            # Repeat 0 or more times
   \}            # Match }
)                # End group 1
~xs';

$url_regex = '~
url\s*\(         # Match url, optionally some whitespaces and then (
\s*              # Match optionally some whitespaces
("|\'|)          # It seems that the quotes are optional according to http://www.w3.org/TR/CSS2/syndata.html#uri
\K               # Reset the match
(?!["\']?(?:https?://|ftp://))  # Put your negative-rules here (do not match url\'s with http, https or ftp)
(?:[^\\\\]|\\\\.)*?  # Match anything except a backslash or backslash and a character zero or more times ungreedy
(?=              # Lookahead
   \1            # Match what was matched in group 1
   \s*           # Match optionally some whitespaces
   \)            # Match )
)
~xs';

$input = file_get_contents($css_file);

preg_match_all($fontface_regex, $input, $fontfaces); // Get all font-face instances
if(isset($fontfaces[0])){ // If there is a match then
    foreach($fontfaces[0] as $fontface){ // Foreach instance
        preg_match_all($url_regex, $fontface, $r); // Let's match the url's
        if(isset($r[0])){ // If there is a hit
            $results[] = $r[0]; // Then add it to the results array
        }
    }
}
echo '<pre>'. print_r($results, true) . '</pre>'; // Show the results
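
The same two-step idea also works for rewriting: an outer preg_replace_callback() visits each @font-face block and an inner one rewrites the URLs inside it. The following is a sketch under assumptions: the $fontCdn prefix and the sample CSS are hypothetical, and the patterns are the ones above written on a single line.

```php
<?php
// Hypothetical CDN prefix for fonts; an assumption for illustration only.
$fontCdn = 'https://fonts-cdn.example.com/';

$fontfaceRegex = '~@font-face\s*(\{(?:[^{}]+|(?1))*\})~';
$urlRegex = '~url\s*\(\s*("|\'|)\K(?!["\']?(?:https?://|ftp://))(?:[^\\\\]|\\\\.)*?(?=\1\s*\))~s';

$css = <<<'CSS'
.someclass { background: url('images/someimage.png'); }
@font-face { font-family: 'Demo'; src: url("fonts/demo.eot"); src: url("fonts/demo.woff") format("woff"); }
CSS;

// Outer pass: visit each @font-face block. Inner pass: rewrite the URLs inside it.
$rewritten = preg_replace_callback($fontfaceRegex, function (array $block) use ($urlRegex, $fontCdn) {
    return preg_replace_callback($urlRegex, function (array $url) use ($fontCdn) {
        return $fontCdn . $url[0];
    }, $block[0]);
}, $css);

echo $rewritten, PHP_EOL;
```

Here the image URL in .someclass is left alone, while both font URLs receive the prefix.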
