Regex to match specific functions and their arguments in files

gui*_*ier 1 regex parsing pcre

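I'm working on a gettext JavaScript parser and I'm stuck on the parsing regex.

I need to catch every argument passed to the specific method calls _n( and _(. For example, if I have these in my JavaScript files: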

_("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls.. 

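This refers to this documentation: http://poedit.net/trac/wiki/Doc/Keywords

I'm planning to do it in two passes (and with two regexes):

  1. Catch all function arguments of the _n( or _( method calls
  2. Catch only the string ones

Basically, I'd like a regex that could say "catch everything after _n( or _( and stop at the last closing parenthesis, once the function call is done". I don't know whether that is possible with a regex and without a JavaScript parser.

What could also be done is "catch every "string" or 'string' after _n( or _( and stop at the end of the line or at the beginning of a new _n( or _( call".

In everything I've tried, I get stuck either on _( "one (optional)" ); with its inner parentheses, or on apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) with two calls on the same line.

Here is what I've implemented so far, with imperfect regexes: a generic parser, and the JavaScript one or the Handlebars one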

Ham*_*mZa 8

Note: read this answer if you're not familiar with recursion.
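For readers who have not seen recursion in PCRE before, here is a minimal sketch of the idea, separate from the patterns used in this answer. The subject string is made up for illustration; `(?R)` re-enters the whole pattern, which is what lets nested parentheses be matched to any depth.

```php
<?php
// (?R) recurses into the whole pattern, so every "(" must find its own ")".
// [^()]++ possessively consumes anything that is not a parenthesis.
$balanced = '~\((?:[^()]++|(?R))*\)~';

// Hypothetical subject with nested and sibling groups.
preg_match($balanced, 'f(a, (b + c) * (d)) + (e)', $m);

echo $m[0], "\n"; // (a, (b + c) * (d))
```

The patterns below use the named form `(?&name)` instead of `(?R)`, which recurses into one named group rather than the whole pattern.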

Part 1: match the specific functions

Who said that regex can't be modular? Well, PCRE regex to the rescue!

~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
_n?                               # Match _ or _n
\s*                               # Optional white spaces
(?P<results>(?&brackets))         # Recurse/use the brackets pattern and put it in the results group
~sx

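The s modifier makes the dot match newlines as well, and the x modifier allows the fancy spacing and the comments inside our regex.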
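To see what the two modifiers change, here is a small sketch on a deliberately naive pattern (not the one above). The subject string and variable names are made up for illustration.

```php
<?php
// Hypothetical subject: one call spread over three lines.
$subject = "_(\n  'Foo'\n)";

// No "s" modifier: the dot stops at the newline, so nothing matches.
$withoutS = preg_match('~_n?\s*\(.*?\)~', $subject);

// "s" lets the dot cross newlines; "x" allows the layout and comments below.
$pattern = '~
    _n?          # _ or _n
    \s*          # optional whitespace
    \( .*? \)    # naive argument list, no nesting support
~sx';
$withS = preg_match($pattern, $subject, $m);

var_dump($withoutS, $withS); // int(0) then int(1)
```

In the Part 1 pattern the only dot is the one in `\\.`, so there the s modifier only matters for a backslash followed by a newline; the negated classes `[^\\]` and `[^()]` match newlines either way.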

Online regex demo Online php demo

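Part 2: getting rid of the opening and closing brackets

Since our regex also captures the opening and closing brackets (), we might need to filter them out. We will use preg_replace() on the results: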

~           # Delimiter
^           # Assert begin of string
\(          # Match an opening bracket
\s*         # Match optional whitespaces
|           # Or
\s*         # Match optional whitespaces
\)          # Match a closing bracket
$           # Assert end of string
~x

Online php demo
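As a quick sketch of that step in isolation, the same pattern can be applied to a few hand-written values. The `$results` array below is hypothetical and only stands in for what the `results` group from Part 1 would contain.

```php
<?php
// Hypothetical captures, shaped like the "results" group from Part 1.
$results = array('("foo")', '( "one (optional)" )', '()');

// Same logic as the pattern above: strip "(" plus whitespace at the start,
// or whitespace plus ")" at the end. Inner brackets are left alone.
$filtered = preg_replace('~^\(\s*|\s*\)$~', '', $results);

print_r($filtered);
```

Because preg_replace() accepts an array as the subject, every capture is cleaned in a single call, and the inner brackets of "one (optional)" survive because they are anchored to neither end of the string.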

Part 3: extracting the arguments

So here's another modular regex; you could even add your own grammar to it:

~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<array>
      Array\s*
      (?&brackets)
   )

   (?P<variable>
      [^\s,()]+        # I don't know the exact grammar for a variable in ECMAScript
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            (?&array)             # Recurse/use the array pattern
            |                     # Or
            (?&variable)          # Recurse/use the variable pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
(?&array)
|
(?&str_double_quotes)
|
(?&str_single_quotes)
|
(?&variable)                      # Tried last: it would otherwise swallow the start of a quoted string
~xis

We will loop over the results and use preg_match_all(). The final code looks like this:

$functionPattern = <<<'regex'
~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
_n?                               # Match _ or _n
\s*                               # Optional white spaces
(?P<results>(?&brackets))         # Recurse/use the brackets pattern and put it in the results group
~sx
regex;


$argumentsPattern = <<<'regex'
~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<array>
      Array\s*
      (?&brackets)
   )

   (?P<variable>
      [^\s,()]+        # I don't know the exact grammar for a variable in ECMAScript
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            (?&array)             # Recurse/use the array pattern
            |                     # Or
            (?&variable)          # Recurse/use the variable pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
(?&array)
|
(?&str_double_quotes)
|
(?&str_single_quotes)
|
(?&variable)
~six
regex;

$input = <<<'input'
_  ("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls..

// misleading cases
_n("foo (")
_n("foo (\)", 'foo)', aa)
_n( Array(1, 2, 3), Array(")",   '(')   );
_n(function(foo){return foo*2;}); // Is this even valid?
_n   ();   // Empty
_ (   
    "Foo",
    'Bar',
    Array(
        "wow",
        "much",
        'whitespaces'
    ),
    multiline
); // PCRE is awesome
input;

if(preg_match_all($functionPattern, $input, $m)){
    $filtered = preg_replace(
        '~          # Delimiter
        ^           # Assert begin of string
        \(          # Match an opening bracket
        \s*         # Match optional whitespaces
        |           # Or
        \s*         # Match optional whitespaces
        \)          # Match a closing bracket
        $           # Assert end of string
        ~x', // Regex
        '', // Replace with nothing
        $m['results'] // Subject
    ); // Getting rid of opening & closing brackets

    // Part 3: extract arguments:
    $parsedTree = array();
    foreach($filtered as $arguments){   // Loop
        if(preg_match_all($argumentsPattern, $arguments, $m)){ // If there's a match
            $parsedTree[] = array(
                'all_arguments' => $arguments,
                'branches' => $m[0]
            ); // Add an array to our tree and fill it
        }else{
            $parsedTree[] = array(
                'all_arguments' => $arguments,
                'branches' => array()
            ); // Add an array with empty branches
        }
    }

    print_r($parsedTree); // Let's see the results;
}else{
    echo 'no matches';
}

Online php demo

You might want to create a recursive function to generate a full tree. See this answer.
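One possible shape for such a recursive function is sketched below. It is an assumption about how the tree could be built, not the code from the linked answer: the name `buildTree`, the `value`/`children` keys and the shortened pattern (group names `dq` and `sq`, slightly stricter character classes) are all made up for illustration.

```php
<?php
// Hypothetical compact variant of the Part 3 pattern, not the answer's exact text.
$argumentsPattern = <<<'REGEX'
~
(?(DEFINE)
   (?P<dq> " (?: [^"\\] | \\. )* " )
   (?P<sq> ' (?: [^'\\] | \\. )* ' )
   (?P<brackets> \( (?: (?&dq) | (?&sq) | [^()"'] | (?&brackets) )* \) )
   (?P<array> Array \s* (?&brackets) )
   (?P<variable> [^\s,()]+ )
)
(?&array) | (?&dq) | (?&sq) | (?&variable)
~six
REGEX;

// Split an argument list into nodes; Array(...) arguments get their own children.
function buildTree($arguments, $pattern)
{
    $tree = array();
    if (!preg_match_all($pattern, $arguments, $m)) {
        return $tree; // no arguments, e.g. an empty call
    }
    foreach ($m[0] as $argument) {
        $node = array('value' => $argument, 'children' => array());
        // Strip the "Array(" prefix and the final ")" and recurse into the rest.
        if (preg_match('~^Array\s*\((.*)\)$~is', $argument, $inner)) {
            $node['children'] = buildTree($inner[1], $pattern);
        }
        $tree[] = $node;
    }
    return $tree;
}

// Hypothetical argument list, already stripped of its outer brackets.
$tree = buildTree('"Foo", Array("wow", Array(1, 2)), multiline', $argumentsPattern);
print_r($tree);
```

The sketch only descends into Array(...) arguments; other nested constructs, such as function expressions, would need their own rule.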

You might notice that the function(){} part isn't parsed correctly. I will leave that as an exercise for the reader :)

  • Thanks for your time and the great answer! Best (2 upvotes)