PHP preg_replace: assign a different replacement pattern to each capture group

Ger*_*ols 3 php regex sql-injection preg-replace capturing-group

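I am trying to run a MySQL full-text search in boolean mode, and I need to prepare the search text before building the MySQL query.

To achieve this, I thought I could use the PHP function preg_replace and replace each capture group with its own specific pattern.

  1. The first pattern must find words or sentences between quotes ("hello world") and add a + in front (+"hello world").
  2. The second pattern must find the remaining words (not in quotes) and add a + in front and a * behind (+how*).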

Regex pattern

["']+([^"']+)["']+|([^\s"']+)

Replacement pattern

+"\1" +\2*

Example

For the following input:

"hello world" how are you?

it should return:

+"hello world" +how* +are* +you?*

But instead it returns something "wrong":

+"hello world" +* +"" +how* +"" +are* +"" +you?*

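I know that the replacement pattern +"\1" +\2* can never work, because nowhere do I specify that +"..." should apply only to the first capture group and +...* only to the second.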

Online regex test

PHP code

$query = preg_replace('~["\']+([^"\']+)["\']+|([^\s"\']+)~', '+"\1" +\2*', $query);
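To make the failure concrete, here is a minimal sketch that runs the pattern and replacement from the question unchanged. The variable names `$pattern`, `$input` and `$result` are only illustrative. Because both alternatives share a single replacement string, the reference to the group that did not take part in a match expands to an empty string, which is where the stray `+*` and `+""` fragments come from.

```php
<?php
// Pattern and replacement copied from the question; only the variable names are new.
$pattern = '~["\']+([^"\']+)["\']+|([^\s"\']+)~';
$input   = '"hello world" how are you?';

// A quoted match fills \1 and leaves \2 empty; a bare word fills \2 and leaves \1 empty.
// The replacement string always emits both halves, so one half is always empty.
$result = preg_replace($pattern, '+"\1" +\2*', $input);

echo $result, PHP_EOL;
// +"hello world" +* +"" +how* +"" +are* +"" +you?*
```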

Is there a way to achieve this in PHP? Thanks in advance.


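Edit / Solution

Thanks to @revo's suggestion to use the PHP function preg_replace_callback, I managed to assign one replacement pattern to each search pattern by using the extended function preg_replace_callback_array. Note that this function requires PHP >= 7.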
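Reduced to the example from the question, the idea might look like the sketch below. It is a simplified illustration rather than the full function: each pattern gets its own callback, and the patterns are applied one after the other, each on the result of the previous one. The lookahead in the first pattern only lets a word match when an even number of double quotes follows it, i.e. when the word is outside a quoted string. The names `$input` and `$result` are illustrative.

```php
<?php
$input = '"hello world" how are you?';

$result = preg_replace_callback_array(
    [
        // 1. Bare words: only match when an even number of double quotes
        //    follows, which means the word is not inside a quoted string.
        '~[^\s"]+(?=(?:[^"]*"[^"]*")*[^"]*\Z)~' => function ($m) {
            return '+' . $m[0] . '*';
        },
        // 2. Quoted strings: keep the quotes and prepend a plus sign.
        '~"([^"]+)"~' => function ($m) {
            return '+"' . $m[1] . '"';
        },
    ],
    $input
);

echo $result, PHP_EOL;
// +"hello world" +how* +are* +you?*
```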

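Here I post the final version of the function I use to run a FULLTEXT search via MATCH (...) AGAINST (...) IN BOOLEAN MODE. The function is declared in class dbReader inside a WordPress plugin. Maybe it is useful to someone.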

// Return maximum 100 ids of products matching $query in
// name or description searching for each word using MATCH AGAINST in BOOLEAN MODE
public function search_products($query) {

    // A closure is used instead of a nested named function: a named function
    // declared inside a method is redeclared (fatal error) on a second call.
    $replace_callback = function ($m, $f) {
        return sprintf($f, isset($m[1]) ? $m[1] : $m[0]);
    };

    // Replace single quotes by double quotes around quoted strings:
    // iPhone '8 GB' => iPhone "8 GB"
    // Apple's iPhone 8 '32 GB' => Apple's iPhone 8 "32 GB"
    // The lookarounds require whitespace (or the string boundary) outside the
    // quotes, so the apostrophe in "Apple's" is not taken as an opening quote.
    // This is necessary later when the matches are divided into two groups:
    //      1. Strings not between double quotes
    //      2. Strings between double quotes
    $query = preg_replace("~(?<!\S)'+([^']+)'+(?!\S)~", '"$1"', $query);

    // Do some magic to take numbers with their units as one word
    // iPhone 8 64 GB => iPhone 8 "64 GB"
    $pattern = array(
        '(\b[.,0-9]+)\s*(gb\b)',
        '(\b[.,0-9]+)\s*(mb\b)',
        '(\b[.,0-9]+)\s*(mm\b)',
        '(\b[.,0-9]+)\s*(mhz\b)',
        '(\b[.,0-9]+)\s*(ghz\b)'
    );
    array_walk($pattern, function(&$value) {
        // Surround with double quotes only if the user isn't doing manual grouping
        $value = '~'.$value.'(?=(?:[^"]*"[^"]*")*[^"]*\Z)~i';
    });
    $query = preg_replace($pattern, '"$1 $2"', $query);

    // Prepare query string for a "match against" in "boolean mode"
    $patterns = array(
        // 1. All strings not surrounded by double quotes
        '~([^\s"]+)(?=(?:[^"]*"[^"]*")*[^"]*\Z)~'   => function($m) use ($replace_callback) {
            return $replace_callback($m, '+%s*');
        },

        // 2. All strings between double quotes
        '~"+([^"]+)"+~'                             => function($m) use ($replace_callback) {
            return $replace_callback($m, '+"%s"');
        }
    );

    // Replace every single word by a boolean expression: +some* +word*
    // Respect quoted strings: +"iPhone 8"
    // preg_replace_callback_array needs PHP Version >= 7
    $query = preg_replace_callback_array($patterns, $query);

    $fulltext_fields = array(
        'title'         => array(
            'importance'    => 1.5,
            'table'         => 'p',
            'fields'        => array(
                'field1',
                'field2',
                'field3',
                'field4'
            )
        ),
        'description'   => array(
            'importance'    => 1,
            'table'         => 'p',
            'fields'        => array(
                'field5',
                'field6',
                'field7',
                'field8'
            )
        )
    );
    $select_match = $match_full = $priority_order = "";

    $args = array();
    foreach ($fulltext_fields as $index => $obj) {
        $match          = $obj['table'].".".implode(", ".$obj['table'].".", $obj['fields']);
        $select_match  .= ", MATCH ($match) AGAINST (%s IN BOOLEAN MODE) AS {$index}_score";
        $match_full    .= ($match_full!=""?", ":"").$match;
        $priority_order.= ($priority_order==""?"ORDER BY ":" + ")."({$index}_score * {$obj['importance']})";
        array_push($args, $query);
    }
    $priority_order .= $priority_order!=""?" DESC":"";

    // User input $query is passed as %s parameter to db->prepare() in order to avoid SQL injection
    array_push($args, $query, $this->model_name, $this->view_name);

    return $this->db->get_col(
        $this->db->prepare(
            "SELECT p.__pk $select_match
            FROM ankauf_... AND
                    MATCH ($match_full) AGAINST (%s IN BOOLEAN MODE)
                INNER JOIN ...
            WHERE
                m.bezeichnung=%s AND
                a.bezeichnung=%s
                $priority_order
            LIMIT 100
            ;",
            $args
        )
    );
}
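The unit-grouping step can also be tried in isolation. The sketch below uses a hypothetical helper, `group_units()`, that is not part of the function above: it folds the five unit patterns into one alternation (assumed to behave the same as applying them one by one) and keeps the same lookahead, so text the user has already quoted is left alone.

```php
<?php
// Hypothetical helper for illustration: wraps "<number> <unit>" in double
// quotes unless the text already sits inside a double-quoted string.
function group_units(string $query, array $units = ['gb', 'mb', 'mm', 'mhz', 'ghz']): string
{
    $alternation = implode('|', array_map(function ($unit) {
        return preg_quote($unit, '~');
    }, $units));

    // Same lookahead as in the function above: match only when an even
    // number of double quotes follows, i.e. outside a quoted string.
    $pattern = '~(\b[.,0-9]+)\s*((?:' . $alternation . ')\b)'
             . '(?=(?:[^"]*"[^"]*")*[^"]*\Z)~i';

    return preg_replace($pattern, '"$1 $2"', $query);
}

$grouped_plain  = group_units('iPhone 8 64 GB');   // unit gets quoted
$grouped_quoted = group_units('iPhone 8 "64 GB"'); // already quoted, unchanged

echo $grouped_plain, PHP_EOL;  // iPhone 8 "64 GB"
echo $grouped_quoted, PHP_EOL; // iPhone 8 "64 GB"
```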

rev*_*evo 5

You have to use preg_replace_callback:

$str = '"hello world" how are you?';

echo preg_replace_callback('~("[^"]+")|\S+~', function($m) {
    return isset($m[1]) ? "+" . $m[1] : "+" . $m[0] . "*";
}, $str);

Output:

+"hello world" +how* +are* +you?*

Live demo
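The `isset($m[1])` check works because, with the default flags, preg_replace_callback leaves a trailing capture group out of the match array when that group did not take part in the match. A small sketch that records the match arrays shows this (the `$shapes` variable is only illustrative):

```php
<?php
$shapes = [];

// Same pattern as above; the callback returns the match unchanged and
// only records what the match array looks like for each hit.
preg_replace_callback('~("[^"]+")|\S+~', function ($m) use (&$shapes) {
    $shapes[] = $m;
    return $m[0];
}, '"hello world" how');

print_r($shapes);
// The first entry has indexes 0 and 1 (quoted string, group 1 took part);
// the second entry has only index 0 (bare word, group 1 is absent).
```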