Regex - ignoring certain parts of a string in a match

Question

This is my string:

address='St Marks Church',notes='The North East\'s premier...'

The regex I'm using with match_all to grab the individual parts is:

'/(address|notes)='(.+?)'/i'

The result is:

address => St Marks Church
notes => The North East\

How can I make it ignore the \' characters in the notes?

Answer 1

Abs*_*ERØ 5

I'm not sure whether you're wrapping your string with heredoc or double quotes, but here is a less greedy approach:

$str4 = 'address="St Marks Church",notes="The North East\'s premier..."';
preg_match_all('~(address|notes)="([^"]*)"~i',$str4,$matches);
print_r($matches);

This yields:

Array
(
    [0] => Array
        (
            [0] => address="St Marks Church"
            [1] => notes="The North East's premier..."
        )

    [1] => Array
        (
            [0] => address
            [1] => notes
        )

    [2] => Array
        (
            [0] => St Marks Church
            [1] => The North East's premier...
        )

)
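
The pattern above works because the values are delimited by double quotes, so the apostrophe is just another character inside `[^"]*`. A minimal sketch of the difference, assuming the original single-quoted input still contains a literal backslash before the apostrophe (the variable names here are illustrative):

```php
<?php
// Single-quote delimiters, as in the question: the negated class stops
// at the escaped apostrophe, so the value is cut short.
$single = "notes='The North East\\'s premier...'";
preg_match("~notes='([^']*)'~i", $single, $m1);
echo $m1[1], "\n";

// Double-quote delimiters, as in this answer: the apostrophe is an
// ordinary character and the whole value is captured.
$double = 'notes="The North East\'s premier..."';
preg_match('~notes="([^"]*)"~i', $double, $m2);
echo $m2[1], "\n";
```

If the delimiters cannot be changed, the pattern has to account for the backslash-escaped quote explicitly.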

Another approach using preg_split:

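//sample input, assumed from the question (the backslash is part of the data)
$string = "address='St Marks Church',notes='The North East\'s premier...'";
//split the string at the comma
//assumes no commas in text
$parts = preg_split('!,!', $string);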
foreach($parts as $key=>$value){
    //split the values at the = sign
    $parts[$key]=preg_split('!=!',$value);
    foreach($parts[$key] as $k2=>$v2){
        //trim the quotes out and remove the slashes
        $parts[$key][$k2]=stripslashes(trim($v2,"'"));
    }
}

The output looks like this:

Array
(
    [0] => Array
        (
            [0] => address
            [1] => St Marks Church
        )

    [1] => Array
        (
            [0] => notes
            [1] => The North East's premier...
        )

)
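
The split-based approach cuts at every `=`, so a value that itself contains `=` would be broken into extra pieces. A minimal sketch, assuming the same key='value' layout and a made-up input, that limits the split to the first `=` (it uses `explode` because both delimiters are fixed strings):

```php
<?php
// Hypothetical input: the second value contains an '=' sign.
$string = "address='St Marks Church',notes='1+1=2'";

$pairs = [];
foreach (explode(',', $string) as $part) {
    // A limit of 2 splits only at the first '=', so the rest of the
    // value stays in one piece. Assumes every part contains an '='.
    [$key, $value] = explode('=', $part, 2);
    // Trim the surrounding quotes and drop escape backslashes.
    $pairs[$key] = stripslashes(trim($value, "'"));
}
print_r($pairs);
```

This still assumes there are no commas inside the values.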

A super slow, old-skool approach:

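//sample input, assumed from the question
$string = "address='St Marks Church',notes='The North East\'s premier...'";
$len = strlen($string);
$key = "";
$value = "";
$store = array();
$pos = 0;
$mode = 'key';
while($pos < $len){
  switch($string[$pos]){
    case '=':
        $mode = 'value';
        break;
    case ',':
        $store[$key]=trim($value,"'");
        $key=$value='';
        $mode = 'key';
        break;
    default:
        //append to $key or $value, whichever $mode names
        $$mode .= $string[$pos];
  }

  $pos++;
}
//store the final pair (there is no trailing comma)
$store[$key]=trim($value,"'");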
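
The character loop can also track whether it is inside a quoted value, which lets it tolerate commas inside values and drop the escape backslash as it goes. This is only a sketch under the assumption that values are single-quoted and use `\` as the escape character; `parsePairs` is a hypothetical helper name, not something from the answer:

```php
<?php
// Hypothetical helper: parses key='value' pairs separated by commas.
function parsePairs(string $input): array
{
    $store = [];
    $key = '';
    $value = '';
    $mode = 'key';      // 'key' until '=', then 'value'
    $inQuotes = false;  // true while inside '...'
    $len = strlen($input);

    for ($pos = 0; $pos < $len; $pos++) {
        $char = $input[$pos];

        if ($inQuotes) {
            if ($char === '\\' && $pos + 1 < $len) {
                // Escaped character: keep the next character, drop the backslash.
                $value .= $input[++$pos];
            } elseif ($char === "'") {
                $inQuotes = false; // closing quote
            } else {
                $value .= $char;   // commas and '=' are plain text here
            }
        } elseif ($char === "'" && $mode === 'value') {
            $inQuotes = true;      // opening quote
        } elseif ($char === '=' && $mode === 'key') {
            $mode = 'value';
        } elseif ($char === ',') {
            $store[$key] = $value; // pair finished
            $key = $value = '';
            $mode = 'key';
        } elseif ($mode === 'key') {
            $key .= $char;
        } else {
            $value .= $char;       // unquoted value characters
        }
    }
    if ($key !== '') {
        $store[$key] = $value;     // final pair has no trailing comma
    }
    return $store;
}

// Made-up input with a comma inside the first value.
$string = "address='St Marks, Church',notes='The North East\\'s premier...'";
print_r(parsePairs($string));
```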

Answer 2

mic*_*usa 2

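Because you posted that you are using match_all, and the top tags in your profile are php and wordpress, I think it is fair to assume you are using preg_match_all() in PHP.

The following patterns match the substrings needed to build your desired associative array:

Patterns that produce a full-string match and 1 capture group:

/(address|notes)='\K(?:\\\'|[^'])*/ (166 steps)
/(address|notes)='\K.*?(?=(?<!\\)')/ (218 steps)

Patterns that produce 2 capture groups:

/(address|notes)='((?:\\\'|[^'])*)/ (168 steps)
/(address|notes)='(.*?(?<!\\))'/ (209 steps)

Code: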

$string = "address='St Marks Church',notes='The North East\'s premier...'";

preg_match_all(
    "/(address|notes)='\K(?:\\\'|[^'])*/",
    $string,
    $out
);
var_export(array_combine($out[1], $out[0]));

echo "\n---\n";

preg_match_all(
    "/(address|notes)='((?:\\\'|[^'])*)/",
    $string,
    $out,
    PREG_SET_ORDER
);
var_export(array_column($out, 2, 1));

Output:

array (
  'address' => 'St Marks Church',
  'notes' => 'The North East\\\'s premier...',
)
---
array (
  'address' => 'St Marks Church',
  'notes' => 'The North East\\\'s premier...',
)
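
The captured notes value still carries the escape backslash. If the goal is the unescaped text, one option is to run the values through `stripslashes()` after matching. A minimal sketch, assuming the backslash only ever serves as an escape character in this data:

```php
<?php
$string = "address='St Marks Church',notes='The North East\\'s premier...'";

// Same idea as pattern #1: an escaped quote or any non-quote character.
preg_match_all("/(address|notes)='\K(?:\\\\'|[^'])*/", $string, $out);

// Build the associative array, then remove the escape backslashes.
$pairs = array_map('stripslashes', array_combine($out[1], $out[0]));
var_export($pairs);
```

With this input, the notes entry becomes The North East's premier... with no backslash.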

Patterns #1 and #3 use alternation to allow either a backslash-escaped apostrophe or any non-apostrophe character.

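Patterns #2 and #4 (which need additional backslashes when implemented in PHP) use lookarounds to ensure that an apostrophe preceded by a backslash does not end the match.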
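
The extra backslashes are needed because a PHP string literal collapses `\\` into a single backslash before PCRE sees the pattern, so `(?<!\\)` written as-is would arrive as `(?<!\)` and fail to compile. A minimal sketch of pattern #2 in a double-quoted PHP string, with the sample input assumed from the question:

```php
<?php
$string = "address='St Marks Church',notes='The North East\\'s premier...'";

// Four backslashes in the PHP literal become two in the pattern,
// which PCRE reads as one literal backslash inside the lookbehind.
preg_match_all("/(address|notes)='\K.*?(?=(?<!\\\\)')/", $string, $out);

$result = array_combine($out[1], $out[0]);
var_export($result);
```

As with the alternation patterns, the matched notes value keeps its backslash.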

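Some notes:

Capture groups, alternation, and lookarounds often cost pattern efficiency, so limiting them often improves performance. Negated character classes with greedy quantifiers often improve performance.
\K (which restarts the full-string match) is useful when trying to reduce capture groups, and it reduces the size of the output array.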
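
The effect of \K on the output array can be checked by counting its top-level entries. A small sketch comparing patterns #1 and #3 on the sample input (the variable names are illustrative):

```php
<?php
$string = "address='St Marks Church',notes='The North East\\'s premier...'";

// Pattern #1: \K discards the prefix, so the value is the full match.
preg_match_all("/(address|notes)='\K(?:\\\\'|[^'])*/", $string, $withK);

// Pattern #3: the value is held in a second capture group instead.
preg_match_all("/(address|notes)='((?:\\\\'|[^'])*)/", $string, $withGroup);

echo count($withK), "\n";     // full match + 1 capture group
echo count($withGroup), "\n"; // full match + 2 capture groups
```

Both calls extract the same values; they are in `$withK[0]` for the first pattern and in `$withGroup[2]` for the second.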

Asked:	12 years, 7 months ago
Viewed:	29964 times
Last active:	8 years, 2 months ago