Regex in PHP: separate exact words in a string into different groups

Question


Mou*_*oel 5 php regex cpu-word

I have tried everything I can think of, but I still can't work out how to solve this.

I have strings such as:

"--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees"
"--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees"

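I want to know whether the tax is "included" or "excluded", and whether each fee is a "%" or a "currency" amount. The problem is that the currency "usd" is not detected when it is attached to the tax name, as in "vat usd".

How can I separate the currency from the tax name into different groups?

This is what I did: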

(--excluded--|--included--|--not included--)([a-z ]*)?:?(usd | aed | mad | € | us\$ )?([ \. 0-9 ]*)(%)?([a-z A-z ?]*) (aed|mad|€|us\$)*((aed|mad|€|us\$)+)?([\. 0-9 ]*)(%)?([a-z A-z]*)(.*)?

This is what I get:

(--excluded--|--included--|--not included--)([a-z ]*)?:?(usd | aed | mad | € | us\$ )?([ \. 0-9 ]*)(%)?([a-z A-z ?]*) (aed|mad|€|us\$)*((aed|mad|€|us\$)+)?([\. 0-9 ]*)(%)?([a-z A-z]*)(.*)?

This is what I want:

Match 1
Full match  0-83    --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees

Group 1.    0-12    --included--

Group 2.    12-29    in selling price

Group 4.    30-33    5 

Group 5.    33-34   %

Group 6.    34-42    vat usd

Group 10.   43-49   10.00 

Group 12.   49-64   packaging fees 

Group 13.   64-82   2 % notifying fees


Answer 1

Wik*_*żew 2

Here is a solution:

$s = "--included-- in product price: breakfast --excluded--: 5 % vat aed 10.00 destination fee per night 2 % municipality fee 3.5 % packaging fee 10 % warranty service charge";
$results = [];
// Stage 1: split the input into sections, one per --included--/--excluded--/--not included-- marker
if (preg_match_all('~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su', $s, $m, PREG_SET_ORDER, 0)) {
    foreach ($m as $v) {
        $lastline = array_pop($v); // Remove the last item (Group 3, the details text)
        // Stage 2: split the details into "[currency] amount description" chunks
        if (preg_match_all('~(?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui', $lastline, $details)) {
            $results[] = array_merge($v, $details[0]);
        } else {
            $results[] = $v;
        }
    }
}
print_r($results);

See the PHP demo.

Notes:

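The first regex extracts each match that you need to parse. See the first regex demo. It means:

(--(?:(?:not )?in|ex)cluded--) - Group 1: a shorter version of (--excluded--|--included--|--not included--), matching --excluded--, --included-- or --not included--
(?:\s+([a-zA-Z ]+))? - an optional sequence: 1+ whitespaces, then Group 2: 1+ ASCII letters or spaces
:+ - 1 or more colons
\s* - 0+ whitespaces
((?:(?!--(?:(?:not )?in|ex)cluded--).)*) - Group 3: any character, 0 or more occurrences, as many as possible, that does not start any of the three character sequences --excluded--, --included--, --not included--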
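To make the group breakdown above concrete, here is a minimal sketch that runs the first regex over the two sample lines from the question, joined with a newline. The variable names and the array keys (status, label, details) are illustrative assumptions, not part of the original answer.

```php
<?php
// Stage-one pattern copied from the answer.
$pattern = '~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su';

// The two sample lines from the question, joined into one input string.
$text = implode("\n", [
    '--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees',
    '--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees',
]);

$sections = [];
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $sections[] = [
            'status'  => $m[1],        // Group 1: the marker that was found
            'label'   => $m[2],        // Group 2: text between the marker and the colon
            'details' => trim($m[3]),  // Group 3: everything up to the next marker
        ];
    }
}
print_r($sections);
```

Because Group 3 stops right before the next marker, a single call splits the input into one entry per section, and the details value of each entry is what the second regex then has to parse.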

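Then the Group 3 value needs to be parsed further to get all the details. The second regex is used here to match:

(?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)? - an optional occurrence of
- (\b(?:usd|aed|mad|usd)\b|\B€|\bus\$) - Group 1:
  - \b(?:usd|aed|mad|usd)\b - usd, aed, mad or usd as whole words
  - \B€ - € that is not preceded by a word char
  - \bus\$ - us$ that is not preceded by a word char
- \s* - 0+ whitespaces
\d+ - 1+ digits
(?:\.\d+)? - an optional sequence of . and 1+ digits
(?:(?!(?1))\D)* - any non-digit char, 0 or more occurrences, as many as possible, that does not start the same pattern as in Group 1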
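The question also asks whether each fee is a percentage or a currency amount. The sketch below is one possible way to get there, not the answer's own code. It uses a variant of the second regex with two extra capture groups (Group 2 for the amount, Group 3 for the trailing text) and drops the repeated usd alternative. The helper name parseFees, the extra groups, and the percent/currency/unknown labels are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: splits a details string into fee entries.
function parseFees(string $details): array
{
    // Variant of the answer's second regex (assumption: groups 2 and 3 added).
    // (?1) re-runs the currency pattern of group 1 inside the lookahead,
    // so the description stops right before the next currency token.
    $pattern = '~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?(\d+(?:\.\d+)?)((?:(?!(?1))\D)*)~ui';

    $fees = [];
    if (preg_match_all($pattern, $details, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $currency  = $m[1];        // '' when no currency precedes the amount
            $name      = trim($m[3]);  // text after the amount
            $isPercent = strncmp($name, '%', 1) === 0;
            if ($isPercent) {
                $name = ltrim(substr($name, 1)); // drop the leading % sign
            }
            $fees[] = [
                'amount'   => $m[2],
                'kind'     => $isPercent ? 'percent' : ($currency !== '' ? 'currency' : 'unknown'),
                'currency' => $currency,
                'name'     => $name,
            ];
        }
    }
    return $fees;
}

$fees = parseFees('5 % vat usd 10.00 packaging fees 2 % notifying fees');
print_r($fees);
```

For the first sample string from the question this should yield three entries: 5 percent for vat, usd 10.00 for packaging fees, and 2 percent for notifying fees. The currency is no longer glued to the tax name, because the (?!(?1)) check ends the vat description just before usd.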


Archived:	6 years ago
Views:	102
Last recorded:	6 years ago