Multiple regex matches in a Google Sheets formula

Question

I'm trying to get a list of all the digits that precede a hyphen in a given string (say, in cell A1), using a Google Sheets regex formula:

=REGEXEXTRACT(A1, "\d-")

My problem is that it only returns the first match... How can I get all the matches?

Example text:

"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"

My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or as concatenated text).

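I know I could use a script or other functions (such as SPLIT) to get the desired result, but what I really want to know is how to get an RE2 regular expression to return multiple matches like this in the "REGEX.*" Google Sheets formulas. Something like the "g" (global: "Don't return after first match") option on regex101.com.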
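
For comparison only, here is the "global" behavior the question refers to, shown in plain JavaScript (the language Google Apps Script uses) rather than in a Sheets formula. It is a minimal illustration, and the variable names are hypothetical. Without the `g` flag, `String.prototype.match` stops at the first match, much like REGEXEXTRACT. With the flag, it returns every match.

```javascript
// Sample text from the question.
const sample =
  "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;" +
  "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;" +
  "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq";

// Without the g flag, match() returns only the first match (like REGEXEXTRACT).
const firstOnly = sample.match(/\d-/)[0];

// With the g flag, match() returns an array containing every match.
const allMatches = sample.match(/\d-/g);

console.log(firstOnly);           // → 1-
console.log(allMatches.join("")); // → 1-2-2-2-2-2-2-2-2-2-3-3-
```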

I also tried to remove the unwanted text with REGEXREPLACE, but without success (I couldn't get rid of the other digits that don't precede a hyphen).

Any help appreciated! Thanks :)

Answer 1

Max*_*rov 7

Edit

I came up with a more general solution:

=regexreplace(A1,"(.)?(\d-)|(.)","$2")

It replaces all text except the matches of the second group, (\d-), with just that second group, $2.

"(.)?(\d-)|(.)"
  1    2    3
Groups are in ()
---------------------------------------
"$2" -- means return the group number 2

Learn regular expressions: https://regexone.com
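
To see how the edited formula behaves, it can be emulated in plain JavaScript. This is a sketch, not the answer's code, and the function name is hypothetical. JavaScript's regex engine is not RE2, but for this pattern both try the alternatives from left to right, so the result should match what Sheets returns. JavaScript also substitutes an empty string for `$2` when group 2 did not take part in a match, and that substitution is what discards everything else.

```javascript
// Hypothetical helper emulating =REGEXREPLACE(A1, "(.)?(\d-)|(.)", "$2").
// Each step consumes either "optional char + digit + hyphen" (group 2 is set)
// or a single other character (group 2 is unset, so "$2" becomes "").
function keepDigitHyphens(text) {
  return text.replace(/(.)?(\d-)|(.)/g, "$2");
}

console.log(keepDigitHyphens("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;H3-Génétiq"));
// → 1-2-2-3-
```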

Try this formula:

=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")

It will process a string like this:

"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"

and produce this output:

1-2-2-2-3-
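
The next sketch translates this two-step formula into plain JavaScript, which makes it easier to see what each REGEXREPLACE pass does. The function name is hypothetical, and JavaScript's regex engine stands in for RE2. For these simple patterns the two should behave the same.

```javascript
// Hypothetical helper emulating the two-step formula:
// =REGEXREPLACE(REGEXREPLACE(A1, "[^\-0-9]", ""), "(\d-)|(.)", "$1")
function keepDigitHyphensTwoPass(text) {
  // Pass 1: delete every character that is neither a hyphen nor a digit.
  const digitsAndHyphens = text.replace(/[^\-0-9]/g, "");
  // Pass 2: keep "digit + hyphen" pairs (group 1) and drop any other character.
  return digitsAndHyphens.replace(/(\d-)|(.)/g, "$1");
}

console.log(keepDigitHyphensTwoPass(
  "A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"
)); // → 1-2-2-2-3-
```

For that input, pass 1 produces `1-2-2----2-3-5669`, and pass 2 keeps only the digit-hyphen pairs. Because pass 1 deletes the characters in between, a digit and a hyphen that were not adjacent in the original text (for example `x2;-y`) become adjacent and are kept. The single-pass pattern in the edit above does not have that issue.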

Why capture groups 1 and 3? Shorter: `=regexreplace(A1,".?(\d-)|.", "$1")` (2 upvotes)

Answer 2

Aur*_*ann 6

Actually, you can do this in a single formula: use REGEXREPLACE to wrap every value in a capture group instead of replacing the text:

=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))

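Basically, it wraps every instance of \d- in a "capture group" and then uses REGEXEXTRACT, which neatly returns all the captures. To join them back into a single string in one cell, wrap the result in JOIN, as the outer JOIN in the formula above does.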
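
The "wrap every match in a capture group, then extract" idea can be sketched in plain JavaScript to see why it works. This is a sketch under stated assumptions, not the answer's code. The function names are hypothetical, and `escapeRegExp` is an addition not present in the answer. Because the pattern is built from the cell text itself, characters such as `.`, `(` or `+` in A1 would otherwise be read as regex syntax. The Sheets formula as written has the same exposure, although the question's sample text contains none of those characters.

```javascript
// Addition (not in the original answer): escape regex metacharacters so the
// cell text matches itself literally when turned into a pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper emulating =JOIN("", REGEXEXTRACT(A1, REGEXREPLACE(A1, "(\d-)", "($1)"))).
function extractByWrapping(text) {
  // 1. Build a pattern from the text itself, wrapping each "digit + hyphen"
  //    in its own capture group, e.g. "A1-x" -> "A(1-)x".
  const pattern = escapeRegExp(text).replace(/(\d-)/g, "($1)");
  // 2. Match the text against that pattern; groups 1..n hold every occurrence.
  const match = text.match(new RegExp(pattern));
  // 3. Join the captured groups into one string.
  return match ? match.slice(1).join("") : "";
}

console.log(extractByWrapping("A1-Nutrition;A2-BioMeta;H3-Génétiq")); // → 1-2-3-
```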

Answer 3

Wik*_*żew 6

You can create your own custom function in the Script Editor:

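// Returns an array of every match of `pattern` in `input`
// (groupId 0 = whole match, 1.. = capture group). matchAll requires the g flag.
function ExtractAllRegex(input, pattern, groupId) {
  return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]);
}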

Or, if you need to return all the matches in a single cell, joined with a separator:

// Same as above, but joins all the matches into one string using `separator`.
function ExtractAllRegex(input, pattern, groupId, separator) {
  return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]).join(separator);
}

Then just call it as =ExtractAllRegex(A1, "\d-", 0, ", ").

Explanation:

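input - the current cell value
pattern - the regex pattern
groupId - the ID of the capture group to extract (0 for the whole match)
separator - the text used to join the matched results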
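
The next sketch shows the single-cell version in use and runs as plain JavaScript outside Sheets. It adds one change that is not part of the answer. Custom functions receive numeric cell values as JavaScript numbers, which have no `matchAll` method, so the input is converted with `String()` first. The name `ExtractAllRegexSafe` is hypothetical.

```javascript
// Hypothetical variant of the answer's ExtractAllRegex (single-cell version)
// that also accepts numeric cell values by converting the input to a string.
function ExtractAllRegexSafe(input, pattern, groupId, separator) {
  const text = String(input);          // numbers have no matchAll method
  const re = new RegExp(pattern, "g"); // matchAll throws without the g flag
  return Array.from(text.matchAll(re), m => m[groupId]).join(separator);
}

// "\\d-" in JavaScript source is the pattern typed as "\d-" in the sheet.
console.log(ExtractAllRegexSafe("A1-Nutrition;A2-BioMeta;H3-Génétiq", "\\d-", 0, ", "));
// → 1-, 2-, 3-
console.log(ExtractAllRegexSafe("A1-x;B2-y", "(\\d)-", 1, "")); // → 12
```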

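Thanks, that did the trick. Even if the OP didn't want a script, for others who stumble upon this it seems to be the only "true" solution to the problem. (3 upvotes)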

Answer 4

Mic*_*thy 5

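I couldn't get the accepted answer to work for my case. I'd like to do it that way, but I needed a quick solution and went with the following approach: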

Input:

1111 days, 123 hours 1234 minutes and 121 seconds

Expected output:

1111 123 1234 121

Formula:

=split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
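
The same split-based idea can be sketched in plain JavaScript for readers who want to check it outside Sheets. The function name is hypothetical. The `filter` step stands in for SPLIT's default behavior of dropping empty pieces, because its remove_empty_text argument defaults to TRUE. Note also that `[a-z]` is case-sensitive in both RE2 and JavaScript, so capitalized words would leave letters behind.

```javascript
// Hypothetical helper emulating =SPLIT(REGEXREPLACE(C26, "[a-z,]", " "), " ").
function splitNumbers(text) {
  return text
    .replace(/[a-z,]/g, " ")        // lowercase letters and commas become spaces
    .split(" ")
    .filter(part => part !== "");   // like SPLIT, drop the empty pieces
}

console.log(splitNumbers("1111 days, 123 hours 1234 minutes and 121 seconds").join(" "));
// → 1111 123 1234 121
```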

Answer 5

Pet*_*eny 4

The shortest regex:

=regexreplace(A1,".?(\d-)|.", "$1")

It returns 1-2-2-2-2-2-2-2-2-2-3-3- for the value "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq".

Explanation of the regex:

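.? -- optional character
(\d-) -- capture group 1: a digit followed by a dash (use (\d+-) for multiple digits)
| -- logical OR
. -- any character
The replacement "$1" keeps only capture group 1 and discards everything else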
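
The `(\d+-)` tip can be checked in plain JavaScript too. This is a sketch, and the function name and `multiDigit` flag are hypothetical. JavaScript's engine is not RE2, but both prefer the first alternative and the greedy `.?` here, so Sheets should behave the same way. The sketch also exposes one quirk. When a number starts the string, or directly follows a previous match, `.?` can swallow its first digit, so `12-x` gives `2-` rather than `12-`.

```javascript
// Hypothetical helper: the answer's single-pass pattern, optionally with (\d+-).
function digitHyphens(text, multiDigit) {
  const re = multiDigit ? /.?(\d+-)|./g : /.?(\d-)|./g;
  return text.replace(re, "$1"); // unset group 1 is replaced with ""
}

console.log(digitHyphens("A1-Nutrition;A2-ActPhysiq;H3-Génétiq", false)); // → 1-2-3-
console.log(digitHyphens("A12-x;B3-y", true)); // → 12-3-
console.log(digitHyphens("12-x", true));       // → 2-  (leading digit lost)
```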

Learn more about regular expressions: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Archived: 8 years, 10 months ago
Views: 8640
Last activity: 7 years, 4 months ago