Negating a bracketed character class in Perl regular expressions and grep

Sco*_*tin 6 regex arrays perl

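I am trying to solve a very simple problem: finding the strings in an array that contain only certain letters. However, I have run into something in the behaviour of regular expressions and/or grep that I don't understand.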

#!/usr/bin/perl

use warnings;
use strict;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Words wanted include these letters only. Hardcoded for demonstration purposes
my @wanted_letters = qw/a c d i n o t/;

# Subtract those letters from the alphabet to find the letters to eliminate.
# Interpolate array into a negated bracketed character class, positive grep
# against a list of the lowercase alphabet: fine, gets befghjklmpqrsuvwxyz.
my @unwanted_letters = grep(/[^@wanted_letters]/, ('a' .. 'z'));

# The desired result can be simulated by hardcoding the unwanted letters into a
# bracketed character class then doing a negative grep: matches ant, cat, and dodo.
my @works = grep(!/[befghjklmpqrsuvwxyz]/, @test_data);

# Doing something similar but moving the negation into the bracketed character
# class fails and matches everything.
my @fails1 = grep(/[^befghjklmpqrsuvwxyz]/, @test_data);

# Doing the same thing that produced the array of unwanted letters also fails.
my @fails2 = grep(/[^@unwanted_letters]/, @test_data);

print join ' ', @works; print "\n";
print join ' ', @fails1; print "\n";
print join ' ', @fails2; print "\n";

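Questions:

  • Why does @works get the correct result when @fails1 does not? The grep documentation suggests the former, while the negation section of perlrecharclass suggests the latter, although it uses =~ in its examples. Is this specifically to do with using grep?
  • Why doesn't @fails2 work? Is it something to do with array versus list context? Otherwise it looks the same as the subtraction step.
  • Apart from that, is there a pure regex way to achieve this that avoids the subtraction step?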

daw*_*awg 5

Both of the fails cases are fixed by adding the anchors ^ and $ and the quantifier +.

These both work:

my @fails1 = grep(/^[^befghjklmpqrsuvwxyz]+$/, @test_data);
my @fails2 = grep(/^[^@unwanted_letters]+$/, @test_data);

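Remember that /[^befghjklmpqrsuvwxyz]/ and /[^@unwanted_letters]/ each match only a single character. Adding + means as many as possible. Adding ^ and $ means all the characters of the string, from start to end.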

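With /[@wanted_letters]/ you get a match as soon as the string contains one wanted character, even if it also contains unwanted ones; logically this is equivalent to any. Compare that with /^[@wanted_letters]+$/, where every letter has to be in the set @wanted_letters; this is equivalent to all.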
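The any/all distinction can also be spelled out without a regex. The following is only an illustrative sketch, not part of the original answer: it assumes List::Util 1.33 or later for any and all, and the names %wanted, @any_wanted and @all_wanted are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(any all);    # assumption: List::Util 1.33+ (core since Perl 5.20)

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Hypothetical lookup table of the wanted letters.
my %wanted = map { $_ => 1 } qw(a c d i n o t);

# "any": at least one letter of the word is wanted, like /[acdinot]/.
my @any_wanted = grep {
    my $word = $_;
    any { $wanted{$_} } split //, $word;
} @test_data;

# "all": every letter of the word is wanted, like /^[acdinot]+$/.
my @all_wanted = grep {
    my $word = $_;
    all { $wanted{$_} } split //, $word;
} @test_data;

print "any: @any_wanted\n";    # any: ant cat dodo elephant frog giraffe horse
print "all: @all_wanted\n";    # all: ant cat dodo
```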

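Demo1: only a single character is matched, so the grep fails.

Demo2: the quantifier allows more than one character, but there are no anchors, so the grep still fails.

Demo3: anchors and quantifier together give the expected result.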
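The three cases can be reproduced locally with the question's test data. This is a minimal sketch rather than the content of the linked demos; the array names @class_only, @quantified and @anchored are chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Class only: true if the word contains at least one character outside the set.
my @class_only = grep { /[^befghjklmpqrsuvwxyz]/ } @test_data;

# Class plus quantifier, no anchors: one such character is still enough.
my @quantified = grep { /[^befghjklmpqrsuvwxyz]+/ } @test_data;

# Anchors plus quantifier: every character must be outside the set.
my @anchored = grep { /^[^befghjklmpqrsuvwxyz]+$/ } @test_data;

print "class only: @class_only\n";   # ant cat dodo elephant frog giraffe horse
print "quantified: @quantified\n";   # ant cat dodo elephant frog giraffe horse
print "anchored:   @anchored\n";     # ant cat dodo
```

In the first two cases only bee is rejected, because it is the only word made up entirely of unwanted letters; every other word contains at least one character that the negated class can match.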

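Once you understand that a character class matches only a single character, that the anchors pin the match to the whole string, and that the quantifier extends the match to everything between the anchors, you can grep with the wanted letters directly: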

my @wanted = grep(/^[@wanted_letters]+$/, @test_data);
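One detail worth knowing when interpolating an array into a character class: the elements are joined with $" (a space by default), so the class also contains a space. That is harmless for the single-word test data, but it shows up as soon as a string contains a space. The sketch below builds the class with join instead; the names $class, $only_wanted, @interpolated and @joined, and the extra test string 'a cat', are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @wanted_letters = qw/a c d i n o t/;
my @test_data = (qw(ant bee cat dodo elephant frog giraffe horse), 'a cat');

# Join the letters with no separator, escaping anything special in a class.
my $class       = join '', map { quotemeta($_) } @wanted_letters;
my $only_wanted = qr/^[$class]+$/;

# Direct interpolation: the class is "a c d i n o t", space included.
my @interpolated = grep { /^[@wanted_letters]+$/ } @test_data;

# Joined class: the class is "acdinot", so a space is rejected.
my @joined = grep { $_ =~ $only_wanted } @test_data;

print join(', ', @interpolated), "\n";   # ant, cat, dodo, a cat
print join(', ', @joined), "\n";         # ant, cat, dodo
```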