Symbols escaped with preg_quote not detected by regex

Pro*_*f83 35 php regex profanity preg-match

I have a dictionary of swear words in a database, and the following works great:

preg_match_all("/\b".$f."(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);

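$t is the input text. Put simply, $f = preg_quote("punk");, where "punk" comes from the database dictionary, so at this point in the loop the expression looks like this: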

preg_match_all("/\bpunk(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);

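preg_quote escapes symbols (e.g. # becomes \#), so such expressions are escaped. But if the dictionary contains entries such as "F@CK" or "A$$", the expression above does not detect them in the input string. I have both a$$ and f@ck in the dictionary, but they don't work. If I remove preg_quote() from the words, the regex doesn't work at all, because these symbols are no longer escaped.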
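A quick way to see what preg_quote() actually produces for these entries. This is a minimal sketch assuming PHP 7.3 or later (earlier versions do not escape #). It suggests the escaping itself is not the culprit: '@' is not a regex metacharacter, so it is left alone, and a\$\$ matches a literal a$$.

```php
<?php
// Print what preg_quote() produces for a few dictionary entries.
// The second argument '/' also escapes the pattern delimiter.
foreach (['punk', 'a$$', 'f@ck', 'f#ck', 'f*ck'] as $word) {
    echo $word, ' => ', preg_quote($word, '/'), "\n";
}
// punk => punk
// a$$ => a\$\$
// f@ck => f@ck   ('@' is not a metacharacter, so it is left as-is)
// f#ck => f\#ck  (PHP 7.3+)
// f*ck => f\*ck
```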

Any suggestions on how to detect "a$$"?

Edit:

So I guess the expressions that don't work as expected would be, for example:

preg_match_all("/\bf\@ck(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);

which should find f@ck in $t.

Update:

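Here is my usage, put simply: if $m has matches, they are replaced with "*****". The whole block runs in a loop over every word in the dictionary; $f is the dictionary word and $t is the input: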

$f = preg_quote($f);
preg_match_all("/\b$f(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
if (count($m) > 0) {
    $t = preg_replace("/(\b$f(?:ing|er|es|s)?\b)/si","*****",$t);
}
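As an aside on the loop itself, the separate preg_match_all() check is not strictly needed: preg_replace() returns the subject unchanged when nothing matches, and its fifth (by-reference) parameter reports how many replacements were made. A minimal sketch, where $dictionary and the sample $t are illustrative stand-ins for the database words and the input; it keeps the question's \b pattern unchanged, so it does not address the a$$ problem.

```php
<?php
// Same dictionary loop, using preg_replace()'s $count out-parameter
// instead of a separate preg_match_all() call.
// $dictionary and $t are illustrative stand-ins (hypothetical data).
$dictionary = ['punk'];
$t = 'You punks and punk rockers';

$total = 0;
foreach ($dictionary as $word) {
    $f = preg_quote($word, '/');
    // Unchanged pattern from the question; $count is set by reference.
    $t = preg_replace("/\b$f(?:ing|er|es|s)?\b/si", '*****', $t, -1, $count);
    $total += $count;
}
echo $t, "\n";      // You ***** and ***** rockers
echo $total, "\n";  // 2
```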

Update: here is the var_dump output:

preg_quote($f) = string(5) "a\$\$"
$t = string(18) "You're such an a$$"
expression = string(29) "/\ba\$\$(?:ing|er|es|s)?\b/si"

Update: this only happens when the word ends with a symbol. I tested "a$$hole" and it works fine, but "a$$" doesn't.
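That asymmetry is exactly what \b predicts: it only matches where a word character ([A-Za-z0-9_]) meets a non-word character or the start/end of the string, and '$' is a non-word character. A minimal check, where swearPattern() is a hypothetical helper that builds the same pattern as in the question:

```php
<?php
// swearPattern() is a hypothetical helper mirroring the question's pattern.
function swearPattern(string $word): string
{
    return '/\b' . preg_quote($word, '/') . '(?:ing|er|es|s)?\b/si';
}

// 'a$$hole' ends in 'e' (a word char), so the trailing \b matches at end of string.
var_dump(preg_match(swearPattern('a$$hole'), 'such an a$$hole')); // int(1)

// 'a$$' ends in '$' (non-word) followed by end of string: no boundary, no match.
var_dump(preg_match(swearPattern('a$$'), 'such an a$$'));         // int(0)
```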

Another update: try this simplified version, with $words standing in for the dictionary:

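$words = array('a$$', 'asshole', 'a$$hole', 'f@ck', 'f#ck', 'f*ck'); // single quotes: in double quotes "a$$hole" would interpolate $hole
$text = 'Input whatever you feel like here eg. a$$';

foreach ($words as $word) {
   $f = preg_quote($word, "/");
   $text = preg_replace("/\b".$f."(?:ing|er|es|s)?\b/si",
                        str_repeat("*", strlen($word)), // one star per character of the unescaped word
                        $text);
}
echo $text;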

I would expect to see "Input whatever you feel like here eg. ***" as the result.

tch*_*ist 182

Can't Be Done

Sorry, but this "problem" really cannot be solved. Consider these:

  • ꜰᴜᴄᴋ is U+A730.1D1C.1D04.1D0B, "\N{LATIN LETTER SMALL CAPITAL F}\N{LATIN LETTER SMALL CAPITAL U}\N{LATIN LETTER SMALL CAPITAL C}\N{LATIN LETTER SMALL CAPITAL K}"
  • ᶠᵘᶜᵏ is U+1DA0.1D58.1D9C.1D4F, "\N{MODIFIER LETTER SMALL F}\N{MODIFIER LETTER SMALL U}\N{MODIFIER LETTER SMALL C}\N{MODIFIER LETTER SMALL K}"
  • 𝒻𝓊𝒸𝓀 is U+1D4BB.1D4CA.1D4B8.1D4C0, "\N{MATHEMATICAL SCRIPT SMALL F}\N{MATHEMATICAL SCRIPT SMALL U}\N{MATHEMATICAL SCRIPT SMALL C}\N{MATHEMATICAL SCRIPT SMALL K}"
  • 𝖋𝖚𝖈𝖐 is U+1D58B.1D59A.1D588.1D590, "\N{MATHEMATICAL BOLD FRAKTUR SMALL F}\N{MATHEMATICAL BOLD FRAKTUR SMALL U}\N{MATHEMATICAL BOLD FRAKTUR SMALL C}\N{MATHEMATICAL BOLD FRAKTUR SMALL K}"
  • 𝓕𝒰𝒞𝒦 is U+1D4D5.1D4B0.1D49E.1D4A6, "\N{MATHEMATICAL BOLD SCRIPT CAPITAL F}\N{MATHEMATICAL SCRIPT CAPITAL U}\N{MATHEMATICAL SCRIPT CAPITAL C}\N{MATHEMATICAL SCRIPT CAPITAL K}"
  • ⓕⓤⓒⓚ is U+24D5.24E4.24D2.24DA, "\N{CIRCLED LATIN SMALL LETTER F}\N{CIRCLED LATIN SMALL LETTER U}\N{CIRCLED LATIN SMALL LETTER C}\N{CIRCLED LATIN SMALL LETTER K}"
  • Γ̵𐌵ᏟᏦ is U+393.335.10335.13DF.13E6, "\N{GREEK CAPITAL LETTER GAMMA}\N{COMBINING SHORT STROKE OVERLAY}\N{GOTHIC LETTER QAIRTHRA}\N{CHEROKEE LETTER TLI}\N{CHEROKEE LETTER TSO}"
  • ƒμɕѤ is U+192.3BC.255.464, "\N{LATIN SMALL LETTER F WITH HOOK}\N{GREEK SMALL LETTER MU}\N{LATIN SMALL LETTER C WITH CURL}\N{CYRILLIC CAPITAL LETTER IOTIFIED E}"
  • Г̵ЦСК is U+413.335.426.421.41A, "\N{CYRILLIC CAPITAL LETTER GHE}\N{COMBINING SHORT STROKE OVERLAY}\N{CYRILLIC CAPITAL LETTER TSE}\N{CYRILLIC CAPITAL LETTER ES}\N{CYRILLIC CAPITAL LETTER KA}"
  • ғᵾȼƙ is U+493.1D7E.23C.199, "\N{CYRILLIC SMALL LETTER GHE WITH STROKE}\N{LATIN SMALL CAPITAL LETTER U WITH STROKE}\N{LATIN SMALL LETTER C WITH STROKE}\N{LATIN SMALL LETTER K WITH HOOK}"
  • ϜυϚΚ is U+3DC.3C5.3DA.39A, "\N{GREEK LETTER DIGAMMA}\N{GREEK SMALL LETTER UPSILON}\N{GREEK LETTER STIGMA}\N{GREEK CAPITAL LETTER KAPPA}"
  • ЖↃUᆿ is U+416.2183.55.11BF, "\N{CYRILLIC CAPITAL LETTER ZHE}\N{ROMAN NUMERAL REVERSED ONE HUNDRED}\N{LATIN CAPITAL LETTER U}\N{HANGUL JONGSEONG KHIEUKH}"
  • ʞɔnɟ is U+29E.254.6E.25F, "\N{LATIN SMALL LETTER TURNED K}\N{LATIN SMALL LETTER OPEN O}\N{LATIN SMALL LETTER N}\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}"
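To make the point concrete in the question's own language, here is a minimal PHP sketch (it assumes the source file is saved as UTF-8). Each variant from the list above is four code points, none of them an ASCII f, u, c or k, so a dictionary pattern built from the ASCII word never matches, even with /i and /u. The plain ASCII word is included only as a control.

```php
<?php
// Each variant is 4 code points, none of which is ASCII f/u/c/k,
// so the ASCII dictionary pattern cannot see them. Assumes UTF-8 source.
$variants = ['fuck', 'ꜰᴜᴄᴋ', 'ᶠᵘᶜᵏ', 'ⓕⓤⓒⓚ', 'ϜυϚΚ', 'ƒμɕѤ'];

$results = [];
foreach ($variants as $v) {
    $results[$v] = [
        'code_points' => preg_match_all('/./u', $v), // count code points, not bytes
        'matches'     => preg_match('/fuck/iu', $v), // 1 only for the ASCII control
    ];
    printf("%s: %d code points, matched: %d\n",
        $v, $results[$v]['code_points'], $results[$v]['matches']);
}
```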

It Gets Worse

If you think those are easy, try coping with all of these:

 00ↃↃ,FᵾᵾK,Kⓒⓒ⒡,K,ғ∞Ϛk,fᏟK,ⓕoɔⓚ,ɟ⒰¢K,ȼ,Ùȼ⒦,f⒞⒞,Fᶜ,F∞ Ж,@Ꮯ,ɟɟ,FЦ¢,f ooᏟʞ,oo¢Ж,υᶜΚ,Ϝú*ʞ,ꜰcK,ƒᵘᵘk,Uȼ,Жɔμƒ,Fⓤⓤ k,ƒCƙ,ғ00ɔɔ,ƒUcᴋ,∞ᏦᏦ,ꜰꜰᴄ,⒰⒰Ꮯ,ꜰꜰᴜ,Fʞ,f 00,ғuСK,fɔΚ,fμↃ K,ɟcʞ,fↃ,Fμ¢,ᆿᴄ⒦,Κ¢ooɟ,ᶠμᶜᶜ,ᶠᶠⓤᏟ,⒞⒞F,F @Cⓚ,ѤѤuF,⒡⒡Ck ,ƒμᶜᶜ,F C,fᵘ¢ᵏ,ᆿ00,ꜰυȼK,ϜϜК,ooɕᴋ,ғᏟᴋ,ꜰnK,ꜰμϚК,F∞ȼ,⒡⒡Κ ,ƒ⒞,ᶠUCᏦ,ᶠυↃƙ,C,ϜUѤ,ϜUↃ,U⒞ᵏ,F @CК,ғᴜᴋ,⒡UК,ɟU*ᵏ,ccΚ,ғ UↃ,ƒ⒰⒰,ғ*K,nⓚ,ᶠ00СК,Цk,ƙcᶠ,,⒰Ѥ,ꜰǔᴄ⒦,FↃ,ꜰꜰ,*ᵏ,00Ж,ΚC,ᶠUСK,ꜰΚ,ɟUᶜⓚ,∞ȼᴋ,ƒUКć,ƒυȼȼ,⒡∞Жɕ,ᵘᵘ,FUϚϚ ,ⓕЖ,Ↄ,Ϝn*K,oocⓚ,ƒU¢ʞ,ƒuCʞ,K¢μ⒡,ɟ⒰Kɔ,F U ck,FЦⓚ,Uᴋɔ,Ꮯ,ⓚ ,ⓕCК,ɟɟ*⒦,ᶠᵘ⒞⒦,ƒ⒰ᴄᵏ,⒡⒰СK,⒰*ᴋ,ᆿ∞ʞɕ,n*Ѥ,Ϝμᴄ,kćᵘƒ,ᵘɕ, ɟЦᏦᏦ,ᵾ⒞ᵏ,ғᵘᵏ,ᵾ*Ѥ,FᏟK,ғғⓤ,ƒuɕ,ƙc⒰F,ⓒΚ,Kᶜᶜ,ɟc⒦,ƒ@cΚ, ϜЦȼȼ,⒡ᵘ⒦,ɟᵾѤ¢,FↃ,Ϝᴜ,Ϝ⒞,UᏟʞ,ƒυᏟᏟ,FᵾᵾΚ,Ϝᵘⓒʞ,ⓤᶜƙ,ᆿ⒞,f ↃↃ,U K,ϜϜ*,ꜰ@ⓒʞ,ƒuⓒ,fU⒞k,00ᴄᴄ,υСK,Fᴜᴜ,ⓕooↃⓚ,⒡ᵘɕ,ⓕυᴄΚ, ᆿUᏟ,ᏟᏦ,Ć,ЦɕК,f @Ↄⓚ,ᴋᶜUꜰ,ᴜc⒦,FᵘC ,00Ꮶ,ꜰ00К,ϜϜϚ,FcѤ,ⓕoo K,fᵾСᵏ,ⓕⓕc,cЖ,ⓕⓕ,ⓚCnғ,ɟUȼ,00Kȼ,ᴄ ,ЦC,Ц¢,Ϝᵘck,⒡¢k,ƒⓤⓚↃ,k,ƒUↃK,ᴄᴄ,ᆿᆿⓤ,ЖɔU,ƒυ*ᴋ,ƒk,UС⒦, CЖ,ƒμᏟᏟ,ⓕnᴄᴄ,ⓕμⓒⓒ,⒡00ɕ,ᴜᴜ,ᆿᆿЖ,⒦⒦U,kCⓤⓤ,Ϝnȼȼ,ᴋᴋȼᵾ,Fȼ Ѥ,ғ⒰ȼ,f U⒞,, FῠF,FΚ,F 00ȼ,ꜰμϚᏦ,ᆿK,⒡nↃЖ,F @ƙ,ᶠᶠК,UCᵏ,FU⒦ ,00Ↄ,ᶠcК,ғғ,ⓤΚ,UЖ,⒡ɔᏦ,ⓚⓚf,U C K,F @CѤ,ғғСk,ɟu*ƙ,ⓕⓕᵾ,00ȼ K,υ,ƒ⒰*ʞ,ⓕUↃЖ,ꜰȼƙ,⒡⒦ꜰ,ꜰᴜЌ,ᆿ⒦⒦,ⓕ@ᴄК,ᶠɔɔᵏ,ƙↃooꜰ,Fᴜ, ⒰Cᵏ,Uƙ,ƒ∞CᏦ,⒰*K,uↃᴋ,ᆿUⓒ,ᆿUᏦ,n,ƒЦCƙ,⒦ ꜰ,Kᵘf,⒰Ꮶ,ᴄ00,ϜU k,u¢⒦,*Ѥ,ƒСᴋ,CᏦ,@Κ,ʞСᶠ,ᵾϚᏦ,ᶠ⒰ɔ,F⒞⒞ʞ ,⒡⒡Кɔ,ɟ¢,ѤȼUᆿ,ᴜↃʞ,ғ*K,ᴄᴄ,Fʞ,@ȼ,⒰*,ᵾȼ,FѤѤ,ꜰⓤƙϚ,ⓕ00c ʞ,00ϚK,υↃΚ,ꜰμⓒЖ,ᵘϚʞ,ϜᵘↃᵏ,⒡ᵾᏟ,Ϝ⒰ȼѤ,ƒnѤ,ᆿμⓒk,ЦɕΚ,ғμѤ, fⓤⓤ,ᵏμƒ,ᵏС,ᆿ∞,ғғᵘ,ƒμↃk,f ooKȼ,ɟС,ꜰnK,00ᵏ,ᶠμⓒ,c∞Ϝ,ᆿЦĆĆ ,ᵘᴄ,F 00ⓚ,ᶠ@ȼК,......

And that's not all: there are at least a bazingatillion more where those came from. Now do you see why this is fundamentally impossible?

Full Disclosure

Since I don't believe in security through obscurity, here is the program that generated all of those:

#!/usr/bin/env perl
#
# unifuck - print infinite permutations of fuck in unicode aliases
#
# Tom Christiansen <tchrist@perl.com>
# Mon May 23 09:37:27 MDT 2011

use strict;
use warnings;
use charnames ":full";

use Unicode::Normalize;

binmode(STDOUT, ":utf8");

our(@diddle, @fuck, %fuck); # initted down below
while (my($f,$u,$c,$k) = splice(@fuck, 0, 4)) {
    $fuck{F}{$f}++;
    $fuck{U}{$u}++;
    $fuck{C}{$c}++;
    $fuck{K}{$k}++;
} 

my @F = keys %{ $fuck{F} };
my @U = keys %{ $fuck{U} };
my @C = keys %{ $fuck{C} };
my @K = keys %{ $fuck{K} };

while (1) { 
    my $f = $F[rand @F];
    my $u = $U[rand @U];
    my $c = $C[rand @C];
    my $k = $K[rand @K];

    for ($f,$u,$c,$k) {  
        next if length > 1;
        next if /\p{EA=W}/;
        next if /\pM/;
        next if /\p{InEnclosedAlphanumerics}/;
        s/$/$diddle[rand @diddle]/          if rand(100) < 15;
        s/$/\N{COMBINING ENCLOSING KEYCAP}/ if rand(100) <  1;
    }

    if    (             0) {                                       }
    elsif (rand(100) <  5) {     $u        = q(@)                  } 
    elsif (rand(100) <  5) {        $c     = q(*)                  } 
    elsif (rand(100) < 10) {       ($c,$k) = ($k,$c)               } 
    elsif (rand(100) < 15) { ($f,$u,$c,$k) = reverse ($f,$u,$c,$k) }

    print NFC("$f $u $c $k\n");
}

BEGIN {

    # ok to have repeats in each position, since they'll be counted only once
    # per unique strings
    @fuck = (

        "\N{LATIN CAPITAL LETTER F}",
        "\N{LATIN CAPITAL LETTER U}",
        "\N{LATIN CAPITAL LETTER C}",
        "\N{LATIN CAPITAL LETTER K}",

        "\N{LATIN SMALL LETTER F}",
        "\N{LATIN SMALL LETTER U}",
        "\N{LATIN SMALL LETTER C}",
        "\N{LATIN SMALL LETTER K}",

        "\N{LATIN SMALL LETTER F}",
        "\N{INFINITY}",
        "\N{LATIN SMALL LETTER C}",
        "\N{LATIN SMALL LETTER K}",

        "\N{LATIN SMALL LETTER F}",
        "\N{LATIN SMALL LETTER O}\N{LATIN SMALL LETTER O}",
        "\N{LATIN SMALL LETTER C}",
        "\N{KELVIN SIGN}",

        "\N{LATIN SMALL LETTER F}",
        "\N{DIGIT ZERO}\N{DIGIT ZERO}",
        "\N{CENT SIGN}",
        "\N{LATIN CAPITAL LETTER K}",

        "\N{LATIN LETTER SMALL CAPITAL F}",
        "\N{LATIN LETTER SMALL CAPITAL U}",
        "\N{LATIN LETTER SMALL CAPITAL C}",
        "\N{LATIN LETTER SMALL CAPITAL K}",

        "\N{MODIFIER LETTER SMALL F}",
        "\N{MODIFIER LETTER SMALL U}",
        "\N{MODIFIER LETTER SMALL C}",
        "\N{MODIFIER LETTER SMALL K}",

        "\N{MATHEMATICAL SCRIPT SMALL F}",
        "\N{MATHEMATICAL SCRIPT SMALL U}",
        "\N{MATHEMATICAL SCRIPT SMALL C}",
        "\N{MATHEMATICAL SCRIPT SMALL K}",

        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL F}",
        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL U}",
        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL C}",
        "\N{MATHEMATICAL BOLD FRAKTUR CAPITAL K}",

        "\N{MATHEMATICAL BOLD FRAKTUR SMALL F}",
        "\N{MATHEMATICAL BOLD FRAKTUR SMALL U}",
        "\N{MATHEMATICAL BOLD FRAKTUR SMALL C}",
        "\N{MATHEMATICAL BOLD FRAKTUR SMALL K}",

        "\N{MATHEMATICAL BOLD SCRIPT CAPITAL F}",
        "\N{MATHEMATICAL SCRIPT CAPITAL U}",
        "\N{MATHEMATICAL SCRIPT CAPITAL C}",
        "\N{MATHEMATICAL SCRIPT CAPITAL K}",

        "\N{CIRCLED LATIN SMALL LETTER F}",
        "\N{CIRCLED LATIN SMALL LETTER U}",
        "\N{CIRCLED LATIN SMALL LETTER C}",
        "\N{CIRCLED LATIN SMALL LETTER K}",

        "\N{PARENTHESIZED LATIN SMALL LETTER F}",
        "\N{PARENTHESIZED LATIN SMALL LETTER U}",
        "\N{PARENTHESIZED LATIN SMALL LETTER C}",
        "\N{PARENTHESIZED LATIN SMALL LETTER K}",

        "\N{GREEK CAPITAL LETTER GAMMA}\N{COMBINING SHORT STROKE OVERLAY}",
        "\N{GOTHIC LETTER QAIRTHRA}",
        "\N{CHEROKEE LETTER TLI}",
        "\N{CHEROKEE LETTER TSO}",

        "\N{LATIN SMALL LETTER F WITH HOOK}",
        "\N{GREEK SMALL LETTER MU}",
        "\N{LATIN SMALL LETTER C WITH CURL}",
        "\N{CYRILLIC CAPITAL LETTER IOTIFIED E}",

        "\N{CYRILLIC CAPITAL LETTER GHE}\N{COMBINING SHORT STROKE OVERLAY}",
        "\N{CYRILLIC CAPITAL LETTER TSE}",
        "\N{CYRILLIC CAPITAL LETTER ES}",
        "\N{CYRILLIC CAPITAL LETTER KA}",

        "\N{CYRILLIC SMALL LETTER GHE WITH STROKE}",
        "\N{LATIN SMALL CAPITAL LETTER U WITH STROKE}",
        "\N{LATIN SMALL LETTER C WITH STROKE}",
        "\N{LATIN SMALL LETTER K WITH HOOK}",

        "\N{GREEK LETTER DIGAMMA}",
        "\N{GREEK SMALL LETTER UPSILON}",
        "\N{GREEK LETTER STIGMA}",
        "\N{GREEK CAPITAL LETTER KAPPA}",

        "\N{HANGUL JONGSEONG KHIEUKH}",
        "\N{LATIN CAPITAL LETTER U}",
        "\N{ROMAN NUMERAL REVERSED ONE HUNDRED}",
        "\N{CYRILLIC CAPITAL LETTER ZHE}",

        "\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}",
        "\N{LATIN SMALL LETTER N}",
        "\N{LATIN SMALL LETTER OPEN O}",
        "\N{LATIN SMALL LETTER TURNED K}",

        "\N{FULLWIDTH LATIN CAPITAL LETTER F}",
        "\N{FULLWIDTH LATIN CAPITAL LETTER U}",
        "\N{FULLWIDTH LATIN CAPITAL LETTER C}",
        "\N{FULLWIDTH LATIN CAPITAL LETTER K}",

    );

    @diddle = (
        "\N{COMBINING GRAVE ACCENT}",
        "\N{COMBINING ACUTE ACCENT}",
        "\N{COMBINING CIRCUMFLEX ACCENT}",
        "\N{COMBINING TILDE}",
        "\N{COMBINING BREVE}",
        "\N{COMBINING DOT ABOVE}",
        "\N{COMBINING DIAERESIS}",
        "\N{COMBINING CARON}",
        "\N{COMBINING CANDRABINDU}",
        "\N{COMBINING INVERTED BREVE}",
        "\N{COMBINING GRAVE TONE MARK}",
        "\N{COMBINING ACUTE TONE MARK}",
        "\N{COMBINING GREEK PERISPOMENI}",
        "\N{COMBINING FERMATA}",
        "\N{COMBINING SUSPENSION MARK}",
    );

}

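  • I remember using all sorts of Unicode tricks to get around profanity filters a few years ago, and getting banned anyway. Good times. (40 upvotes)
  • Now if only the gods of SO would read and understand this answer, and stop the silly censorship. (16 upvotes)
  • I don't give a... You've already given all of them. (13 upvotes)
  • +1 for the script name alone. And many more imaginary +1s for this great answer. (7 upvotes)
  • @sbi Well, until then we can still use our Cyrillic letters when a question really needs them. Here, have one: рҏѓґоꙑӏꙑӏꙑӏ; р, о and е look no different from their Latin homoglyphs. (3 upvotes)
  • @cHao Actually, given that confusables.txt, confusablesSummary.txt and confusablesWholeScript.txt exist (see Unicode Technical Report #36, "Unicode Security Considerations", www.unicode.org/reports/tr36/), this is not as hard as you think. (2 upvotes)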

Sla*_*ava 2

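Now that you say it doesn't work at the end of a word, I see the problem. $, @ and other such special characters are not word characters, and \b only matches where a word character meets a non-word character (or the start/end of the string). In "a$$" the boundary falls right after "a"; when the final "$" is followed by the end of the string or by another non-word character such as a space, there is no boundary there and the match fails. I suggest marking the end of the word with [^a-z] (or the end of the string) instead:

preg_match_all("/\b".$f."(?:ing|er|es|s)?(?:[^a-z]|$)/si",$t,$m,PREG_SET_ORDER);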
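For comparison, a minimal sketch that swaps the consuming [^a-z] for a negative lookahead (?![a-z]), meaning "not followed by a letter". This is a substitute technique, not the answer's exact code: the lookahead also succeeds at the end of the string and does not consume the character after the word, so preg_replace() leaves that character in place. censor() is a hypothetical helper, not a PHP function.

```php
<?php
// censor() is a hypothetical helper. The trailing (?![a-z]) replaces \b:
// it succeeds at end of string or before any non-letter, and consumes nothing.
function censor(string $text, array $words): string
{
    foreach ($words as $word) {
        $f = preg_quote($word, '/');
        $text = preg_replace(
            '/\b' . $f . '(?:ing|er|es|s)?(?![a-z])/i',
            str_repeat('*', strlen($word)), // one star per byte of the dictionary word
            $text
        );
    }
    return $text;
}

$words = ['a$$', 'asshole', 'a$$hole', 'f@ck', 'f#ck', 'f*ck'];
echo censor('Input whatever you feel like here eg. a$$', $words), "\n";
// Input whatever you feel like here eg. ***
```

The leading \b has the mirror-image problem for dictionary words that start with a symbol (say, a hypothetical '@ss' entry); a lookbehind (?<![a-z]) is the analogous fix there.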