Pro*_*f83 35 php regex profanity preg-match
I have a dictionary of swear words in a database, and the following works great:
preg_match_all("/\b".$f."(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
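$t is the input text and, put simply, $f = preg_quote("punk"); where "punk" comes from the database dictionary, so at this point in the loop the expression looks like this: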
preg_match_all("/\bpunk(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
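preg_quote escapes symbols (for example, it replaces # with \#) so that the expression is safe to use. However, when the dictionary word being checked is something like "F@CK" or "A$$", input strings containing those symbols are not detected by the expression above. I have both a$$ and f@ck in the dictionary, but they don't work. If I remove preg_quote() from the word, the regular expression becomes invalid, because those symbols are no longer escaped.
Any suggestions on how to detect "a$$"?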
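To make the escaping step concrete, here is a small sketch of what preg_quote() returns for a few sample words. The word list and the $quoted array are only for illustration, and the "/" delimiter argument is an assumption about how the pattern is delimited. It suggests the escaping itself behaves as intended, so the failure has to come from another part of the pattern.

```php
<?php
// Illustrative only: show what preg_quote() does to a few dictionary words.
// Single quotes keep PHP from treating "$" as the start of a variable name.
$words = ['punk', 'a$$', 'f@ck', 'f*ck'];

$quoted = [];
foreach ($words as $word) {
    // The second argument also escapes the pattern delimiter, here "/".
    $quoted[$word] = preg_quote($word, '/');
}

foreach ($quoted as $word => $escaped) {
    echo $word, ' => ', $escaped, PHP_EOL;
}
// punk => punk
// a$$ => a\$\$
// f@ck => f@ck   ("@" is not a regex metacharacter, so it is left alone)
// f*ck => f\*ck
```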
EDIT:
So I guess an expression that doesn't work as expected would be, for example:
preg_match_all("/\bf\@ck(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
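which should find f@ck in $t.
UPDATE:
Here is my usage, put simply: if there are matches in $m, replace them with "****". The whole block runs in a loop over every word in the dictionary; $f is the dictionary word and $t is the input.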
$f = preg_quote($f);
preg_match_all("/\b$f(?:ing|er|es|s)?\b/si",$t,$m,PREG_SET_ORDER);
if (count($m) > 0) {
    $t = preg_replace("/(\b$f(?:ing|er|es|s)?\b)/si","*****",$t);
}
UPDATE: Look, here is the var_dump:
preg_quote($f) = string(5) "a\$\$"
$t = string(18) "You're such an a$$"
expression = string(29) "/\ba\$\$(?:ing|er|es|s)?\b/si"
UPDATE: This only happens when the word ends with a symbol. I tested "a$$hole" and it works fine, but "a$$" does not work.
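The observation above can be reproduced in isolation. This is a minimal sketch, not the original code: buildPattern() is a hypothetical helper that only mirrors the pattern shape used in the question, and the sample sentences are made up.

```php
<?php
// Hypothetical helper: builds the same pattern shape as in the question.
function buildPattern(string $word): string
{
    return '/\b' . preg_quote($word, '/') . '(?:ing|er|es|s)?\b/si';
}

// Word ends in a letter: the closing \b sits between "e" and the end of input.
$letterEnd = preg_match(buildPattern('a$$hole'), 'You are such an a$$hole');

// Word ends in "$": there is no word character next to the closing \b,
// so the boundary assertion cannot succeed.
$symbolEnd = preg_match(buildPattern('a$$'), 'You are such an a$$');

// Same failure when whitespace follows the symbol.
$symbolThenSpace = preg_match(buildPattern('a$$'), 'an a$$ indeed');

var_dump($letterEnd, $symbolEnd, $symbolThenSpace);
// int(1)
// int(0)
// int(0)
```

\b only matches between a word character (letter, digit, underscore) and a non-word character or the edge of the string, which is why a trailing "$" defeats it.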
Another update: try this simplified version, with $words as a makeshift dictionary:
// Single quotes stop PHP from interpolating "$hole" inside "a$$hole".
$words = array('a$$','asshole','a$$hole','f@ck','f#ck','f*ck');
$text = 'Input whatever you feel like here eg. a$$';
foreach ($words as $f) {
    $f = preg_quote($f,"/");
    $text = preg_replace("/\b".$f."(?:ing|er|es|s)?\b/si",
                         str_repeat("*",strlen($f)),
                         $text);
}
I expect to see "Input whatever you feel like here eg. ***" as the result.
tch*_*ist 182
Sorry, but this "problem" really isn't solvable. Consider these:
If you think those are easy, try coping with all of these:
00ↃↃ,FᵾᵾK,Kⓒⓒ⒡,K,ғ∞Ϛk,fᏟK,ⓕoɔⓚ,ɟ⒰¢K,ȼ,Ùȼ⒦,f⒞⒞,Fᶜ,F∞ Ж,@Ꮯ,ɟɟ,FЦ¢,f ooᏟʞ,oo¢Ж,υᶜΚ,Ϝú*ʞ,ꜰcK,ƒᵘᵘk,Uȼ,Жɔμƒ,Fⓤⓤ k,ƒCƙ,ғ00ɔɔ,ƒUcᴋ,∞ᏦᏦ,ꜰꜰᴄ,⒰⒰Ꮯ,ꜰꜰᴜ,Fʞ,f 00,ғuСK,fɔΚ,fμↃ K,ɟcʞ,fↃ,Fμ¢,ᆿᴄ⒦,Κ¢ooɟ,ᶠμᶜᶜ,ᶠᶠⓤᏟ,⒞⒞F,F @Cⓚ,ѤѤuF,⒡⒡Ck ,ƒμᶜᶜ,F C,fᵘ¢ᵏ,ᆿ00,ꜰυȼK,ϜϜК,ooɕᴋ,ғᏟᴋ,ꜰnK,ꜰμϚК,F∞ȼ,⒡⒡Κ ,ƒ⒞,ᶠUCᏦ,ᶠυↃƙ,C,ϜUѤ,ϜUↃ,U⒞ᵏ,F @CК,ғᴜᴋ,⒡UК,ɟU*ᵏ,ccΚ,ғ UↃ,ƒ⒰⒰,ғ*K,nⓚ,ᶠ00СК,Цk,ƙcᶠ,,⒰Ѥ,ꜰǔᴄ⒦,FↃ,ꜰꜰ,*ᵏ,00Ж,ΚC,ᶠUСK,ꜰΚ,ɟUᶜⓚ,∞ȼᴋ,ƒUКć,ƒυȼȼ,⒡∞Жɕ,ᵘᵘ,FUϚϚ ,ⓕЖ,Ↄ,Ϝn*K,oocⓚ,ƒU¢ʞ,ƒuCʞ,K¢μ⒡,ɟ⒰Kɔ,F U ck,FЦⓚ,Uᴋɔ,Ꮯ,ⓚ ,ⓕCК,ɟɟ*⒦,ᶠᵘ⒞⒦,ƒ⒰ᴄᵏ,⒡⒰СK,⒰*ᴋ,ᆿ∞ʞɕ,n*Ѥ,Ϝμᴄ,kćᵘƒ,ᵘɕ, ɟЦᏦᏦ,ᵾ⒞ᵏ,ғᵘᵏ,ᵾ*Ѥ,FᏟK,ғғⓤ,ƒuɕ,ƙc⒰F,ⓒΚ,Kᶜᶜ,ɟc⒦,ƒ@cΚ, ϜЦȼȼ,⒡ᵘ⒦,ɟᵾѤ¢,FↃ,Ϝᴜ,Ϝ⒞,UᏟʞ,ƒυᏟᏟ,FᵾᵾΚ,Ϝᵘⓒʞ,ⓤᶜƙ,ᆿ⒞,f ↃↃ,U K,ϜϜ*,ꜰ@ⓒʞ,ƒuⓒ,fU⒞k,00ᴄᴄ,υСK,Fᴜᴜ,ⓕooↃⓚ,⒡ᵘɕ,ⓕυᴄΚ, ᆿUᏟ,ᏟᏦ,Ć,ЦɕК,f @Ↄⓚ,ᴋᶜUꜰ,ᴜc⒦,FᵘC ,00Ꮶ,ꜰ00К,ϜϜϚ,FcѤ,ⓕoo K,fᵾСᵏ,ⓕⓕc,cЖ,ⓕⓕ,ⓚCnғ,ɟUȼ,00Kȼ,ᴄ ,ЦC,Ц¢,Ϝᵘck,⒡¢k,ƒⓤⓚↃ,k,ƒUↃK,ᴄᴄ,ᆿᆿⓤ,ЖɔU,ƒυ*ᴋ,ƒk,UС⒦, CЖ,ƒμᏟᏟ,ⓕnᴄᴄ,ⓕμⓒⓒ,⒡00ɕ,ᴜᴜ,ᆿᆿЖ,⒦⒦U,kCⓤⓤ,Ϝnȼȼ,ᴋᴋȼᵾ,Fȼ Ѥ,ғ⒰ȼ,f U⒞,, FῠF,FΚ,F 00ȼ,ꜰμϚᏦ,ᆿK,⒡nↃЖ,F @ƙ,ᶠᶠК,UCᵏ,FU⒦ ,00Ↄ,ᶠcК,ғғ,ⓤΚ,UЖ,⒡ɔᏦ,ⓚⓚf,U C K,F @CѤ,ғғСk,ɟu*ƙ,ⓕⓕᵾ,00ȼ K,υ,ƒ⒰*ʞ,ⓕUↃЖ,ꜰȼƙ,⒡⒦ꜰ,ꜰᴜЌ,ᆿ⒦⒦,ⓕ@ᴄК,ᶠɔɔᵏ,ƙↃooꜰ,Fᴜ, ⒰Cᵏ,Uƙ,ƒ∞CᏦ,⒰*K,uↃᴋ,ᆿUⓒ,ᆿUᏦ,n,ƒЦCƙ,⒦ ꜰ,Kᵘf,⒰Ꮶ,ᴄ00,ϜU k,u¢⒦,*Ѥ,ƒСᴋ,CᏦ,@Κ,ʞСᶠ,ᵾϚᏦ,ᶠ⒰ɔ,F⒞⒞ʞ ,⒡⒡Кɔ,ɟ¢,ѤȼUᆿ,ᴜↃʞ,ғ*K,ᴄᴄ,Fʞ,@ȼ,⒰*,ᵾȼ,FѤѤ,ꜰⓤƙϚ,ⓕ00c ʞ,00ϚK,υↃΚ,ꜰμⓒЖ,ᵘϚʞ,ϜᵘↃᵏ,⒡ᵾᏟ,Ϝ⒰ȼѤ,ƒnѤ,ᆿμⓒk,ЦɕΚ,ғμѤ, fⓤⓤ,ᵏμƒ,ᵏС,ᆿ∞,ғғᵘ,ƒμↃk,f ooKȼ,ɟС,ꜰnK,00ᵏ,ᶠμⓒ,c∞Ϝ,ᆿЦĆĆ ,ᵘᴄ,F 00ⓚ,ᶠ@ȼК,......
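And that is far from all of them: there are at least a bazingatillion more where those came from. Do you now see why this fundamentally cannot be done?
Because I don't believe in security through obscurity, here is the program that generated all of those: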
#!/usr/bin/env perl
#
# unifuck - print infinite permutations of fuck in unicode aliases
#
# Tom Christiansen <tchrist@perl.com>
# Mon May 23 09:37:27 MDT 2011
use strict;
use warnings;
use charnames ":full";
use Unicode::Normalize;
binmode(STDOUT, ":utf8");
our(@diddle, @fuck, %fuck); # initted down below
while (my($f,$u,$c,$k) = splice(@fuck, 0, 4)) {
    $fuck{F}{$f}++;
    $fuck{U}{$u}++;
    $fuck{C}{$c}++;
    $fuck{K}{$k}++;
}
my @F = keys %{ $fuck{F} };
my @U = keys %{ $fuck{U} };
my @C = keys %{ $fuck{C} };
my @K = keys %{ $fuck{K} };
while (1) {
    my $f = $F[rand @F];
    my $u = $U[rand @U];
    my $c = $C[rand @C];
    my $k = $K[rand @K];
    for ($f,$u,$c,$k) {
        next if length > 1;
        next if /\p{EA=W}/;
        next if /\pM/;
        next if /\p{InEnclosedAlphanumerics}/;
        s/$/$diddle[rand @diddle]/ if rand(100) < 15;
        s/$/\N{COMBINING ENCLOSING KEYCAP}/ if rand(100) < 1;
    }
    if    ( 0) { }
    elsif (rand(100) < 5) { $u = q(@) }
    elsif (rand(100) < 5) { $c = q(*) }
    elsif (rand(100) < 10) { ($c,$k) = ($k,$c) }
    elsif (rand(100) < 15) { ($f,$u,$c,$k) = reverse ($f,$u,$c,$k) }
    print NFC("$f $u $c $k\n");
}
BEGIN {
# ok to have repeats in each position, since they'll be counted only once
# per unique strings
@fuck = (
"\N{LATIN CAPITAL LETTER F}",
"\N{LATIN CAPITAL LETTER U}",
"\N{LATIN CAPITAL LETTER C}",
"\N{LATIN CAPITAL LETTER K}",
"\N{LATIN SMALL LETTER F}",
"\N{LATIN SMALL LETTER U}",
"\N{LATIN SMALL LETTER C}",
"\N{LATIN SMALL LETTER K}",
"\N{LATIN SMALL LETTER F}",
"\N{INFINITY}",
"\N{LATIN SMALL LETTER C}",
"\N{LATIN SMALL LETTER K}",
"\N{LATIN SMALL LETTER F}",
"\N{LATIN SMALL LETTER O}\N{LATIN SMALL LETTER O}",
"\N{LATIN SMALL LETTER C}",
"\N{KELVIN SIGN}",
"\N{LATIN SMALL LETTER F}",
"\N{DIGIT ZERO}\N{DIGIT ZERO}",
"\N{CENT SIGN}",
"\N{LATIN CAPITAL LETTER K}",
"\N{LATIN LETTER SMALL CAPITAL F}",
"\N{LATIN LETTER SMALL CAPITAL U}",
"\N{LATIN LETTER SMALL CAPITAL C}",
"\N{LATIN LETTER SMALL CAPITAL K}",
"\N{MODIFIER LETTER SMALL F}",
"\N{MODIFIER LETTER SMALL U}",
"\N{MODIFIER LETTER SMALL C}",
"\N{MODIFIER LETTER SMALL K}",
"\N{MATHEMATICAL SCRIPT SMALL F}",
"\N{MATHEMATICAL SCRIPT SMALL U}",
"\N{MATHEMATICAL SCRIPT SMALL C}",
"\N{MATHEMATICAL SCRIPT SMALL K}",
"\N{MATHEMATICAL BOLD FRAKTUR CAPITAL F}",
"\N{MATHEMATICAL BOLD FRAKTUR CAPITAL U}",
"\N{MATHEMATICAL BOLD FRAKTUR CAPITAL C}",
"\N{MATHEMATICAL BOLD FRAKTUR CAPITAL K}",
"\N{MATHEMATICAL BOLD FRAKTUR SMALL F}",
"\N{MATHEMATICAL BOLD FRAKTUR SMALL U}",
"\N{MATHEMATICAL BOLD FRAKTUR SMALL C}",
"\N{MATHEMATICAL BOLD FRAKTUR SMALL K}",
"\N{MATHEMATICAL BOLD SCRIPT CAPITAL F}",
"\N{MATHEMATICAL SCRIPT CAPITAL U}",
"\N{MATHEMATICAL SCRIPT CAPITAL C}",
"\N{MATHEMATICAL SCRIPT CAPITAL K}",
"\N{CIRCLED LATIN SMALL LETTER F}",
"\N{CIRCLED LATIN SMALL LETTER U}",
"\N{CIRCLED LATIN SMALL LETTER C}",
"\N{CIRCLED LATIN SMALL LETTER K}",
"\N{PARENTHESIZED LATIN SMALL LETTER F}",
"\N{PARENTHESIZED LATIN SMALL LETTER U}",
"\N{PARENTHESIZED LATIN SMALL LETTER C}",
"\N{PARENTHESIZED LATIN SMALL LETTER K}",
"\N{GREEK CAPITAL LETTER GAMMA}\N{COMBINING SHORT STROKE OVERLAY}",
"\N{GOTHIC LETTER QAIRTHRA}",
"\N{CHEROKEE LETTER TLI}",
"\N{CHEROKEE LETTER TSO}",
"\N{LATIN SMALL LETTER F WITH HOOK}",
"\N{GREEK SMALL LETTER MU}",
"\N{LATIN SMALL LETTER C WITH CURL}",
"\N{CYRILLIC CAPITAL LETTER IOTIFIED E}",
"\N{CYRILLIC CAPITAL LETTER GHE}\N{COMBINING SHORT STROKE OVERLAY}",
"\N{CYRILLIC CAPITAL LETTER TSE}",
"\N{CYRILLIC CAPITAL LETTER ES}",
"\N{CYRILLIC CAPITAL LETTER KA}",
"\N{CYRILLIC SMALL LETTER GHE WITH STROKE}",
"\N{LATIN SMALL CAPITAL LETTER U WITH STROKE}",
"\N{LATIN SMALL LETTER C WITH STROKE}",
"\N{LATIN SMALL LETTER K WITH HOOK}",
"\N{GREEK LETTER DIGAMMA}",
"\N{GREEK SMALL LETTER UPSILON}",
"\N{GREEK LETTER STIGMA}",
"\N{GREEK CAPITAL LETTER KAPPA}",
"\N{HANGUL JONGSEONG KHIEUKH}",
"\N{LATIN CAPITAL LETTER U}",
"\N{ROMAN NUMERAL REVERSED ONE HUNDRED}",
"\N{CYRILLIC CAPITAL LETTER ZHE}",
"\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}",
"\N{LATIN SMALL LETTER N}",
"\N{LATIN SMALL LETTER OPEN O}",
"\N{LATIN SMALL LETTER TURNED K}",
"\N{FULLWIDTH LATIN CAPITAL LETTER F}",
"\N{FULLWIDTH LATIN CAPITAL LETTER U}",
"\N{FULLWIDTH LATIN CAPITAL LETTER C}",
"\N{FULLWIDTH LATIN CAPITAL LETTER K}",
);
@diddle = (
"\N{COMBINING GRAVE ACCENT}",
"\N{COMBINING ACUTE ACCENT}",
"\N{COMBINING CIRCUMFLEX ACCENT}",
"\N{COMBINING TILDE}",
"\N{COMBINING BREVE}",
"\N{COMBINING DOT ABOVE}",
"\N{COMBINING DIAERESIS}",
"\N{COMBINING CARON}",
"\N{COMBINING CANDRABINDU}",
"\N{COMBINING INVERTED BREVE}",
"\N{COMBINING GRAVE TONE MARK}",
"\N{COMBINING ACUTE TONE MARK}",
"\N{COMBINING GREEK PERISPOMENI}",
"\N{COMBINING FERMATA}",
"\N{COMBINING SUSPENSION MARK}",
);
}
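Now that you say it doesn't work at the end of a word, I see the problem. $, @, and other such special characters are not part of a word, so in the case of "a$$", \b ends the word right after the "a" when no other letter follows in the input string. I suggest fixing it by using [^a-z] to mark the end of the word.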
preg_match_all("/\b".$f."(?:ing|er|es|s)?[^a-z]/si",$t,$m,PREG_SET_ORDER);
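One caveat with [^a-z] is that it has to consume a character, so it needs something to follow the word and cannot match when "a$$" is the very last thing in the input. A possible variation, sketched below under my own assumptions, is to use lookarounds, which check the neighbouring characters without consuming them. censorWords() is a hypothetical helper, not code from the question, and treating letters and digits as "word" characters is a choice made for this sketch.

```php
<?php
/**
 * Sketch: censor dictionary words, using lookarounds instead of \b so that
 * words ending (or starting) with a symbol such as "$" are still bounded.
 * censorWords() is a hypothetical helper, not part of the original posts.
 */
function censorWords(string $text, array $words): string
{
    foreach ($words as $word) {
        // No letter or digit directly before or after the (optionally suffixed) word.
        $pattern = '/(?<![a-z0-9])' . preg_quote($word, '/')
                 . '(?:ing|er|es|s)?(?![a-z0-9])/i';

        // Replace each hit with as many asterisks as the matched text is long
        // (strlen counts bytes, which is fine for the ASCII words used here).
        $text = preg_replace_callback(
            $pattern,
            function (array $m): string {
                return str_repeat('*', strlen($m[0]));
            },
            $text
        );
    }
    return $text;
}

$words = ['a$$', 'asshole', 'a$$hole', 'f@ck', 'f#ck', 'f*ck'];
echo censorWords('Input whatever you feel like here eg. a$$', $words), PHP_EOL;
// Input whatever you feel like here eg. ***
```

This only addresses the word-boundary issue for plain ASCII spellings; it does nothing about the Unicode look-alikes discussed in the other answer.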