Ruby regex: ignore quotes if a colon precedes them

krn*_*krn 5 ruby regex

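I'm trying to write a Ruby regex that captures quoted phrases, but not those that have a ":" before them. For example:

Obama: "Yes, we can!"

should be ignored.

I've written some tests: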

http://rubular.com/r/OJmkLd68gc

tch*_*ist 4

EDIT: More tweaks.

This works for ASCII, depending on what your input looks like:

    (?<! [:\s] ) \s* ( ["'] ) (?: (?! \1 ) . )+ \1
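A minimal sketch of how the ASCII pattern above might be carried over to Ruby. The names `ASCII_QUOTE` and `quoted_phrase` are made up for illustration. Ruby's `m` flag is used where Perl uses `s` (so that `.` also matches newlines), and the character class is written `[\s:]`, which matches the same characters as `[:\s]`.

```ruby
# Free-spacing (x) version of the ASCII-only pattern.
ASCII_QUOTE = /
  (?<! [\s:] ) \s*      # leading whitespace that does not follow a colon
  ( ["'] )              # opening quote, remembered as group 1
  (?: (?! \1 ) . )+     # at least one character that is not that same quote
  \1                    # the matching closing quote
/mx

# Returns the quoted phrase (quotes included) or nil if nothing qualifies.
def quoted_phrase(line)
  m = ASCII_QUOTE.match(line)
  m && m[0].strip
end
```

The match can begin with whitespace because of the `\s*`, hence the `strip`. An apostrophe inside a word counts as an opening quote for this pattern, so lines containing contractions may produce surprising matches.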

For “Unicode ‘matching’ Quotes”, you have to be a bit more ‹special› in your pairings, perhaps along these lines:

    (?xs) (?<!:) \s+
      (?: ( ["'] ) (?: (?! \1 ) . )+ \1
        | “ .*? ”    # English etc
        | ‘ .*? ’
        | « .*? »    # French, Spanish, Italian
        | ‹ .*? ›
        | „ .*? “    # German, Icelandic, Romanian
        | ‚ .*? ‘
        | „ .*? ”    # Hungarian
        | ” .*? ”    # Swedish
        | ’ .*? ’
        | » .*? «    # Danish, Hungarian
        | › .*? ‹
        | 「 .*? 」   # Japanese, Chinese
        | 『 .*? 』
      )
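One way this alternation might be assembled in Ruby is from a table of opener/closer pairs, so that adding a language means adding a row. This is only a sketch: `QUOTE_PAIRS` and `UNICODE_QUOTE` are hypothetical names, and the code assumes UTF-8 source and input.

```ruby
# Opener/closer pairs copied from the alternation above.
QUOTE_PAIRS = [
  ["“", "”"], ["‘", "’"],      # English etc
  ["«", "»"], ["‹", "›"],      # French, Spanish, Italian
  ["„", "“"], ["‚", "‘"],      # German, Icelandic, Romanian
  ["„", "”"],                  # Hungarian
  ["”", "”"], ["’", "’"],      # Swedish
  ["»", "«"], ["›", "‹"],      # Danish, Hungarian
  ["「", "」"], ["『", "』"]     # Japanese, Chinese
].freeze

# Each pair becomes "opener, as little as possible, closer".
paired = QUOTE_PAIRS.map do |opener, closer|
  "#{Regexp.escape(opener)}.*?#{Regexp.escape(closer)}"
end.join("|")

# Same shape as the pattern above: whitespace not preceded by a colon,
# then either an ASCII-quoted phrase or one of the Unicode pairs.
UNICODE_QUOTE = /(?<!:)\s+(?:(["'])(?:(?!\1).)+\1|#{paired})/m
```

As in the pattern it mirrors, at least one whitespace character is required before the quote, so a quote at the very start of a line is not matched. Because the lookbehind only rejects a colon, two spaces after a colon still let the match begin at the second space.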

You can read more about the types of paired quotation marks used in various languages here.

Here’s a test program in Perl, but the principles should carry over to Ruby just fine:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use open qw[ :std IO :utf8 ];
    while (<DATA>) {
        print if / (?<! [:\s] ) \s* ( ["'] ) (?: (?! \1 ) . )+ \1/sx;
    }
    __END__
    "Take off, hoser!"
    Dorothy Parker:Brevity is the soul of lingerie.
    Dorothy Parker:"Brevity is the soul of lingerie."
    Dorothy Parker: "Brevity is the soul of lingerie."
    Dorothy Parker:  "Brevity is the soul of lingerie."
    Larry Wall: I don't know if it's what you want, but it's what you get. :-)
    Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)"
    Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”
    Larry Wall said:   “I don't know if it's what you want, but it’s what you get. :-)”
    Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)”
    Boss: And what's that "goto" doing there?!?
    Hacker: Er, I guess my finger slipped when I was typing "getservbyport"...
    ‘Nevermore!’ quoth the raven.
    Quoth the raven: ‘Nevermore!’
    'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.
    src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent.
    src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.
    src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."
    ‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’
    “I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”

The output is:

    "Take off, hoser!"
    Larry Wall: I don't know if it's what you want, but it's what you get. :-)
    Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)"
    Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”
    Larry Wall said:   “I don't know if it's what you want, but it’s what you get. :-)”
    Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)”
    Boss: And what's that "goto" doing there?!?
    Hacker: Er, I guess my finger slipped when I was typing "getservbyport"...
    'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.
    src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.
    src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."

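That may look “wrong”, but it’s because of the internal quotes: apostrophes such as the one in "don't" are taken as quote delimiters. Here’s a more complete version that illustrates the issue better:
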
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use open qw[ :std IO :utf8 ];
    while (<DATA>) {
        chomp;
        my $bingo = m{
            (?<! [:\s] ) \s*
            (?: (?<= ^  )
              | (?<= \s )
            )
            (?: ( ["'] ) (?: (?! \1 ) . )+ \1
              | “ .*? ”    # English etc
              | ‘ .*? ’
            )
        }sx;

        if ($bingo) {
            printf("Line %2d, quote 「%s」\n",   $., $&);
            printf(" " x 7 . "in line 『%s』\n", $_);
        } else {
            printf("Line %2d IGNORE 『%s』\n", $., $_);
        }
    }
    __END__
    "Take off, hoser!"
    Dorothy Parker:Brevity is the soul of lingerie.
    Dorothy Parker:"Brevity is the soul of lingerie."
    Dorothy Parker: "Brevity is the soul of lingerie."
    Dorothy Parker:  "Brevity is the soul of lingerie."
    Larry Wall: I don't know if it's what you want, but it's what you get. :-)
    Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)"
    Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”
    Larry Wall said:   “I don't know if it's what you want, but it’s what you get. :-)”
    Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)”
    Boss: And what's that "goto" doing there?!?
    Hacker: Er, I guess my finger slipped when I was typing "getservbyport"...
    ‘Nevermore!’ quoth the raven.
    Quoth the raven: ‘Nevermore!’
    'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.
    src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent.
    src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.
    src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."
    ‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’
    “I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”

Its output is:

    Line  1, quote 「"Take off, hoser!"」
           in line 『"Take off, hoser!"』
    Line  2 IGNORE 『Dorothy Parker:Brevity is the soul of lingerie.』
    Line  3 IGNORE 『Dorothy Parker:"Brevity is the soul of lingerie."』
    Line  4 IGNORE 『Dorothy Parker: "Brevity is the soul of lingerie."』
    Line  5 IGNORE 『Dorothy Parker:  "Brevity is the soul of lingerie."』
    Line  6 IGNORE 『Larry Wall: I don't know if it's what you want, but it's what you get. :-)』
    Line  7, quote 「 "I don't know if it's what you want, but it's what you get. :-)"」
           in line 『Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)"』
    Line  8 IGNORE 『Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”』
    Line  9 IGNORE 『Larry Wall said:   “I don't know if it's what you want, but it’s what you get. :-)”』
    Line 10, quote 「 “I don't know if it's what you want, but it's what you get. :-)”」
           in line 『Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)”』
    Line 11, quote 「 "goto"」
           in line 『Boss: And what's that "goto" doing there?!?』
    Line 12, quote 「 "getservbyport"」
           in line 『Hacker: Er, I guess my finger slipped when I was typing "getservbyport"...』
    Line 13, quote 「‘Nevermore!’」
           in line 『‘Nevermore!’ quoth the raven.』
    Line 14 IGNORE 『Quoth the raven: ‘Nevermore!’』
    Line 15, quote 「'I wish I had never come here, and I don'」
           in line 『'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.』
    Line 16 IGNORE 『src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent.』
    Line 17 IGNORE 『src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.』
    Line 18, quote 「 "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."」
           in line 『src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."』
    Line 19, quote 「‘I wish I had never come here, and I don’」
           in line 『‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’』
    Line 20, quote 「“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”」
           in line 『“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”』
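For comparison, here is a rough Ruby rendering of the Perl program above. It is a sketch, not a verified line-for-line port: `BINGO`, `find_quote` and `SAMPLES` are names invented here, only a handful of the test lines are included, and `\A` stands in for Perl's `^` because Ruby's `^` matches at the start of every line.

```ruby
# Assumes UTF-8 source and input. Ruby's m flag does what Perl's s flag does.
BINGO = /
  (?<! [\s:] ) \s*                 # optional whitespace, not right after a colon
  (?: \A | (?<= \s ) )             # the quote starts the string or follows whitespace
  (?: ( ["'] ) (?: (?! \1 ) . )+ \1
    | “ .*? ”                      # English etc
    | ‘ .*? ’
  )
/mx

# Returns the matched text (possibly with leading whitespace) or nil.
def find_quote(line)
  m = BINGO.match(line.chomp)
  m && m[0]
end

SAMPLES = [
  %q{"Take off, hoser!"},
  %q{Dorothy Parker: "Brevity is the soul of lingerie."},
  %q{Larry Wall: I don't know if it's what you want, but it's what you get. :-)},
  %q{Boss: And what's that "goto" doing there?!?},
  %q{‘Nevermore!’ quoth the raven.},
  %q{Quoth the raven: ‘Nevermore!’}
].freeze

if __FILE__ == $PROGRAM_NAME
  SAMPLES.each_with_index do |line, i|
    quote = find_quote(line)
    if quote
      printf("Line %2d, quote 「%s」\n", i + 1, quote)
      printf("%sin line 『%s』\n", " " * 7, line)
    else
      printf("Line %2d IGNORE 『%s』\n", i + 1, line)
    end
  end
end
```

The extra `\A`-or-whitespace condition is what stops an apostrophe in the middle of a word from opening a quote, which is why the "Larry Wall: I don't know..." line is ignored here.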

Also, there’s a standard Unicode derived property, \p{Quotation_Mark}, or \p{QMark} for short, but Ruby doesn’t support it. You can use the unichars script to list them all:

    $ unichars '\p{qmark}'
     "    34 0022 QUOTATION MARK
     '    39 0027 APOSTROPHE
     «   171 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
     »   187 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
     ‘  8216 2018 LEFT SINGLE QUOTATION MARK
     ’  8217 2019 RIGHT SINGLE QUOTATION MARK
     ‚  8218 201A SINGLE LOW-9 QUOTATION MARK
     ‛  8219 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
     “  8220 201C LEFT DOUBLE QUOTATION MARK
     ”  8221 201D RIGHT DOUBLE QUOTATION MARK
     „  8222 201E DOUBLE LOW-9 QUOTATION MARK
     ‟  8223 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
     ‹  8249 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
     ›  8250 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
     「 12300 300C LEFT CORNER BRACKET
     」 12301 300D RIGHT CORNER BRACKET
     『 12302 300E LEFT WHITE CORNER BRACKET
     』 12303 300F RIGHT WHITE CORNER BRACKET
     〝 12317 301D REVERSED DOUBLE PRIME QUOTATION MARK
     〞 12318 301E DOUBLE PRIME QUOTATION MARK
     〟 12319 301F LOW DOUBLE PRIME QUOTATION MARK
     ﹁ 65089 FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
     ﹂ 65090 FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
     ﹃ 65091 FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
     ﹄ 65092 FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
     ＂ 65282 FF02 FULLWIDTH QUOTATION MARK
     ＇ 65287 FF07 FULLWIDTH APOSTROPHE
     ｢ 65378 FF62 HALFWIDTH LEFT CORNER BRACKET
     ｣ 65379 FF63 HALFWIDTH RIGHT CORNER BRACKET
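Where the property itself is not available, one possible workaround is to build an explicit character class from the code points in the listing above. The sketch below does that in Ruby; `QMARK_CODEPOINTS` and `QMARK_CLASS` are illustrative names, and the list is only as current as the listing it was copied from.

```ruby
# Code points copied from the unichars listing above (29 entries).
QMARK_CODEPOINTS = [
  0x0022, 0x0027, 0x00AB, 0x00BB,
  0x2018, 0x2019, 0x201A, 0x201B, 0x201C, 0x201D, 0x201E, 0x201F,
  0x2039, 0x203A,
  0x300C, 0x300D, 0x300E, 0x300F, 0x301D, 0x301E, 0x301F,
  0xFE41, 0xFE42, 0xFE43, 0xFE44,
  0xFF02, 0xFF07, 0xFF62, 0xFF63
].freeze

# Turn the code points into a UTF-8 string, escape each character,
# and wrap the result in a single bracketed character class.
QMARK_CLASS = Regexp.new(
  "[" + QMARK_CODEPOINTS.pack("U*").chars.map { |c| Regexp.escape(c) }.join + "]"
)
```

For example, `"„Ja“".scan(QMARK_CLASS)` returns the two quotation marks. A class like this says which characters are quotation marks, not which closer belongs to which opener, so pairing still has to be spelled out as in the alternation earlier.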

You can use the uniprops script to list all of a code point’s properties:

    $ uniprops -a 2018
    U+2018 ‹‘› \N{ LEFT SINGLE QUOTATION MARK }:
        \pP \p{Pi}
        All Any Assigned InGeneralPunctuation Case_Ignorable CI Common Zyyy Pi P General_Punctuation Gr_Base Grapheme_Base Graph GrBase Initial_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn Print Punctuation QMark Quotation_Mark X_POSIX_Graph X_POSIX_Print X_POSIX_Punct
        Age=1.1 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON Block=General_Punctuation Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=None DT=None East_Asian_Width=A East_Asian_Width=Ambiguous EA=A Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=QU Line_Break=Quotation LB=QU Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=CL Sentence_Break=Close SB=CL Word_Break=MB Word_Break=MidNumLet WB=MB _Case_Ignorable _X_Begin
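The short `\pP \p{Pi}` line in that output is the general category, and that much can be checked from Ruby, whose regex engine accepts general-category properties such as `\p{Pi}` and `\p{Pf}`. A small sketch, with `category_of` as a made-up helper name:

```ruby
# Reports which punctuation general category a single character falls in:
# Pi = initial quote, Pf = final quote, Ps = open, Pe = close, Po = other.
# Returns nil for characters outside these five categories.
def category_of(char)
  %w[Pi Pf Ps Pe Po].find { |cat| char =~ /\A\p{#{cat}}\z/ }
end
```

With this, `category_of("‘")` gives `"Pi"` and `category_of("’")` gives `"Pf"`, while the ASCII `"` comes out as `"Po"` and `「` as `"Ps"`. Quotation marks are spread across several categories, so the general category alone does not pick them out.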