Replace all spaces in the content of any tag with &nbsp;

Cha*_*hak 5 perl mojo-dom

Task

Replace every space in the text content of any tag with &nbsp;, leaving the tags themselves untouched.

y.html (sample file)

<p class=MsoNormal style='margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
margin-left:34.0pt;text-indent:-19.8pt'><span lang=NL-BE style='font-size:10.0pt;
font-family:Symbol;color:black;mso-ansi-language:NL-BE'>·</span><span
class=GramE><span style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span><span style='font-size:10.0pt;font-family:Arial;color:black'>Kit</span></span><span
style='font-size:10.0pt;font-family:Arial;color:black'> </span><span
class=SpellE><i><span style='font-size:10.0pt;font-family:Arial'>Strongyloides</span></i></span><i><span
style='font-size:10.0pt;font-family:Arial'> <span class=SpellE>ratti</span></span></i><span
style='font-size:10.0pt;font-family:Arial'> (nr. 9450) van <span class=SpellE>Bordier</span>
Affinity Products. </span><span lang=NL-BE style='font-size:10.0pt;font-family:
Arial;mso-ansi-language:NL-BE'>Zie bijsluiter in bijlage: CLKB_B_0306. Te
bewaren bij 2 – 8 °C tot vervaldatum.</span><span lang=NL-BE style='mso-ansi-language:
NL-BE'><o:p></o:p></span></p>

What I tried

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;
open (my $fh, "<", "y.html") or die $!;
my $dom = Mojo::DOM->new(do{local $/ = undef; <$fh>});
$dom->find("*")->each( sub { $_->content( $_->content =~ s/\s/\&nbsp;/gr ) } );
print $dom;

Result of the script above

<p class="MsoNormal" style="margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
margin-left:34.0pt;text-indent:-19.8pt"><span&nbsp;lang="nl-be"&nbsp;style="font-size:10.0pt;&nbsp;font-family:symbol;color:black;mso-ansi-language:nl-be">·<span&nbsp;class="grame"><span&nbsp;style="font-s
ize:7.0pt;color:black">         <span&nbsp;style="font-size:10.0pt;font-family:arial;color:black">Kit<span&nbsp;style="font-size:10.0pt;font-family:arial;color:black"> <span&nbsp;class="spelle"><i><span&nb
sp;style="font-size:10.0pt;font-family:arial">Strongyloides<i><span&nbsp;style="font-size:10.0pt;font-family:arial"> <span&nbsp;class="spelle">ratti<span&nbsp;style="font-size:10.0pt;font-family:arial"> (n
r. 9450) van <span&nbsp;class="spelle">Bordier Affinity Products. <span&nbsp;lang="nl-be"&nbsp;style="font-size:10.0pt;font-family:&nbsp;arial;mso-ansi-language:nl-be">Zie bijsluiter in bijlage: CLKB_B_030
6. Te bewaren bij 2 – 8 °C tot vervaldatum.<span&nbsp;lang="nl-be"&nbsp;style="mso-ansi-language:&nbsp;nl-be"><o:p></o:p></span&nbsp;lang="nl-be"&nbsp;style="mso-ansi-language:&nbsp;nl-be"></span&nbsp;lang
="nl-be"&nbsp;style="font-size:10.0pt;font-family:&nbsp;arial;mso-ansi-language:nl-be"></span&nbsp;class="spelle"></span&nbsp;style="font-size:10.0pt;font-family:arial"></span&nbsp;class="spelle"></span&nb
sp;style="font-size:10.0pt;font-family:arial"></i></span&nbsp;style="font-size:10.0pt;font-family:arial"></i></span&nbsp;class="spelle"></span&nbsp;style="font-size:10.0pt;font-family:arial;color:black"></
span&nbsp;style="font-size:10.0pt;font-family:arial;color:black"></span&nbsp;style="font-size:7.0pt;color:black"></span&nbsp;class="grame"></span&nbsp;lang="nl-be"&nbsp;style="font-size:10.0pt;&nbsp;font-f
amily:symbol;color:black;mso-ansi-language:nl-be"></p>
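
I am not getting the desired output: the script also inserts &nbsp; inside the tags themselves (for example </span&nbsp;), whereas I want the replacement applied only to the content.

PS: I tried Mojo::DOM, but using it is not a requirement; feel free to use any other parser. I would still like to know what is wrong with my code.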

Mil*_*ler 4

This is a job that becomes easier once the input is tokenized, so I would suggest using HTML::TokeParser:

#!/usr/bin/perl
use strict;
use warnings;
use utf8;

use HTML::TokeParser;

my $data = do {local $/; <DATA>};

my $p = HTML::TokeParser->new(\$data);

while (my $token = $p->get_token) {
    if ($token->[0] eq 'T') {
        my $text = $token->[1];
        $text =~ s/ /&nbsp;/g;
        print $text;
    } else {
        print "$token->[-1]";
    }
}

__DATA__
<html>
<body>
<p class=MsoNormal style='margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
margin-left:34.0pt;text-indent:-19.8pt'><span lang=NL-BE style='font-size:10.0pt;
font-family:Symbol;color:black;mso-ansi-language:NL-BE'>·</span><span
class=GramE><span style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span><span style='font-size:10.0pt;font-family:Arial;color:black'>Kit</span></span><span
style='font-size:10.0pt;font-family:Arial;color:black'> </span><span
class=SpellE><i><span style='font-size:10.0pt;font-family:Arial'>Strongyloides</span></i></span><i><span
style='font-size:10.0pt;font-family:Arial'> <span class=SpellE>ratti</span></span></i><span
style='font-size:10.0pt;font-family:Arial'> (nr. 9450) van <span class=SpellE>Bordier</span>
Affinity Products. </span><span lang=NL-BE style='font-size:10.0pt;font-family:
Arial;mso-ansi-language:NL-BE'>Zie bijsluiter in bijlage: CLKB_B_0306. Te
bewaren bij 2 – 8 °C tot vervaldatum.</span><span lang=NL-BE style='mso-ansi-language:
NL-BE'><o:p></o:p></span></p>
</body>
</html>

Output:

<html>
<body>
<p class=MsoNormal style='margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
margin-left:34.0pt;text-indent:-19.8pt'><span lang=NL-BE style='font-size:10.0pt;
font-family:Symbol;color:black;mso-ansi-language:NL-BE'>·</span><span
class=GramE><span style='font-size:7.0pt;color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
</span><span style='font-size:10.0pt;font-family:Arial;color:black'>Kit</span></span><span
style='font-size:10.0pt;font-family:Arial;color:black'>&nbsp;</span><span
class=SpellE><i><span style='font-size:10.0pt;font-family:Arial'>Strongyloides</span></i></span><i><span
style='font-size:10.0pt;font-family:Arial'>&nbsp;<span class=SpellE>ratti</span></span></i><span
style='font-size:10.0pt;font-family:Arial'>&nbsp;(nr.&nbsp;9450)&nbsp;van&nbsp;<span class=SpellE>Bordier</span>
Affinity&nbsp;Products.&nbsp;</span><span lang=NL-BE style='font-size:10.0pt;font-family:
Arial;mso-ansi-language:NL-BE'>Zie&nbsp;bijsluiter&nbsp;in&nbsp;bijlage:&nbsp;CLKB_B_0306.&nbsp;Te
bewaren&nbsp;bij&nbsp;2&nbsp;–&nbsp;8&nbsp;°C&nbsp;tot&nbsp;vervaldatum.</span><span lang=NL-BE style='mso-ansi-language:
NL-BE'><o:p></o:p></span></p>
</body>
</html>
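
As for why the Mojo::DOM attempt in the question misbehaves: on an element, `content` returns the inner markup with child tags included, so the substitution also hits the whitespace between attributes of nested tags. The token approach avoids this because only text tokens are ever modified. Below is a minimal sketch of the same idea wrapped in a function that returns a string instead of printing. The name `nbsp_text_spaces` and the one-line sample input are illustrative assumptions, not part of the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical helper (name chosen for this sketch): returns the HTML with
# every literal space inside text tokens replaced by &nbsp;.
sub nbsp_text_spaces {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot set up parser";
    my $out = '';
    while (my $token = $p->get_token) {
        if ($token->[0] eq 'T') {
            # Text token: element 1 holds the text, with entities left undecoded.
            (my $text = $token->[1]) =~ s/ /&nbsp;/g;
            $out .= $text;
        }
        else {
            # Start tags, end tags, comments, declarations and processing
            # instructions carry their original source text as the last element.
            $out .= $token->[-1];
        }
    }
    return $out;
}

# Illustrative input: the space inside the style attribute must survive.
my $sample = q{<p class=a style='x:1; y:2'>Kit <i>S. ratti</i></p>};
print nbsp_text_spaces($sample), "\n";
# prints: <p class=a style='x:1; y:2'>Kit&nbsp;<i>S.&nbsp;ratti</i></p>
```

Note that the pattern matches only the literal space character, so newlines inside text are left alone, which is consistent with the output shown above. Using `\s` instead would also convert line breaks, which may or may not be wanted.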