Node.js Cheerio parser breaks UTF-8 encoding

Mee*_*ack 13 encoding node.js cheerio

I parse the response to my request with Cheerio like this:

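var request = require('request');
var cheerio = require('cheerio');

// The URL must be a string literal; unquoted it is a syntax error.
var url = 'http://shop.nag.ru/catalog/16939.IP-videonablyudenie-OMNY/16944.IP-kamery-OMNY-c-vario-obektivom/16704.OMNY-1000-PRO';
request.get(url, function (err, response, body) {
  console.log(body);
  var $ = cheerio.load(body);
  console.log($(".description").html());
});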

In the output I see the content, but in a strange, unreadable encoding:

//Plain body console.log(body) (p.s. russian chars): 
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1><p style

//  cheerio's console.log $(".description").html()
<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY

The page at the target URL is encoded in UTF-8. So why does Cheerio break my encoding?

I tried using iconv to decode my response body:

var body1 = iconv.decode(body, "utf-8");

console.log($(".description").html()); still returns the strange text.

Jor*_*ing 33

Cheerio isn't breaking anything. Any browser will render the HTML it outputs exactly the same as the HTML input. Take a look at this snippet:

<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>

<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY - &#x43F;&#x43E;&#x43F;&#x440;&#x43E;&#x431;&#x443;&#x439;&#x442;&#x435; &#x43D;&#x430;&#x439;&#x442;&#x438; &#x43B;&#x443;&#x447;&#x448;&#x435;</span></h1>

It just so happens that &#x423; is the HTML "entity" for the Unicode character У (U+0423), in the same way that &gt; is the entity for >.
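To make the entity point concrete, here is a minimal sketch in plain JavaScript that turns the hexadecimal entities back into characters. The helper name decodeHexEntities is hypothetical and not part of Cheerio; it only handles the &#x...; form that appears in the output above, not named or decimal entities.

```javascript
// Hypothetical helper (not a Cheerio API): decode hexadecimal numeric
// character references such as "&#x423;" into the characters they denote.
function decodeHexEntities(html) {
  return html.replace(/&#x([0-9a-fA-F]+);/g, function (match, hex) {
    // Each entity is just the code point of the character, written in hex.
    return String.fromCodePoint(parseInt(hex, 16));
  });
}

// The start of the string from Cheerio's output in the question.
var encoded = '&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY';
var decoded = decodeHexEntities(encoded);

console.log(decoded);
// => Уличная 3Мп IP HD камера OMNY
```

A browser performs the same mapping when it renders the page, which is why both forms display identically.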

However, if you want to get the unencoded text, you can set the decodeEntities option to false:

const cheerio = require('cheerio');

const $ = cheerio.load(
  `<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>`,
  { decodeEntities: false }
);

console.log($('span').html());
// => Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше

  • Thx, {decodeEntities: false} works fine! (2 upvotes)