Should I encode special characters in a sitemap?

Thi*_*Roy 3 sitemap

I have some URLs that contain special characters. For example:

http://www.example.com/bléèàû.html

If you type this URL into a browser, my web server displays the correct page (it can handle special characters).

I have looked at the sitemaps spec, and it's not clear whether or not a sitemap file can contain special characters. From what I understand of the protocol, if the URL works, the server serves the correct page, and the XML file is UTF-8 encoded, then it's OK.

For example, this would be a valid sitemap entry:

   <url>
      <loc>http://www.example.com/bléèàû.html</loc>
      <changefreq>weekly</changefreq>
   </url>

Can anyone confirm this?

[Update] The reason I'm reluctant to encode the special characters is that I don't want to introduce duplicate URLs for the same content. For example,

http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html

and

http://www.example.com/bléèàû.html

would serve the same page. I presume Google would pick up both URLs, one through its normal crawling and the other through the sitemap. Unfortunately, Google has a tendency to downgrade the page rank of sites that have duplicate URLs pointing to the same page.
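One way to see how the two forms relate: the %-escaped path is just the UTF-8 bytes of the same characters, so percent-decoding it gives back the raw path. A minimal Python sketch using the two URLs above; the helper name `decoded_path` is hypothetical, and this only shows that the strings are equivalent after decoding, not how Google or any other crawler actually canonicalizes them.

```python
from urllib.parse import unquote, urlsplit

raw = "http://www.example.com/bléèàû.html"
escaped = "http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html"


def decoded_path(url: str) -> str:
    """Hypothetical helper: return the URL's path with %-escapes decoded as UTF-8."""
    return unquote(urlsplit(url).path, encoding="utf-8")


# Both spellings decode to the same path, "/bléèàû.html".
print(decoded_path(raw) == decoded_path(escaped))  # True
```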

bob*_*nce 6

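The sitemaps spec doesn't say. It shows examples of URLs in various escaped forms, but it doesn't state explicitly whether the first example (raw characters) is allowed. It only refers to them as "URLs", without pointing to a specific definition of "URL" or an RFC that would clarify whether it means old-style ASCII URIs or IRIs (which may contain non-ASCII characters).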

So it is safest to %-escape the UTF-8 encoding of the URL. The link will then work everywhere, and all modern browsers should still display it to the user as Unicode characters.

<loc>http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html</loc>
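If you generate the sitemap programmatically, the escaping can be done with the standard library. A minimal Python sketch, assuming the input URL contains no existing %-escapes and uses an ASCII host name (internationalized domain names are not handled here); the function name `sitemap_loc` is hypothetical. `quote` percent-encodes the UTF-8 bytes, and `xml.sax.saxutils.escape` then entity-escapes characters such as `&` so the result is valid XML text.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def sitemap_loc(url: str) -> str:
    """Hypothetical helper: build a <loc> element with a %-escaped, XML-safe URL."""
    parts = urlsplit(url)
    # Percent-encode the UTF-8 bytes of the path, keeping "/" separators.
    path = quote(parts.path, safe="/", encoding="utf-8")
    # Same for the query string, keeping "=" and "&" as delimiters.
    query = quote(parts.query, safe="=&", encoding="utf-8")
    ascii_url = urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
    # Entity-escape &, < and > so the URL can sit inside XML text.
    return f"<loc>{escape(ascii_url)}</loc>"


print(sitemap_loc("http://www.example.com/bléèàû.html"))
# <loc>http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html</loc>
```

Escaping once, at sitemap-generation time, keeps the stored URLs in their raw form and leaves the sitemap itself pure ASCII.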