I have some URLs that contain special characters. For example:
http://www.example.com/bléèàû.html
If you enter this URL in a browser, my web server displays the correct page (it handles the special characters).
I have looked at the sitemaps spec, and it's not clear whether a sitemap file can contain special characters. From what I understand of the protocol, if the URL works, the server serves the correct page, and the XML file is UTF-8 encoded, then it's OK.
For example, I believe this is a valid sitemap entry:
<url>
<loc>http://www.example.com/bléèàû.html</loc>
<changefreq>weekly</changefreq>
</url>
Can anyone confirm this?
[Update] The reason I'm reluctant to encode the special characters is that I don't want to introduce duplicate URLs for the same content. For example:
http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html
and
http://www.example.com/bléèàû.html
would serve the same page. I presume Google would catch both URLs, through its normal indexing and through the sitemap. Unfortunately, Google has a tendency to downgrade the page rank of sites that have duplicate URLs pointing to the same page.
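The sitemaps specification doesn't say. It shows examples of URLs in various escaped forms, but it doesn't state explicitly whether the first form (raw characters) is allowed. It only calls them "URLs", without referencing a specific definition of "URL" or an RFC that would clarify whether it means old-fashioned ASCII URIs or IRIs (which may contain non-ASCII characters).
Consequently, it's safest to %-escape the UTF-8 encoding of the URL. The link will then work universally, and it should still be presented to the user as Unicode characters in all modern browsers.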
<loc>http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html</loc>
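If you generate the sitemap programmatically, the escaping can be automated. Here is a minimal Python sketch using only the standard library. The helper name `to_sitemap_loc` is my own (hypothetical), and it assumes the host name is plain ASCII (an internationalized domain would need separate IDNA handling). It also treats any existing `%` as part of an escape, so a URL that is already encoded is not escaped twice; a path containing a literal `%` would need extra care.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def to_sitemap_loc(url: str) -> str:
    """Percent-encode the UTF-8 bytes of a URL's path and query,
    then entity-escape the result for use inside <loc>...</loc>.

    Hypothetical helper: assumes an ASCII host name.
    """
    parts = urlsplit(url)
    # "%" is kept safe so existing %XX escapes are left untouched.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&%", encoding="utf-8")
    encoded = urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )
    # Sitemap files are XML, so "&", "<" and ">" must be entity-escaped.
    return escape(encoded)


raw = "http://www.example.com/bléèàû.html"
print("<loc>" + to_sitemap_loc(raw) + "</loc>")
```

Running the function on its own output returns the same string, so raw and pre-escaped URLs can be mixed in the input without producing two different spellings in the sitemap.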