What is the lang attribute of the <html> tag used for?

San*_*mar 32 html

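In HTML there is a lang attribute, for example <html lang="en">.

Is it useful?

If it is meant for translation, then even when the language is set to English and the document contains only Chinese text, Google Translate detects it as Chinese, not English (which means Google ignores the <html lang="en"> attribute).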

Nul*_*teя 30

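I'm quoting from the W3C:

Declaring language in HTML

Always use a language attribute on the html tag to declare the default language of the text in the page. When the page contains content in another language, add a language attribute to an element surrounding that content.

Use the lang attribute for pages served as HTML, and the xml:lang attribute for pages served as XML. For XHTML 1.x and HTML5 polyglot documents, use both together.

Use language tags from the IANA Language Subtag Registry.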
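
As a rough illustration of the quoted advice (the page content below is made up, not taken from the W3C article), a default language on the html element plus an override on a nested element might look like this:

```html
<!DOCTYPE html>
<!-- Default language for the whole page -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language declaration sketch</title>
</head>
<body>
  <p>This paragraph inherits lang="en" from the html element.</p>
  <!-- Content in another language gets its own lang on the surrounding element -->
  <p>The French word for bread is <span lang="fr">pain</span>.</p>
</body>
</html>
```

For a page served as XML, the same declaration would use xml:lang instead, and a polyglot document would carry both, e.g. <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">.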

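Also a good read: Why use the language attribute?

  • Bleh ... the quoted content is just an unexplained instruction to use the attribute and doesn't even remotely answer the question (which is why to use the attribute and what it does). This is effectively a link-only answer; the only content actually relevant to the question is the "Why use the language attribute?" link. I may edit this later to turn the relevant content into an actual answer (or feel free to do so yourself). (13 upvotes)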

Flo*_*ris 10

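You ask "is this useful".

"The <lang=> attribute can be used to declare the language of a Web page or a portion of a Web page. This is meant to assist search engine spiders, page formatting and screen reader technologies."

Source: http://symbolcodes.tlt.psu.edu/web/tips/langtag.html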

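Translation isn't mentioned, but in general a search engine spider won't want to parse a document in the "wrong language": its index would grow (with many new words), and the results would be useless to users who can't read the language or who use the wrong search terms.

The advent of smart translation technology (such as Google's, as mentioned above) means that some search engines can see a page in one language, translate it, and work out that someone searching for "cow" might be interested in a page that mentions the equivalent word in French and has <lang="fr">.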


Mar*_*ery 8

The lang attribute is needed by screen readers to let them pronounce words correctly, and also (perhaps surprisingly) sometimes needed to allow text to be rendered correctly by the browser.

lang needed for speech synthesis

Some blind or visually impaired people use speech-synthesizing screen readers that speak the words on the screen. Since two words from different languages that are spelt identically may be pronounced differently, such speech synthesis cannot be done without knowing the language of the text. For instance, the word "pain" in English is pronounced completely differently to the word "pain" in French, so a screen reader that doesn't know whether it's reading English or French won't know how to pronounce "pain".

Using the lang attribute indicates to a screen reader what language some text is in and thus allows it to pronounce the word correctly.

I recorded a demonstration of this using Narrator, the built-in screen reader for Windows. (If you'd like to reproduce this, do note that you'll need to have both the English and French voice packages installed via the Speech settings page in the Windows Settings app, and have English as your default voice.) The demo uses an HTML page with the following content:

    <h5>No lang specified:</h5>
    <p>J'aime le pain</p>

    <h5>French:</h5>
    <p lang="fr">J'aime le pain</p>

As you can hear in the recording I uploaded at https://www.youtube.com/watch?v=7J1I65sn1CQ, Microsoft George (the default English voice) butchers the pronunciation of the French phrase (pronouncing it "Jay aim le payne"), whereas Microsoft Hortense (the default French voice) pronounces it correctly.

lang needed for text rendering

Perhaps surprisingly, the benefits of the lang attribute are not limited to disabled people using speech-synthesizing assistive tech. Setting lang can also affect text rendering, since the correct way to render some text can be language-dependent.

There are a couple of different mechanisms by which the lang you set can affect how text gets rendered:

  • different fonts being selected based on the lang attribute, either:
    • based on the browser's default font selection rules, or
    • because you've explicitly set up language-specific fonts using :lang selectors in your CSS

    or

  • fonts having language-specific rules included in them, such as language-specific alternative glyphs or language-specific rules about which sequences of characters to substitute with a ligature
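
A minimal sketch of the explicit :lang approach from the list above, assuming the named font families (chosen purely for illustration) are installed or loaded elsewhere:

```html
<style>
  /* Hypothetical font choices; substitute fonts you actually ship */
  :lang(zh) { font-family: "Noto Sans SC", sans-serif; }
  :lang(ja) { font-family: "Noto Sans JP", sans-serif; }
</style>
<p lang="zh">Chinese text goes here</p>
<p lang="ja">Japanese text goes here</p>
```

Because :lang matches on an element's inherited language, nested elements without their own lang attribute pick up the same rule.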

Below I will present a couple of interesting examples I could discover of such language-specific rendering happening.

Language-dependent forms of Han characters

There exist many Han (Chinese) characters that have been adopted in other east-Asian languages, such as Japanese (where such characters are called "Kanji"). The proper way to draw these characters sometimes differs between Chinese and the other languages that have assimilated them, yet, due to Unicode's Han unification, there only exists a single Unicode code point to represent the character, rather than a distinct code point for each language-specific variant of it. Several examples are listed in the Examples of language-dependent glyphs section of the Wikipedia article linked above.

When rendering such a character, in order to know which glyph to display (for instance, whether to display the Japanese Kanji or the Chinese hanzi), the browser needs to know the language of the text in which the character appears.

To try to see your browser considering text's language in this way, save the following HTML to a file and open it in your browser:

    Chinese: <span lang="zh">飴</span>
    <br>
    Japanese: <span lang="ja">飴</span>

Note that the same character, 飴, is used in both spans. But they display differently in the browser, at least in Chrome on my Windows PC:

Screenshot demonstrating the point above

As you can see, the Kanji rendered in the span marked as Japanese is different in several ways from the hanzi rendered in the span marked as Chinese. By inspecting each span in the Chrome dev tools and looking at the "Rendered Fonts" section, I can see that this is because Chrome has used different fonts for the two spans - namely Microsoft YaHei for the Chinese span and Yu Gothic for the Japanese one.

fi ligatures getting disabled for Turkish text

As described at https://en.wikipedia.org/wiki/Ligature_(writing)#Stylistic_ligatures, a stylistic ligature is used in many fonts that merges together the letters fi into a single combined glyph, where the top-right corner of the f merges with the dot above the i. In most languages, like English, this looks pretty and doesn't make the text any less readable.

Image showing the combined "fi" glyph

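Such a ligature is problematic, however, in Turkish or other languages in which both a dotted and a dotless I exist and are distinct characters, since it makes it impossible to tell whether it is meant to represent fi (an f followed by a dotted i) or fı (an f followed by a dotless ı).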

Fonts that include a substitution of fi with such a ligature will therefore hopefully have that substitution only occur in languages for which it's appropriate. As I understand it, in OpenType, such rules are implemented by making "features" in the font specific to particular "language systems" via the Language System Table.

To see this in effect, I downloaded a font with such an fi ligature - specifically Okta Neue - and created the following demo page:

    <style>
        @font-face {
            font-family: oktaneue;
            src: url("Groteskly Yours - Okta Neue UltraLight.otf");
        }
        * {
            font-family: oktaneue;
        }
    </style>
    <span lang="en">Lütfiye</span>
    <br>
    <span lang="tr">Lütfiye</span>

Note that this time - unlike in the previous hanzi and Kanji example - the two spans use the same font. But because the font itself contains language-specific features, the spans nonetheless render differently:

Screenshot of the demo page above

As you can see, the fi ligature gets used for the span labelled as English, but not for the one labelled as Turkish - which is what we wanted!
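
Where a font does not carry language-specific rules of this kind, one possible workaround (a sketch only, not part of the demo above) is to switch off common ligatures for Turkish text in CSS. This assumes the font implements its fi ligature as a common ligature, which is typical but not guaranteed:

```html
<style>
  /* Disable common ligatures (OpenType liga/clig) for anything marked as Turkish */
  :lang(tr) {
    font-variant-ligatures: no-common-ligatures;
  }
</style>
<span lang="en">Lütfiye</span>
<br>
<span lang="tr">Lütfiye</span>
```

This only takes effect if the text is actually marked up with lang="tr", so it still relies on the lang attribute being set correctly.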
