小智 34
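I have used the Text_LanguageDetect PEAR package and got some reasonable results. It is very simple to use and ships with a modest database of 52 languages. The downside is that it does not detect East Asian languages.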
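require_once 'Text/LanguageDetect.php';
$l = new Text_LanguageDetect();
// ask for the 4 most likely languages of $text
$result = $l->detect($text, 4);
if (PEAR::isError($result)) {
    echo $result->getMessage();
} else {
    // array of language name => similarity score, best match first
    print_r($result);
}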
This results in:
Array
(
    [german] => 0.407037037037
    [dutch] => 0.288065843621
    [english] => 0.283333333333
    [danish] => 0.234526748971
)
Swi*_*ter 17
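I know this is an old post, but here is what I developed after not finding any viable solution.
This solution takes the 20 most common words of each language and counts how often they occur in the haystack. It then compares only the counts of the first- and second-ranked languages. If the runner-up's count is less than 10% of the winner's, the winner takes it all; otherwise the default language is returned.
Code - any suggestions for speed improvements are more than welcome!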
function getTextLanguage($text, $default) {
    $supported_languages = array(
        'en',
        'de',
    );
    // German word list
    // from http://wortschatz.uni-leipzig.de/Papers/top100de.txt
    $wordList['de'] = array('der', 'die', 'und', 'in', 'den', 'von',
        'zu', 'das', 'mit', 'sich', 'des', 'auf', 'für', 'ist', 'im',
        'dem', 'nicht', 'ein', 'Die', 'eine');
    // English word list
    // from http://en.wikipedia.org/wiki/Most_common_words_in_English
    $wordList['en'] = array('the', 'be', 'to', 'of', 'and', 'a', 'in',
        'that', 'have', 'I', 'it', 'for', 'not', 'on', 'with', 'he',
        'as', 'you', 'do', 'at');
    // clean out the input string: replace every run of non-letters with
    // one space. \p{L} with the /u flag keeps non-ASCII letters such as
    // the "ü" in 'für' (assumes UTF-8 input and a UTF-8 source file)
    $text = (string) preg_replace('/[^\p{L}]+/u', ' ', $text);
    // double the separators and pad both ends so that every word has its
    // own leading and trailing space; substr_count() does not count
    // overlapping matches, so "the the" would otherwise count only once
    $text = ' ' . str_replace(' ', '  ', trim($text)) . ' ';
    // count the occurrences of the most frequent words
    foreach ($supported_languages as $language) {
        $counter[$language] = 0;
        foreach ($wordList[$language] as $word) {
            // I believe this is way faster than fancy RegEx solutions
            $counter[$language] += substr_count($text, ' ' . $word . ' ');
        }
    }
    // get max counter value
    // from http://stackoverflow.com/a/1461363
    $max = max($counter);
    $maxs = array_keys($counter, $max);
    // if there are two winners (or no hits at all) - fall back to default!
    if ($max > 0 && count($maxs) == 1) {
        $winner = $maxs[0];
        $second = 0;
        // get runner-up (second place)
        foreach ($supported_languages as $language) {
            if ($language != $winner && $counter[$language] > $second) {
                $second = $counter[$language];
            }
        }
        // apply arbitrary threshold of 10%
        if (($second / $max) < 0.1) {
            return $winner;
        }
    }
    return $default;
}
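To make the 10% rule concrete, here is a minimal sketch of just the decision step, applied to hit counts that have already been computed. The helper name `pickLanguage` and the sample counts are made up for illustration and are not part of the code above.

```php
<?php
// Hypothetical helper (not part of the answer's code): applies the
// runner-up test to an array of language => hit count.
function pickLanguage(array $counts, $default, $threshold = 0.1) {
    if (count($counts) === 0) {
        return $default;
    }
    arsort($counts);                  // highest count first, keys preserved
    $languages = array_keys($counts);
    $hits = array_values($counts);
    $max = $hits[0];
    $second = isset($hits[1]) ? $hits[1] : 0;
    // no hits at all, or a tie for first place: no clear winner
    if ($max <= 0 || $second == $max) {
        return $default;
    }
    // the winner takes it all only if the runner-up is below the threshold
    return ($second / $max) < $threshold ? $languages[0] : $default;
}

echo pickLanguage(array('en' => 12, 'de' => 1), 'unknown'), "\n"; // en (1/12 is below 0.1)
echo pickLanguage(array('en' => 12, 'de' => 3), 'unknown'), "\n"; // unknown (3/12 = 0.25)
echo pickLanguage(array('en' => 5, 'de' => 5), 'unknown'), "\n";  // unknown (tie)
```

With only 20 words per language the counts stay small, so a single stray hit for the runner-up can be enough to trigger the fallback on short texts; the threshold is probably worth tuning against your own data.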
Est*_*ber 15
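You could use Google's AJAX Language API (now defunct) to do this entirely on the client side.
With the AJAX Language API, you can translate text and detect the language of blocks of text within a web page using only JavaScript. In addition, you can enable transliteration on any text field or textarea in your page. For example, if you are transliterating to Hindi, the API lets users spell out Hindi words phonetically in English and have them appear in the Hindi script.
You can automatically detect a string's language: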
var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
    if (!result.error) {
        var language = 'unknown';
        // map the returned language code back to its name
        for (var l in google.language.Languages) {
            if (google.language.Languages[l] == result.language) {
                language = l;
                break;
            }
        }
        var container = document.getElementById("detection");
        container.innerHTML = text + " is: " + language;
    }
});
And translate any string written in one of the supported languages (also defunct):
google.language.translate("Hello world", "en", "es", function(result) {
    if (!result.error) {
        var container = document.getElementById("translation");
        container.innerHTML = result.translation;
    }
});
Since the Google Translate API is being shut down as a free service, you can try this free alternative, which is a replacement for the Google Translate API: