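In Arabic, a letter like "ا" (Alef) has several forms/variants:
(ا, أ, إ, آ)
The same is true of the letter ي, which can also be written ى.
What I want to do is get all possible variations of a word that contains several أ and ي letters.
For example, the word "أين" should produce all of these variants (most of which are incorrect): أين, إين, اين, آين, أىن, إىن, اىن, آىن, and so on.
Why? I am building a small text-correction system that can handle grammatical errors and replace wrong words with the correct ones.
I have been trying to do this in the cleanest way possible, but I ended up with 8 for/foreach loops just to handle the letter "أ".
There must be a better, cleaner way to do this! Any ideas?
Here is my code so far: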
    $alefVariations = ['ا', 'أ', 'إ', 'آ'];
    $word = 'أيامنا';
    // Break into letters (limit -1 means "no limit"; passing null is deprecated since PHP 8.1)
    $wordLetters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $wordAlefLettersIndexes = [];
    // Get the indexes of the Alef letters
    for ($letterIndex = 0; $letterIndex < count($wordLetters); $letterIndex++) {
        if (in_array($wordLetters[$letterIndex], $alefVariations)) {
            $wordAlefLettersIndexes[] = $letterIndex;
        }
    }
    // For each Alef position, build one copy of the word per variation
    $eachLetterVariations = [];
    foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
        foreach ($alefVariations as $alefVariation) {
            $wordCopy = $wordLetters;
            $wordCopy[$alefLettersIndex] = $alefVariation;
            $eachLetterVariations[$alefLettersIndex][] = $wordCopy;
        }
    }
    // Combine each Alef position with every other Alef position
    $variations = [];
    foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
        $alefWordVariations = $eachLetterVariations[$alefLettersIndex];
        foreach ($wordAlefLettersIndexes as $alefLettersIndex_inner) {
            if ($alefLettersIndex == $alefLettersIndex_inner) continue;
            foreach ($alefWordVariations as $alefWordVariation) {
                foreach ($alefVariations as $alefVariation) {
                    $alefWordVariationCopy = $alefWordVariation;
                    $alefWordVariationCopy[$alefLettersIndex_inner] = $alefVariation;
                    $variations[] = $alefWordVariationCopy;
                }
            }
        }
    }
    // Join the letters back into words and drop duplicates
    $finalList = [];
    foreach ($variations as $variation) {
        $finalList[] = implode('', $variation);
    }
    return array_unique($finalList);
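I don't think this is the right way to do autocorrection, but here is a general solution to the question you asked. It uses recursion and is written in JavaScript (I don't know PHP).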
    function solve(word, sameLetters, customIndices = []) {
        var splitLetters = word.split('')
            .map((char, index) => { // check if the current letter is within any variation
                if (customIndices.length == 0 || customIndices.includes(index)) {
                    var variations = sameLetters.find(arr => arr.includes(char));
                    if (variations != undefined) return variations;
                }
                return [char];
            });

        // up to this point splitLetters will be like this
        // [["ا","إ","أ","آ"],["ي","ى"],["ا"],["م"],["ن"],["ا"]]
        var res = [];
        recurse(splitLetters, 0, '', res); // this function will generate all the combinations
        return res;
    }

    function recurse(letters, index, cur, res) {
        if (index == letters.length) {
            res.push(cur);
        } else {
            for (var letter of letters[index]) {
                recurse(letters, index + 1, cur + letter, res);
            }
        }
    }

    var sameLetters = [ // represents the variations that you want to enumerate
        ['ا', 'إ', 'أ', 'آ'],
        ['ي', 'ى'] // each letter listed once, so no duplicate results are generated
    ];

    var word = 'أيامنا';
    var customIndices = [0, 1]; // will make variations to the letters in these indices only. leave it empty for all indices

    var ans = solve(word, sameLetters, customIndices);
    console.log(ans);
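The same enumeration can also be written without recursion, by building the Cartesian product one letter position at a time. The following is only a minimal sketch of that idea, not part of the answer above: the name `expandVariants` and the `groups` parameter are hypothetical, and it assumes every letter belongs to at most one group of interchangeable forms.

```javascript
// Hypothetical helper (not from the answer above): expands a word into every
// combination of the interchangeable letters listed in `groups`.
function expandVariants(word, groups) {
  // Array.from splits the string by code point rather than by UTF-16 unit.
  const choices = Array.from(word).map((ch) => {
    const group = groups.find((g) => g.includes(ch));
    // De-duplicate the group so a repeated entry cannot repeat results.
    return group ? [...new Set(group)] : [ch];
  });

  // Build the Cartesian product iteratively, one position at a time.
  let results = [''];
  for (const options of choices) {
    const next = [];
    for (const prefix of results) {
      for (const option of options) {
        next.push(prefix + option);
      }
    }
    results = next;
  }
  return results;
}

// Assumed groups, taken from the letters named in the question.
const groups = [
  ['ا', 'إ', 'أ', 'آ'],
  ['ي', 'ى'],
];

const variants = expandVariants('أين', groups);
console.log(variants.length); // 8: four Alef forms times two Yeh forms
console.log(variants);
```

The number of results is the product of the group sizes at each position, so it grows quickly for long words with many Alef or Yeh letters; restricting the expansion to chosen indices, as `customIndices` does in the answer, keeps the list manageable.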