How to transliterate Cyrillic to Latin text

ckk*_*ght 20 .net c# transliteration

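I have a method that turns any Latin-script text (e.g. English, French, German, Polish) into its slug form,

e.g. Alpha Bravo Charlie => alpha-bravo-charlie

But it does not work for Cyrillic text (e.g. Russian), so what I want to do is transliterate the Cyrillic text to Latin characters and then slugify the result.

Does anyone have a way to do such transliteration? Either actual source code or a library would do.

I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I can convert it.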

Dim*_*sov 20

You can use the open-source .NET DLL library UnidecodeSharpFork to transliterate Cyrillic and many other scripts to Latin.

Usage example:

Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Test for Cyrillic:

/// <summary>
/// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
/// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
/// With converting "ё" to "yo".
/// </summary>
[TestMethod]
public void RussianAlphabetTest()
{
    string russianAlphabetLowercase = "а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я";
    string russianAlphabetUppercase = "А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я";

    string expectedLowercase = "a b v g d e yo zh z i y k l m n o p r s t u f kh ts ch sh shch \" y ' e yu ya";
    string expectedUppercase = "A B V G D E Yo Zh Z I Y K L M N O P R S T U F Kh Ts Ch Sh Shch \" Y ' E Yu Ya";

    Assert.AreEqual(expectedLowercase, russianAlphabetLowercase.Unidecode());
    Assert.AreEqual(expectedUppercase, russianAlphabetUppercase.Unidecode());
}

Simple, fast and powerful. And it is easy to extend or modify the transliteration table if you want to.

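  • Wrong. This transliterates Анастасия as Anastasiya, not Anastasia, which looks bad. It seems that this document (http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian) is wrong in its special provisions. Also, you don't take the special provisions into account, and UnidecodeSharpFork transliterates German umlauts (äöüÄÖÜ) as aouAOU instead of Ae Oe Ue. That is why I changed my upvote to a downvote. If you write a romanization library (or algorithm), please do it right, or otherwise state that your algorithm is incomplete/incorrect and not ready for production. (3 upvotes)
  • "If you write a romanization library": I didn't. It is just a simple "transliterate to English/Latin". It is FAR from perfect, but it works for many languages. For example, http://dotabro.com/player/76561198060110736/madgaming-crio-j-jinmawang: I use it to build links on the principle of "better than nothing". (3 upvotes)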
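
One possible workaround for the umlaut complaint above, sketched under assumptions: expand German umlauts in a small pre-pass before handing the string to whichever transliterator you use. `GermanPrePass` and `ExpandUmlauts` are hypothetical names for illustration and are not part of UnidecodeSharpFork. Note that the pre-pass applies blindly, so it would also rewrite these letters in non-German text.

```csharp
using System.Text;

public static class GermanPrePass
{
    // Hypothetical helper: expands German umlauts to their two-letter forms.
    // All other characters are copied through unchanged.
    public static string ExpandUmlauts(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case 'ä': sb.Append("ae"); break;
                case 'ö': sb.Append("oe"); break;
                case 'ü': sb.Append("ue"); break;
                case 'Ä': sb.Append("Ae"); break;
                case 'Ö': sb.Append("Oe"); break;
                case 'Ü': sb.Append("Ue"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

For example, `GermanPrePass.ExpandUmlauts("Übung für Jörg")` gives `"Uebung fuer Joerg"`, which can then be passed on to the transliteration step.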

Rom*_*kar 18

    public static string Translit(string str)
    {
        string[] lat_up = {"A", "B", "V", "G", "D", "E", "Yo", "Zh", "Z", "I", "Y", "K", "L", "M", "N", "O", "P", "R", "S", "T", "U", "F", "Kh", "Ts", "Ch", "Sh", "Shch", "\"", "Y", "'", "E", "Yu", "Ya"};
        string[] lat_low = {"a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "\"", "y", "'", "e", "yu", "ya"};
        string[] rus_up = {"А", "Б", "В", "Г", "Д", "Е", "Ё", "Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П", "Р", "С", "Т", "У", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ы", "Ь", "Э", "Ю", "Я"};
        string[] rus_low = {"а", "б", "в", "г", "д", "е", "ё", "ж", "з", "и", "й", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ч", "ш", "щ", "ъ", "ы", "ь", "э", "ю", "я"};
        for (int i = 0; i <= 32; i++)
        {
            str = str.Replace(rus_up[i],lat_up[i]);
            str = str.Replace(rus_low[i],lat_low[i]);              
        }
        return str;
    }

  • Let's create 66 * (number of characters) strings... nice. (7 upvotes)

Max*_*kin 8

Why can't you just take a transliteration table and write a small regex or subroutine?
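
A minimal sketch of what that could look like, assuming a lookup table plus a single regex pass. The class name `RegexTranslit` and the deliberately partial table are illustrative assumptions; a real table would need all 33 letters in both cases. The regex uses the .NET named block `\p{IsCyrillic}` to find characters to look up.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTranslit
{
    // Partial table for illustration only; extend it to the full alphabet.
    private static readonly Dictionary<string, string> Table = new Dictionary<string, string>
    {
        {"п", "p"}, {"р", "r"}, {"и", "i"}, {"в", "v"}, {"е", "e"}, {"т", "t"},
        {"ж", "zh"}, {"щ", "shch"}, {"П", "P"}
    };

    // Matches one character from the Cyrillic Unicode block (U+0400 to U+04FF).
    private static readonly Regex Cyrillic = new Regex(@"\p{IsCyrillic}");

    public static string Transliterate(string input)
    {
        // Replace each match with its table entry; letters missing from the
        // table are left as they are instead of throwing.
        return Cyrillic.Replace(input, m =>
        {
            string latin;
            return Table.TryGetValue(m.Value, out latin) ? latin : m.Value;
        });
    }
}
```

With this table, `RegexTranslit.Transliterate("Привет, world")` returns `"Privet, world"`: the Latin text and punctuation never match the regex, so they pass through untouched.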


Sch*_*apz 8

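An optimized version of Sarvar Nishonboev's answer. It seems to be the simplest solution, and it avoids the unnecessary cost of recreating the string on every iteration: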

using System.Collections.Generic;
using System.Text;

public static class Converter
{
    private static readonly Dictionary<char, string> ConvertedLetters = new Dictionary<char, string>
    {
        {'а', "a"},
        {'б', "b"},
        {'в', "v"},
        {'г', "g"},
        {'д', "d"},
        {'е', "e"},
        {'ё', "yo"},
        {'ж', "zh"},
        {'з', "z"},
        {'и', "i"},
        {'й', "j"},
        {'к', "k"},
        {'л', "l"},
        {'м', "m"},
        {'н', "n"},
        {'о', "o"},
        {'п', "p"},
        {'р', "r"},
        {'с', "s"},
        {'т', "t"},
        {'у', "u"},
        {'ф', "f"},
        {'х', "h"},
        {'ц', "c"},
        {'ч', "ch"},
        {'ш', "sh"},
        {'щ', "sch"},
        {'ъ', "j"},
        {'ы', "i"},
        {'ь', "j"},
        {'э', "e"},
        {'ю', "yu"},
        {'я', "ya"},
        {'А', "A"},
        {'Б', "B"},
        {'В', "V"},
        {'Г', "G"},
        {'Д', "D"},
        {'Е', "E"},
        {'Ё', "Yo"},
        {'Ж', "Zh"},
        {'З', "Z"},
        {'И', "I"},
        {'Й', "J"},
        {'К', "K"},
        {'Л', "L"},
        {'М', "M"},
        {'Н', "N"},
        {'О', "O"},
        {'П', "P"},
        {'Р', "R"},
        {'С', "S"},
        {'Т', "T"},
        {'У', "U"},
        {'Ф', "F"},
        {'Х', "H"},
        {'Ц', "C"},
        {'Ч', "Ch"},
        {'Ш', "Sh"},
        {'Щ', "Sch"},
        {'Ъ', "J"},
        {'Ы', "I"},
        {'Ь', "J"},
        {'Э', "E"},
        {'Ю', "Yu"},
        {'Я', "Ya"}
    };

    public static string ConvertToLatin(string source)
    {
        var result = new StringBuilder();
        foreach (var letter in source)
        {
            result.Append(ConvertedLetters[letter]);
        }
        return result.ToString();
    }
}

Use it like this:

Converter.ConvertToLatin("Проверочный текст");

  • Note that this code will throw an exception if the source string contains any Latin characters... (2 upvotes)
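
One way to avoid that exception, as a sketch rather than the answer author's code: look each character up with `Dictionary.TryGetValue` and copy anything that is not in the table (Latin letters, digits, spaces, punctuation) through unchanged. `SafeConverter` is a hypothetical name, and its table is abbreviated here for brevity; the full table from the answer would be used in practice.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SafeConverter
{
    // Abbreviated table for illustration; use the full 66-entry table in practice.
    private static readonly Dictionary<char, string> Letters = new Dictionary<char, string>
    {
        {'т', "t"}, {'е', "e"}, {'к', "k"}, {'с', "s"}, {'Т', "T"}
    };

    public static string ConvertToLatin(string source)
    {
        var result = new StringBuilder(source.Length);
        foreach (char letter in source)
        {
            string latin;
            if (Letters.TryGetValue(letter, out latin))
                result.Append(latin);   // known Cyrillic letter
            else
                result.Append(letter);  // anything else passes through as-is
        }
        return result.ToString();
    }
}
```

With this table, `SafeConverter.ConvertToLatin("Текст v2")` returns `"Tekst v2"` instead of throwing a `KeyNotFoundException` on the space, the `v`, or the `2`.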

Sar*_*oev 6

Check this code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

namespace Transliter
{
    public partial class Form1 : Form
    {
        Dictionary<string, string> words = new Dictionary<string, string>();

        public Form1()
        {
            InitializeComponent();
            words.Add("а", "a");
            words.Add("б", "b");
            words.Add("в", "v");
            words.Add("г", "g");
            words.Add("д", "d");
            words.Add("е", "e");
            words.Add("ё", "yo");
            words.Add("ж", "zh");
            words.Add("з", "z");
            words.Add("и", "i");
            words.Add("й", "j");
            words.Add("к", "k");
            words.Add("л", "l");
            words.Add("м", "m");
            words.Add("н", "n");
            words.Add("о", "o");
            words.Add("п", "p");
            words.Add("р", "r");
            words.Add("с", "s");
            words.Add("т", "t");
            words.Add("у", "u");
            words.Add("ф", "f");
            words.Add("х", "h");
            words.Add("ц", "c");
            words.Add("ч", "ch");
            words.Add("ш", "sh");
            words.Add("щ", "sch");
            words.Add("ъ", "j");
            words.Add("ы", "i");
            words.Add("ь", "j");
            words.Add("э", "e");
            words.Add("ю", "yu");
            words.Add("я", "ya");
            words.Add("А", "A");
            words.Add("Б", "B");
            words.Add("В", "V");
            words.Add("Г", "G");
            words.Add("Д", "D");
            words.Add("Е", "E");
            words.Add("Ё", "Yo");
            words.Add("Ж", "Zh");
            words.Add("З", "Z");
            words.Add("И", "I");
            words.Add("Й", "J");
            words.Add("К", "K");
            words.Add("Л", "L");
            words.Add("М", "M");
            words.Add("Н", "N");
            words.Add("О", "O");
            words.Add("П", "P");
            words.Add("Р", "R");
            words.Add("С", "S");
            words.Add("Т", "T");
            words.Add("У", "U");
            words.Add("Ф", "F");
            words.Add("Х", "H");
            words.Add("Ц", "C");
            words.Add("Ч", "Ch");
            words.Add("Ш", "Sh");
            words.Add("Щ", "Sch");
            words.Add("Ъ", "J");
            words.Add("Ы", "I");
            words.Add("Ь", "J");
            words.Add("Э", "E");
            words.Add("Ю", "Yu");
            words.Add("Я", "Ya");
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string source = textBox1.Text;
            foreach (KeyValuePair<string, string> pair in words)
            {
                source = source.Replace(pair.Key, pair.Value);
            }
            textBox2.Text = source;
        }
    }
}

Cyrillic to Latin:

text.Replace(pair.Key, pair.Value); 

Latin to Cyrillic:

source.Replace(pair.Value,pair.Key);

  • Let's create 66 * (number of characters) strings... nice. (2 upvotes)

jba*_*all 5

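Microsoft has a transliteration tool that includes a DLL you can hook into (you would need to check the licensing restrictions if you plan to use it for anything other than personal use). You can read more about it in Dejan Vesić's blog post.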


Nic*_*hro 5

You can use my library for transliteration: https://github.com/nick-buhro/Translit
It is also available on NuGet.

Example:

var latin = Transliteration.CyrillicToLatin(
    "Предками данная мудрость народная!", 
    Language.Russian);

Console.WriteLine(latin);   
// Output: Predkami dannaya mudrost` narodnaya!