如何检测角色是否属于从右到左的语言?

Pat*_*lug 27 c# localization bidi right-to-left

什么是判断字符串是否包含"从右到左"语言的文本的好方法.

我发现这个问题表明了以下方法:

public bool IsArabic(string strCompare)
{
  char[] chars = strCompare.ToCharArray();
  foreach (char ch in chars)
    if (ch >= '\u0627' && ch <= '\u0649') return true;
  return false;
}
Run Code Online (Sandbox Code Playgroud)

虽然这可能适用于阿拉伯语,但这似乎不包括其他RTL语言,如希伯来语.有没有通用的方法来知道某个特定字符属于RTL语言?

dtb*_*dtb 23

Unicode字符具有与之关联的不同属性.这些属性不能从代码点派生; 你需要一个表来告诉你一个角色是否具有某种属性.

您对双向属性"R"或"AL"(RandALCat)的字符感兴趣.

RandALCat字符是具有明确的从右到左方向性的字符.

这是Unicode 3.2(来自RFC 3454)的完整列表:

D. Bidirectional tables

D.1 Characters with bidirectional property "R" or "AL"

----- Start Table D.1 -----
05BE
05C0
05C3
05D0-05EA
05F0-05F4
061B
061F
0621-063A
0640-064A
066D-066F
0671-06D5
06DD
06E5-06E6
06FA-06FE
0700-070D
0710
0712-072C
0780-07A5
07B1
200F
FB1D
FB1F-FB28
FB2A-FB36
FB38-FB3C
FB3E
FB40-FB41
FB43-FB44
FB46-FBB1
FBD3-FD3D
FD50-FD8F
FD92-FDC7
FDF0-FDFC
FE70-FE74
FE76-FEFC
----- End Table D.1 -----

以下是从Unicode 6.0获取完整列表的一些代码:

var url = "http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt";

var query = from record in new WebClient().DownloadString(url).Split('\n')
            where !string.IsNullOrEmpty(record)
            let properties = record.Split(';')
            where properties[4] == "R" || properties[4] == "AL"
            select int.Parse(properties[0], NumberStyles.AllowHexSpecifier);

foreach (var codepoint in query)
{
    Console.WriteLine(codepoint.ToString("X4"));
}
Run Code Online (Sandbox Code Playgroud)

请注意,这些值是Unicode代码点.C#/ .NET中的字符串是UTF-16编码的,需要先转换为Unicode代码点(参见Char.ConvertToUtf32).这是一个检查字符串是否包含至少一个RandALCat字符的方法:

static void IsAnyCharacterRightToLeft(string s)
{
    for (var i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
    {
        var codepoint = char.ConvertToUtf32(s, i);
        if (IsRandALCat(codepoint))
        {
            return true;
        }
    }
    return false;
}
Run Code Online (Sandbox Code Playgroud)


Bre*_*ias 18

您可以尝试在正则表达式中使用" 命名块 " .只需选择从右到左的块,然后形成正则表达式.例如:

\p{IsArabic}|\p{IsHebrew}
Run Code Online (Sandbox Code Playgroud)

如果该正则表达式返回true,则字符串中至少有一个希伯来语或阿拉伯语字符.


fra*_*ike 8

Unicode 6.0的所有"AL"或"R"(来自http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt)

bool hasRandALCat = 0;
if(c >= 0x5BE && c <= 0x10B7F)
{
    if(c <= 0x85E)
    {
        if(c == 0x5BE)                        hasRandALCat = 1;
        else if(c == 0x5C0)                   hasRandALCat = 1;
        else if(c == 0x5C3)                   hasRandALCat = 1;
        else if(c == 0x5C6)                   hasRandALCat = 1;
        else if(0x5D0 <= c && c <= 0x5EA)     hasRandALCat = 1;
        else if(0x5F0 <= c && c <= 0x5F4)     hasRandALCat = 1;
        else if(c == 0x608)                   hasRandALCat = 1;
        else if(c == 0x60B)                   hasRandALCat = 1;
        else if(c == 0x60D)                   hasRandALCat = 1;
        else if(c == 0x61B)                   hasRandALCat = 1;
        else if(0x61E <= c && c <= 0x64A)     hasRandALCat = 1;
        else if(0x66D <= c && c <= 0x66F)     hasRandALCat = 1;
        else if(0x671 <= c && c <= 0x6D5)     hasRandALCat = 1;
        else if(0x6E5 <= c && c <= 0x6E6)     hasRandALCat = 1;
        else if(0x6EE <= c && c <= 0x6EF)     hasRandALCat = 1;
        else if(0x6FA <= c && c <= 0x70D)     hasRandALCat = 1;
        else if(c == 0x710)                   hasRandALCat = 1;
        else if(0x712 <= c && c <= 0x72F)     hasRandALCat = 1;
        else if(0x74D <= c && c <= 0x7A5)     hasRandALCat = 1;
        else if(c == 0x7B1)                   hasRandALCat = 1;
        else if(0x7C0 <= c && c <= 0x7EA)     hasRandALCat = 1;
        else if(0x7F4 <= c && c <= 0x7F5)     hasRandALCat = 1;
        else if(c == 0x7FA)                   hasRandALCat = 1;
        else if(0x800 <= c && c <= 0x815)     hasRandALCat = 1;
        else if(c == 0x81A)                   hasRandALCat = 1;
        else if(c == 0x824)                   hasRandALCat = 1;
        else if(c == 0x828)                   hasRandALCat = 1;
        else if(0x830 <= c && c <= 0x83E)     hasRandALCat = 1;
        else if(0x840 <= c && c <= 0x858)     hasRandALCat = 1;
        else if(c == 0x85E)                   hasRandALCat = 1;
    }
    else if(c == 0x200F)                      hasRandALCat = 1;
    else if(c >= 0xFB1D)
    {
        if(c == 0xFB1D)                       hasRandALCat = 1;
        else if(0xFB1F <= c && c <= 0xFB28)   hasRandALCat = 1;
        else if(0xFB2A <= c && c <= 0xFB36)   hasRandALCat = 1;
        else if(0xFB38 <= c && c <= 0xFB3C)   hasRandALCat = 1;
        else if(c == 0xFB3E)                  hasRandALCat = 1;
        else if(0xFB40 <= c && c <= 0xFB41)   hasRandALCat = 1;
        else if(0xFB43 <= c && c <= 0xFB44)   hasRandALCat = 1;
        else if(0xFB46 <= c && c <= 0xFBC1)   hasRandALCat = 1;
        else if(0xFBD3 <= c && c <= 0xFD3D)   hasRandALCat = 1;
        else if(0xFD50 <= c && c <= 0xFD8F)   hasRandALCat = 1;
        else if(0xFD92 <= c && c <= 0xFDC7)   hasRandALCat = 1;
        else if(0xFDF0 <= c && c <= 0xFDFC)   hasRandALCat = 1;
        else if(0xFE70 <= c && c <= 0xFE74)   hasRandALCat = 1;
        else if(0xFE76 <= c && c <= 0xFEFC)   hasRandALCat = 1;
        else if(0x10800 <= c && c <= 0x10805) hasRandALCat = 1;
        else if(c == 0x10808)                 hasRandALCat = 1;
        else if(0x1080A <= c && c <= 0x10835) hasRandALCat = 1;
        else if(0x10837 <= c && c <= 0x10838) hasRandALCat = 1;
        else if(c == 0x1083C)                 hasRandALCat = 1;
        else if(0x1083F <= c && c <= 0x10855) hasRandALCat = 1;
        else if(0x10857 <= c && c <= 0x1085F) hasRandALCat = 1;
        else if(0x10900 <= c && c <= 0x1091B) hasRandALCat = 1;
        else if(0x10920 <= c && c <= 0x10939) hasRandALCat = 1;
        else if(c == 0x1093F)                 hasRandALCat = 1;
        else if(c == 0x10A00)                 hasRandALCat = 1;
        else if(0x10A10 <= c && c <= 0x10A13) hasRandALCat = 1;
        else if(0x10A15 <= c && c <= 0x10A17) hasRandALCat = 1;
        else if(0x10A19 <= c && c <= 0x10A33) hasRandALCat = 1;
        else if(0x10A40 <= c && c <= 0x10A47) hasRandALCat = 1;
        else if(0x10A50 <= c && c <= 0x10A58) hasRandALCat = 1;
        else if(0x10A60 <= c && c <= 0x10A7F) hasRandALCat = 1;
        else if(0x10B00 <= c && c <= 0x10B35) hasRandALCat = 1;
        else if(0x10B40 <= c && c <= 0x10B55) hasRandALCat = 1;
        else if(0x10B58 <= c && c <= 0x10B72) hasRandALCat = 1;
        else if(0x10B78 <= c && c <= 0x10B7F) hasRandALCat = 1;
    }
}
Run Code Online (Sandbox Code Playgroud)