Kos*_*ukh 12 .net c# string unicode xamarin.ios
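I want to get a substring of a given length, say 150. However, I want to make sure I don't cut the string in the middle of a Unicode character.
For example, see the following code: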
var str = "Hello😀 world!";
var substr = str.Substring(0, 6);
Here substr is an invalid string, because the smiley character is cut in half.
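To make the failure concrete, here is a minimal sketch of how one might detect the broken result. It assumes the smiley is U+1F600, which lies outside the Basic Multilingual Plane and therefore occupies two `char` values (a surrogate pair) in a .NET string. `SurrogateDemo` and `EndsWithDanglingHighSurrogate` are hypothetical names used only for illustration.

```csharp
using System;

public static class SurrogateDemo
{
    // Hypothetical helper: true when the last char of s is a high surrogate,
    // i.e. the first half of a surrogate pair whose second half was cut off.
    public static bool EndsWithDanglingHighSurrogate(string s)
    {
        if (s == null)
        {
            throw new ArgumentNullException(nameof(s));
        }
        return s.Length > 0 && char.IsHighSurrogate(s[s.Length - 1]);
    }
}
```

With `var str = "Hello\U0001F600 world!";`, `str.Length` is 14 rather than 13, and `str.Substring(0, 6)` ends with the high surrogate only, so the helper returns true for it.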
Instead, I'd like a function like the following:
var str = "Hello😀 world!";
var substr = str.UnicodeSafeSubstring(0, 6);
where substr contains "Hello😀".
For reference, here is how I would do it in Objective-C, using rangeOfComposedCharacterSequencesForRange:
NSString* str = @"Hello😀 world!";
NSRange range = [str rangeOfComposedCharacterSequencesForRange:NSMakeRange(0, 6)];
NSString* substr = [str substringWithRange:range];
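What is the equivalent code in C#?
It looks like you want to split the string on graphemes, that is, on single displayed characters.
In that case, there is a handy method, StringInfo.SubstringByTextElements: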
var str = "Hello😀 world!";
var substr = new StringInfo(str).SubstringByTextElements(0, 6);
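One thing to watch with this method: SubstringByTextElements throws an ArgumentOutOfRangeException when the string holds fewer text elements than requested, which matters if you always ask for a fixed length such as 150. The following is a minimal sketch of a clamped wrapper; `TextElementDemo` and `TakeTextElements` are hypothetical names, not part of the framework.

```csharp
using System;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: returns at most maxElements text elements
    // from the start of s, without throwing when s is shorter.
    public static string TakeTextElements(string s, int maxElements)
    {
        if (s == null)
        {
            throw new ArgumentNullException(nameof(s));
        }
        if (maxElements < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxElements));
        }
        var info = new StringInfo(s);
        // Clamp to the number of text elements actually present
        int count = Math.Min(maxElements, info.LengthInTextElements);
        // SubstringByTextElements rejects a start index at or past the end,
        // so the empty case is handled separately
        return count == 0 ? string.Empty : info.SubstringByTextElements(0, count);
    }
}
```

Here the length is counted in text elements, not in `char` values, so asking for 6 elements of a string that starts with "Hello" plus an emoji yields a result that is 7 chars long.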
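The following extension method should return the largest substring that starts at index startIndex, is at most length chars long, and contains only "complete" graphemes. So a "split" surrogate pair at the start or end is dropped, leading combining marks are dropped, and a final character that would be missing its combining marks is dropped.
Note that this may not be what you asked for: you seem to want to use graphemes as the unit of measure (or to include the last grapheme even if that makes the result longer than the length parameter).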
using System;
using System.Globalization;
using System.Text;

public static class StringEx
{
    public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
    {
        if (str == null)
        {
            throw new ArgumentNullException("str");
        }
        if (startIndex < 0 || startIndex > str.Length)
        {
            throw new ArgumentOutOfRangeException("startIndex");
        }
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException("length");
        }
        // Written as a subtraction so that the check cannot overflow
        if (length > str.Length - startIndex)
        {
            throw new ArgumentOutOfRangeException("length");
        }
        if (length == 0)
        {
            return string.Empty;
        }
        var sb = new StringBuilder(length);
        // Exclusive end of the requested range, in chars (UTF-16 code units)
        int end = startIndex + length;
        var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
        while (enumerator.MoveNext())
        {
            string grapheme = enumerator.GetTextElement();
            // startIndex now points just past the current grapheme
            startIndex += grapheme.Length;
            // Drop a final grapheme that would run past the requested range
            if (startIndex > end)
            {
                break;
            }
            // Skip initial Low Surrogates/Combining Marks
            if (sb.Length == 0)
            {
                if (char.IsLowSurrogate(grapheme[0]))
                {
                    continue;
                }
                UnicodeCategory cat = char.GetUnicodeCategory(grapheme, 0);
                if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark || cat == UnicodeCategory.EnclosingMark)
                {
                    continue;
                }
            }
            sb.Append(grapheme);
            if (startIndex == end)
            {
                break;
            }
        }
        return sb.ToString();
    }
}
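The loop above walks the string one text element at a time and keeps a running position in chars. As a rough illustration of that bookkeeping, the sketch below collects the char index at which each text element starts, using the same enumerator. `GraphemeBoundaries` and `StartIndexes` are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class GraphemeBoundaries
{
    // Hypothetical helper: char index at which each text element of s begins.
    public static List<int> StartIndexes(string s)
    {
        if (s == null)
        {
            throw new ArgumentNullException(nameof(s));
        }
        var result = new List<int>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s);
        while (enumerator.MoveNext())
        {
            // ElementIndex is the position of the current element in chars
            result.Add(enumerator.ElementIndex);
        }
        return result;
    }
}
```

For the string "e\u0301x\U0001F600" (an e followed by a combining acute accent, then x, then an emoji), the start indexes are 0, 2 and 3: the accent stays attached to its base letter and the surrogate pair counts as one element. A cut is only safe at one of these indexes or at the end of the string.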
A variant that simply includes the "extra" chars at the end of the substring when they are needed to keep the last grapheme whole:
using System;
using System.Globalization;
using System.Text;

public static class StringEx
{
    public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
    {
        if (str == null)
        {
            throw new ArgumentNullException("str");
        }
        if (startIndex < 0 || startIndex > str.Length)
        {
            throw new ArgumentOutOfRangeException("startIndex");
        }
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException("length");
        }
        // Written as a subtraction so that the check cannot overflow
        if (length > str.Length - startIndex)
        {
            throw new ArgumentOutOfRangeException("length");
        }
        if (length == 0)
        {
            return string.Empty;
        }
        var sb = new StringBuilder(length);
        // Exclusive end of the requested range, in chars (UTF-16 code units)
        int end = startIndex + length;
        var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
        while (enumerator.MoveNext())
        {
            // Stop once the requested range is covered; a grapheme that
            // starts inside the range is always taken whole
            if (startIndex >= end)
            {
                break;
            }
            string grapheme = enumerator.GetTextElement();
            startIndex += grapheme.Length;
            // Skip initial Low Surrogates/Combining Marks
            if (sb.Length == 0)
            {
                if (char.IsLowSurrogate(grapheme[0]))
                {
                    continue;
                }
                UnicodeCategory cat = char.GetUnicodeCategory(grapheme, 0);
                if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark || cat == UnicodeCategory.EnclosingMark)
                {
                    continue;
                }
            }
            sb.Append(grapheme);
        }
        return sb.ToString();
    }
}
This returns what you asked for: "Hello😀 world!".UnicodeSafeSubstring(0, 6) == "Hello😀".