eg. if the Name is: John Deer
the Initials should be: JD
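I can do this check on the Initials field with substring operations, but I was wondering whether I could write a regular expression for it instead. And is writing a regex better than using string methods?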
Mas*_*iti 21
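Here is my solution. My goal was not to provide the simplest solution, but one that can take a variety of (sometimes odd) name formats and produce a best guess at a first-name and last-name initial (or a single initial for people who go by one name).
I have also tried to write it in a relatively international-friendly way, using Unicode regexes, although I have no experience generating initials for many kinds of non-English names (e.g. Chinese). It should at least produce something usable that represents the person in two characters or fewer. For example, a Korean name like "행운의 복숭아" yields 행복, as you might expect (although that may not be the right way to do it in Korean culture).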
/// <summary>
/// Given a person's first and last name, we'll make our best guess to extract up to two initials, hopefully
/// representing their first and last name, skipping any middle initials, Jr/Sr/III suffixes, etc. The letters
/// will be returned together in ALL CAPS, e.g. "TW".
///
/// The way it parses names for many common styles:
///
/// Mason Zhwiti -> MZ
/// mason lowercase zhwiti -> MZ
/// Mason G Zhwiti -> MZ
/// Mason G. Zhwiti -> MZ
/// John Queue Public -> JP
/// John Q. Public, Jr. -> JP
/// John Q Public Jr. -> JP
/// Thurston Howell III -> TH
/// Thurston Howell, III -> TH
/// Malcolm X -> MX
/// A Ron -> AR
/// A A Ron -> AR
/// Madonna -> M
/// Chris O'Donnell -> CO
/// Malcolm McDowell -> MM
/// Robert "Rocky" Balboa, Sr. -> RB
/// 1Bobby 2Tables -> BT
/// Éric Ígor -> ÉÍ
/// 행운의 복숭아 -> 행복
///
/// </summary>
/// <param name="name">The full name of a person.</param>
/// <returns>One to two uppercase initials, without punctuation.</returns>
public static string ExtractInitialsFromName(string name)
{
    // First remove all punctuation, symbols, control chars, and numbers (Unicode-style regexes).
    string initials = Regex.Replace(name, @"[\p{P}\p{S}\p{C}\p{N}]+", "");
    // Replace every run of whitespace/separator characters (Unicode style) with a single regular ASCII space.
    initials = Regex.Replace(initials, @"\p{Z}+", " ");
    // Remove any Sr, Jr, I, II, III, IV, V, VI, VII, VIII, IX at the end of the name.
    initials = Regex.Replace(initials.Trim(), @"\s+(?:[JS]R|I{1,3}|I[VX]|VI{0,3})$", "", RegexOptions.IgnoreCase);
    // Extract up to 2 initials from the remaining cleaned name.
    initials = Regex.Replace(initials, @"^(\p{L})[^\s]*(?:\s+(?:\p{L}+\s+(?=\p{L}))?(?:(\p{L})\p{L}*)?)?$", "$1$2").Trim();
    if (initials.Length > 2)
    {
        // Worst-case scenario: everything failed, so just grab the first two letters of what we have left.
        initials = initials.Substring(0, 2);
    }
    return initials.ToUpperInvariant();
}
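To make the four steps easier to follow, here is a small sketch that applies the same regexes one at a time and returns each intermediate string. The class and method names (`InitialsTrace`, `Trace`) are hypothetical and only for illustration. The patterns are copied from the function above.

```csharp
using System.Text.RegularExpressions;

public static class InitialsTrace
{
    // Hypothetical helper: returns the string as it looks after each of the four steps.
    public static string[] Trace(string name)
    {
        // Step 1: strip punctuation, symbols, control chars, and numbers.
        string step1 = Regex.Replace(name, @"[\p{P}\p{S}\p{C}\p{N}]+", "");
        // Step 2: collapse Unicode separators into single ASCII spaces.
        string step2 = Regex.Replace(step1, @"\p{Z}+", " ");
        // Step 3: drop a trailing Jr/Sr/roman-numeral suffix.
        string step3 = Regex.Replace(step2.Trim(), @"\s+(?:[JS]R|I{1,3}|I[VX]|VI{0,3})$", "", RegexOptions.IgnoreCase);
        // Step 4: keep the first letter of the first word and of the last word.
        string step4 = Regex.Replace(step3, @"^(\p{L})[^\s]*(?:\s+(?:\p{L}+\s+(?=\p{L}))?(?:(\p{L})\p{L}*)?)?$", "$1$2").Trim();
        return new[] { step1, step2, step3, step4 };
    }
}
```

Calling `InitialsTrace.Trace("John Q. Public, Jr.")` should give "John Q Public Jr", "John Q Public Jr", "John Q Public", and finally "JP".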
Nev*_*vyn 18
Personally, I prefer this regex:
Regex initials = new Regex(@"(\b[a-zA-Z])[a-zA-Z]* ?");
string init = initials.Replace(nameString, "$1");
// For nameString = "John Deer", init is now "JD"
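This takes care of both extracting the initials and removing the whitespace (that is what the ' ?' at the end is for).
The only thing you need to worry about is titles and punctuation such as Jr., Sr., or Mrs., since some people do include those in their full names.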
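One possible way to deal with such titles is to strip a short list of them before applying the regex. The sketch below is only an illustration: the list of titles is an assumption and deliberately incomplete, the names `SimpleInitials` and `Get` are hypothetical, and like the regex it wraps it only handles ASCII letters.

```csharp
using System.Text.RegularExpressions;

public static class SimpleInitials
{
    // Assumed, incomplete list of honorifics and suffixes to drop (with an optional trailing period).
    private static readonly Regex Titles =
        new Regex(@"\b(?:Mr|Mrs|Ms|Dr|Jr|Sr)\b\.?", RegexOptions.IgnoreCase);

    // The regex from the answer above: keep the first letter of each word.
    private static readonly Regex Initials = new Regex(@"(\b[a-zA-Z])[a-zA-Z]* ?");

    public static string Get(string name)
    {
        string withoutTitles = Titles.Replace(name, "");
        string letters = Initials.Replace(withoutTitles, "$1");
        // Remove leftover punctuation and spaces that the initials regex does not consume.
        return Regex.Replace(letters, @"[^a-zA-Z]", "").ToUpperInvariant();
    }
}
```

With this sketch, both `SimpleInitials.Get("John Deer")` and `SimpleInitials.Get("Mrs. John Deer, Jr.")` should return "JD".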
Here is my approach:
public static string GetInitials(string names) {
    // Extract the first character out of each block of non-whitespace,
    // except name suffixes, e.g. Jr., III. The number of initials is not limited.
    return Regex.Replace(names, @"(?i)(?:^|\s|-)+([^\s-])[^\s-]*(?:(?:\s+)(?:the\s+)?(?:jr|sr|II|2nd|III|3rd|IV|4th)\.?$)?", "$1").ToUpper();
}
Cases handled:
// Mason Zhwiti -> MZ
// mason zhwiti -> MZ
// Mason G Zhwiti -> MGZ
// Mason G. Zhwiti -> MGZ
// John Queue Public -> JQP
// John-Queue Public -> JQP
// John Q. Public, Jr. -> JQP
// John Q Public Jr. -> JQP
// John Q Public Jr -> JQP
// John Q Public Jraroslav -> JQPJ
// Thurston Howell III -> TH
// Thurston Howell, III -> TH
// Thurston Howell the III -> TH
// Malcolm X -> MX
// A Ron -> AR
// A A Ron -> AAR
// Madonna -> M
// Chris O'Donnell -> CO
// Chris O' Donnell -> COD
// Malcolm McDowell -> MM
// Éric Ígor -> ÉÍ
// 행운의 복숭아 -> 행복
Cases not handled:
// James Henry George Michael III the second -> JHGMITS
// Robert "Rocky" Balboa, Sr. -> R"B
// 1Bobby 2Tables -> 12 (is it a real name?)