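Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals", what is the best way to add spaces before the capital letters, so that the end string is "This String Has No Spaces But It Does Have Capitals"?
Here is my attempt with a RegEx: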
System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
Bin*_*ier 194
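Regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).
This function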
string AddSpacesToSentence(string text, bool preserveAcronyms)
{
    if (string.IsNullOrWhiteSpace(text))
        return string.Empty;
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]))
            if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
                (preserveAcronyms && char.IsUpper(text[i - 1]) &&
                 i < text.Length - 1 && !char.IsUpper(text[i + 1])))
                newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}
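will do it 100,000 times in 2,968,750 ticks; the regex will take 25,000,000 ticks (and that's with the Regex compiled).
It's better, for a given value of better (i.e. faster), but it's more code to maintain. "Better" is often a compromise between competing requirements.
Hope this helps :)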
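Update
It's been a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).
On a string with 'Abbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand-coded function 4,517,177 ticks, and the Regex below takes 59,435,719, so the hand-coded function runs in 7.6% of the time the Regex takes.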
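A comparison like the one above could be set up roughly as follows. This is only a sketch: the original harness isn't shown, so the class name SpacingBenchmark, the helper MeasureTicks, the compiled "([a-z])([A-Z])" pattern, the 1,000-character input and the use of Stopwatch ticks are all my assumptions, and the numbers you get will differ by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class SpacingBenchmark
{
    // The simple hand-coded version from this answer (no acronym handling).
    public static string AddSpacesToSentence(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return "";
        var newText = new StringBuilder(text.Length * 2);
        newText.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && text[i - 1] != ' ')
                newText.Append(' ');
            newText.Append(text[i]);
        }
        return newText.ToString();
    }

    // Assumption: the regex under comparison is a compiled lower/upper pair pattern.
    private static readonly Regex LowerUpper =
        new Regex("([a-z])([A-Z])", RegexOptions.Compiled);

    public static string AddSpacesWithRegex(string text) =>
        LowerUpper.Replace(text, "$1 $2");

    // Runs the conversion repeatedly and returns the elapsed Stopwatch ticks.
    public static long MeasureTicks(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up call, so JIT and regex compilation are not timed
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            convert(input);
        watch.Stop();
        return watch.ElapsedTicks;
    }

    public static void Main()
    {
        // Hypothetical input: one capital plus nine lower-case letters, 100 times.
        string input = string.Concat(Enumerable.Repeat("A" + new string('b', 9), 100));

        long manual = MeasureTicks(AddSpacesToSentence, input, 100000);
        long regex = MeasureTicks(AddSpacesWithRegex, input, 100000);
        Console.WriteLine($"hand-coded: {manual} ticks, regex: {regex} ticks");
    }
}
```

Stopwatch ticks are not necessarily the same unit as DateTime ticks, so only the ratio between the two measurements is meaningful here.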
Update 2: Will it take acronyms into account? It will now! The logic of the if statement is fairly obscure, as you can see by expanding it to this...
if (char.IsUpper(text[i]))
    if (char.IsUpper(text[i - 1]))
        if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
            newText.Append(' ');
        else ;
    else if (text[i - 1] != ' ')
        newText.Append(' ');
...which doesn't help at all!
Here's the original simple method, which doesn't worry about acronyms:
string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}
Mar*_*own 141
Your solution has an issue in that it puts a space before the first letter T, so you get
" This String..." instead of "This String..."
To get around this, also match the lower-case letter preceding the capital, and then insert the space between the two:
newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
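To make the difference concrete, here is a small side-by-side sketch of the two patterns. The class and method names (LeadingSpaceDemo and so on) are made up for illustration; only the two Regex.Replace calls come from the question and this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingSpaceDemo
{
    // The question's attempt: a space before every ASCII capital, including the first.
    public static string SpaceBeforeEveryCapital(string value) =>
        Regex.Replace(value, "[A-Z]", " $0");

    // The fix from this answer: only split where a lower-case letter precedes a capital.
    public static string SpaceBetweenLowerAndUpper(string value) =>
        Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");

    public static void Main()
    {
        // Brackets make the leading space visible.
        Console.WriteLine("[" + SpaceBeforeEveryCapital("ThisString") + "]");   // [ This String]
        Console.WriteLine("[" + SpaceBetweenLowerAndUpper("ThisString") + "]"); // [This String]
    }
}
```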
Edit 1:
If you use @"(\p{Ll})(\p{Lu})" it will pick up accented characters as well.
Edit 2:
If your strings can contain acronyms, you may want to use this:
newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible".
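As a quick check of how the acronym pattern behaves, the sketch below wraps it in a helper and runs a few inputs through it. The names AcronymRegexDemo and SplitAcronymAware are mine, and the extra sample strings are just illustrative inputs, not cases from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AcronymRegexDemo
{
    // First alternative: a capital that follows a lower-case letter.
    // Second alternative: a capital (not at the very start) that is followed by a
    // lower-case letter, i.e. the last capital of an acronym that starts a new word.
    private const string Pattern =
        @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))";

    public static string SplitAcronymAware(string value) =>
        Regex.Replace(value, Pattern, " $0");

    public static void Main()
    {
        Console.WriteLine(SplitAcronymAware("DriveIsSCSICompatible")); // Drive Is SCSI Compatible
        Console.WriteLine(SplitAcronymAware("DNSServer"));             // DNS Server
        Console.WriteLine(SplitAcronymAware("ForYouAndTheFBI"));       // For You And The FBI
    }
}
```

The (?!\A) guard is what keeps a space from being inserted in front of the very first capital.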
Eti*_*neT 78
I haven't tested performance, but here it is in one line with LINQ:
var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');
Rob*_*rdy 13
I know this is an old one, but this is an extension I use when I need to do this:
public static class Extensions
{
    public static string ToSentence( this string Input )
    {
        return new string(Input.SelectMany((c, i) => i > 0 && char.IsUpper(c) ? new[] { ' ', c } : new[] { c }).ToArray());
    }
}
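This will allow you to use MyCasedString.ToSentence()
All these solutions are essentially wrong for modern text. You need to use something that understands case. Since Bob asked for other languages, I'll give a couple for Perl.
I provide four solutions, ranging from worst to best. Only the best one is always right. The others have problems. Here is a test run to show you what works and what doesn't, and where. I've used underscores so that you can see where the spaces have been put, and I've marked as wrong the things that are, well, wrong.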
Testing TheLoneRanger
            Worst:   The_Lone_Ranger
            Ok:      The_Lone_Ranger
            Better:  The_Lone_Ranger
            Best:    The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
    [WRONG] Worst:   Mount_MᶜKinley_National_Park
    [WRONG] Ok:      Mount_MᶜKinley_National_Park
    [WRONG] Better:  Mount_MᶜKinley_National_Park
            Best:    Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
    [WRONG] Worst:   ElÁlamo_Tejano
            Ok:      El_Álamo_Tejano
            Better:  El_Álamo_Tejano
            Best:    El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
    [WRONG] Worst:   TheÆvar_ArnfjörðBjarmason
            Ok:      The_Ævar_Arnfjörð_Bjarmason
            Better:  The_Ævar_Arnfjörð_Bjarmason
            Best:    The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
    [WRONG] Worst:   Il_CaffèMacchiato
            Ok:      Il_Caffè_Macchiato
            Better:  Il_Caffè_Macchiato
            Best:    Il_Caffè_Macchiato
Testing Misterǅenanǈubović
    [WRONG] Worst:   Misterǅenanǈubović
    [WRONG] Ok:      Misterǅenanǈubović
            Better:  Mister_ǅenan_ǈubović
            Best:    Mister_ǅenan_ǈubović
Testing OleKingHenryⅧ
    [WRONG] Worst:   Ole_King_HenryⅧ
    [WRONG] Ok:      Ole_King_HenryⅧ
    [WRONG] Better:  Ole_King_HenryⅧ
            Best:    Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
    [WRONG] Worst:   CarlosⅤºEl_Emperador
    [WRONG] Ok:      CarlosⅤº_El_Emperador
    [WRONG] Better:  CarlosⅤº_El_Emperador
            Best:    Carlos_Ⅴº_El_Emperador
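BTW, almost everyone here has selected the first way, the one marked "Worst". A few have selected the second way, marked "Ok". But no one else before me has shown you how to do either the "Better" or the "Best" approach.
Here is the test program with its four methods: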
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

# First I'll prove these are fine variable names:
my (
    $TheLoneRanger,
    $MountMᶜKinleyNationalPark,
    $ElÁlamoTejano,
    $TheÆvarArnfjörðBjarmason,
    $IlCaffèMacchiato,
    $Misterǅenanǈubović,
    $OleKingHenryⅧ,
    $CarlosⅤºElEmperador,
);

# Now I'll load up some string with those values in them:
my @strings = qw{
    TheLoneRanger
    MountMᶜKinleyNationalPark
    ElÁlamoTejano
    TheÆvarArnfjörðBjarmason
    IlCaffèMacchiato
    Misterǅenanǈubović
    OleKingHenryⅧ
    CarlosⅤºElEmperador
};

my($new, $best, $ok);
my $mask = " %10s %-8s %s\n";

for my $old (@strings) {
    print "Testing $old\n";
    ($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;

    ($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Worst:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Ok:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Better:", $new;

    ($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Best:", $new;
}
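When you can get the same score as "Best" on this data set, you'll know you've done it correctly. Until then, you haven't. No one else here has done better than "Ok", and most have done it "Worst". I look forward to seeing someone post the correct ℂ♯ code.
I notice that StackOverflow's highlighting code is miserably stoopid again. It makes the same old lame mistakes as (most but not all of) the other poor approaches mentioned here. Isn't it long past time to put ASCII to rest? It doesn't make sense anymore, and pretending it's all you have is simply wrong. It makes for bad code.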
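For what it's worth, the "Better" pattern carries over to C# almost unchanged, since .NET regexes understand the general categories Ll, Lu and Lt. To my knowledge .NET has no equivalent of Perl's \p{Lowercase} and \p{Uppercase} properties, so the sketch below only reaches the "Better" level, not "Best". The class name UnicodeCaseSplitter and its Split method are made up for this example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class UnicodeCaseSplitter
{
    // Zero-width match: just after a lower-case letter (Ll) and just before
    // an upper-case (Lu) or title-case (Lt) letter.
    private static readonly Regex Boundary =
        new Regex(@"(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])");

    public static string Split(string text, string separator = "_") =>
        Boundary.Replace(text, separator);

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        var samples = new[]
        {
            "TheLoneRanger",
            "ElÁlamoTejano",
            "IlCaffèMacchiato",
            // Title-case digraphs U+01C5 and U+01C8, written as escapes.
            "Mister\u01C5enan\u01C8ubovi\u0107",
        };
        foreach (var s in samples)
            Console.WriteLine(Split(s));
    }
}
```

Characters that are lower-case or upper-case only by derived property, rather than by general category, are still missed by this version.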
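I set out to make a simple extension method, based on Binary Worrier's code, that handles acronyms properly and is repeatable (it won't mangle words that are already spaced). Here is my result.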
public static string UnPascalCase(this string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    var newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        var currentUpper = char.IsUpper(text[i]);
        var prevUpper = char.IsUpper(text[i - 1]);
        var nextUpper = (text.Length > i + 1) ? char.IsUpper(text[i + 1]) || char.IsWhiteSpace(text[i + 1]) : prevUpper;
        var spaceExists = char.IsWhiteSpace(text[i - 1]);
        if (currentUpper && !spaceExists && (!nextUpper || !prevUpper))
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}
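Here are the unit test cases this function passes. I added most of tchrist's suggested cases to this list. The three it doesn't pass (two are just Roman numerals) are commented out: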
Assert.AreEqual("For You And I", "ForYouAndI".UnPascalCase());
Assert.AreEqual("For You And The FBI", "ForYouAndTheFBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "AManAPlanACanalPanama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNSServer".UnPascalCase());
Assert.AreEqual("For You And I", "For You And I".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "MountMᶜKinleyNationalPark".UnPascalCase());
Assert.AreEqual("El Álamo Tejano", "ElÁlamoTejano".UnPascalCase());
Assert.AreEqual("The Ævar Arnfjörð Bjarmason", "TheÆvarArnfjörðBjarmason".UnPascalCase());
Assert.AreEqual("Il Caffè Macchiato", "IlCaffèMacchiato".UnPascalCase());
//Assert.AreEqual("Mister ǅenan ǈubović", "Misterǅenanǈubović".UnPascalCase());
//Assert.AreEqual("Ole King Henry Ⅷ", "OleKingHenryⅧ".UnPascalCase());
//Assert.AreEqual("Carlos Ⅴº El Emperador", "CarlosⅤºElEmperador".UnPascalCase());
Assert.AreEqual("For You And The FBI", "For You And The FBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "A Man A Plan A Canal Panama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNS Server".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "Mount Mᶜ Kinley National Park".UnPascalCase());
This Regex places a space character in front of every capital letter:
using System.Text.RegularExpressions;
const string myStringWithoutSpaces = "ThisIsAStringWithoutSpaces";
var myStringWithSpaces = Regex.Replace(myStringWithoutSpaces, "([A-Z])([a-z]*)", " $1$2");
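Mind the space at the start of " $1$2"; that is what gets it done.
This is the outcome (the first capital also gets a space in front of it, which a call to TrimStart() removes):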
"This Is A String Without Spaces"