Splitting a string on connecting words

kas*_*f4u 1 c#

I need to split several strings into an array on connecting words, i.e. on, in, from, etc.

string sampleString = "what was total sales for pencils from Japan in 1999";

Desired result:

what was total sales

for pencils

from Japan

in 1999

I know how to split a string on a single word, but not on several words at once:

string[] stringArray = sampleString.Split(new string[] {"of"}, StringSplitOptions.None);

Any suggestions?

ang*_*son 5

For this particular scenario, you can do it with a regular expression.

You have to use what is called a lookahead, because otherwise the words you split on would be removed from the result.
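To see why the lookahead matters, here is a minimal sketch (the class and method names are hypothetical) that compares `string.Split` with several separators against `Regex.Split` with a zero-width lookahead. It assumes the connecting words are surrounded by single spaces, and it uses the variant `\b(?=(?:for|from|in)\b)`, which also requires a word boundary after the word.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitComparison
{
    // Plain string.Split: each separator is consumed by the split,
    // so "for", "from" and "in" disappear from the pieces.
    public static string[] SplitDroppingWords(string input) =>
        input.Split(new[] { " for ", " from ", " in " }, StringSplitOptions.None);

    // Regex.Split on a zero-width lookahead: the match has no length,
    // so nothing is removed and each connecting word starts the next piece.
    public static string[] SplitKeepingWords(string input) =>
        Regex.Split(input, @"\b(?=(?:for|from|in)\b)");
}
```

For the sample sentence, the first method returns "what was total sales", "pencils", "Japan", "1999" (the connecting words are gone), while the second returns "what was total sales ", "for pencils ", "from Japan ", "in 1999".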

Here is a small LINQPad program that demonstrates it:

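void Main()
{
    // Regex is in System.Text.RegularExpressions; Dump() is LINQPad's output method
    string sampleString = "what was total sales for pencils from Japan in 1999";
    // Split in front of each whole connecting word; the lookahead is zero-width, so the words are kept
    Regex.Split(sampleString, @"\b(?=(?:of|for|in|from)\b)").Dump();
}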

Output:

what was total sales  
for pencils  
from Japan  
in 1999 
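Outside LINQPad there is no `Dump()`. A minimal sketch for an ordinary C# project, assuming you want the trailing space on each piece trimmed and the empty first piece dropped when the sentence starts with a connecting word (the class name `ConnectingWordSplitter` is hypothetical):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class ConnectingWordSplitter
{
    // Zero-width split point in front of a whole connecting word.
    static readonly Regex SplitPattern = new Regex(@"\b(?=(?:of|for|in|from)\b)");

    public static string[] Split(string input) =>
        SplitPattern.Split(input)
            .Select(part => part.Trim())      // remove the space left before each split point
            .Where(part => part.Length > 0)   // Regex.Split yields "" when the input starts with a match
            .ToArray();
}
```

Printing the result with `foreach (var part in ConnectingWordSplitter.Split(sampleString)) Console.WriteLine(part);` gives the same four pieces as above, without the trailing spaces.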

However, as I said in a comment, it will be tripped up by place names that contain any of the words you split on. For example:

string sampleString = "what was total sales for pencils from the Isle of Islay in 1999";
Regex.Split(sampleString, @"\b(?=(?:of|for|in|from)\b)").Dump();

Output:

what was total sales  
for pencils  
from the Isle  
of Islay  
in 1999 

For easier maintenance, the regular expression can be written out with comments like this:

Regex.Split(sampleString, @"
    \b          # Must be a word boundary here, so we don't split
                # inside words that contain a split word, like 'before'
    (?=         # lookahead group: it matches, but is zero-length (not consumed)
        (?:     # non-capturing group: the list of words, separated by |
            of
            |for
            |in
            |from
        )
        \b      # Also a word boundary, so we don't split in front of
                # words that merely start with a split word, like 'fortune'
    )", RegexOptions.IgnorePatternWhitespace).Dump();
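Because a lookahead consumes nothing, a `\b` written after it (as in `\b(?=of|for|in|from)\b`) is tested at the same position as the first `\b`, so it never checks where the word ends. The sketch below, with hypothetical names and a made-up sample sentence, compares that form with one that puts the `\b` inside the lookahead.

```csharp
using System.Text.RegularExpressions;

static class BoundaryComparison
{
    // Second \b outside the lookahead: it is checked at the start of the word again.
    public const string Outside = @"\b(?=of|for|in|from)\b";

    // \b inside the lookahead: the connecting word itself must end at a word boundary.
    public const string Inside = @"\b(?=(?:of|for|in|from)\b)";

    public static string[] Split(string input, string pattern) =>
        Regex.Split(input, pattern);
}
```

For "total sales information for 1999", `Outside` also splits in front of "information" (it starts with "in"), giving "total sales ", "information ", "for 1999", while `Inside` gives "total sales information ", "for 1999".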

You may also want to add the RegexOptions.IgnoreCase option so that "Of", "OF" and so on are matched too.
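RegexOptions is a flags enum, so IgnoreCase can be combined with IgnorePatternWhitespace using `|`. A minimal sketch with a hypothetical class name, assuming the form of the pattern that keeps the closing `\b` inside the lookahead:

```csharp
using System.Text.RegularExpressions;

static class CaseInsensitiveSplitter
{
    // Whitespace and # comments are ignored because of IgnorePatternWhitespace.
    const string Pattern = @"
        \b                          # start of a word
        (?=(?:of|for|in|from)\b)    # zero-length lookahead for a whole connecting word
        ";

    public static string[] Split(string input) =>
        Regex.Split(input, Pattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
}
```

With this, "Total Sales FOR Pencils From Japan IN 1999" is split into "Total Sales ", "FOR Pencils ", "From Japan ", "IN 1999".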