Need help splitting this string (first and last name pairs separated by commas and "and")

sta*_*010 5 regex perl

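I am using Perl and need to split a string of author names that are separated by commas, with "and" before the last name. Each name is a first name followed by a last name, as shown below: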

$string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";
$string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";
$string3 = "Jane Doe and Joe Smith";
# Next line doesn't work because there is no comma between last two names
@data = split(/,/, $string1);

I just want to split the full names into array elements, the way split() does, so that the @data array contains, for example:

$data[0]: "Joe Smith"
$data[1]: "Jason Jones"
$data[2]: "Jane Doe"
$data[3]: "Jack Jones"

The problem, however, is that there is no comma between the last two names in the list. Any help would be appreciated.

mu *_*ort 10

You can split on a regex that uses a simple alternation:

my @parts = split(/\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*|\s+and\s+/, $string1)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $string2 = "Jane Doe and Joe Smith";print join("\n",split(/\s*,\s*|\s+and\s+/, $string2)),"\n"'
Jane Doe
Joe Smith
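As a quick check of where this two-branch pattern falls short, here is a small sketch that runs it over the Oxford-comma string from the question ($string2). Nothing here goes beyond the pattern above; the variable names are just for illustration.

```perl
use strict;
use warnings;

# Oxford-comma variant from the question ($string2).
my $string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Two-branch pattern: the comma branch consumes ", " first, so the
# following "and " has no leading whitespace left for \s+and\s+ to match.
my @parts = split(/\s*,\s*|\s+and\s+/, $string2);

print "$_\n" for @parts;
# The last element comes out as "and Jack Jones" rather than "Jack Jones".
```

So the simple alternation is fine for "A, B and C" and "A and B", but it leaves a stray "and" behind when a comma precedes the final "and".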

If you also need to handle the Oxford comma (i.e. "this, that, and the other"), then you can use

my @parts = split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jack Jones

Thanks to stackoverflowuser2010 for pointing out this case.
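If you need this in more than one place, the three-branch pattern can be wrapped in a small sub. This is only a sketch: split_authors is a made-up helper name, and the inputs are the three strings from the question.

```perl
use strict;
use warnings;

# split_authors is a hypothetical helper name, not part of the answer above.
sub split_authors {
    my ($string) = @_;
    # Try ", and " first, then a plain comma, then a bare " and ".
    return split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $string);
}

my @inputs = (
    "Joe Smith, Jason Jones, Jane Doe and Jack Jones",
    "Joe Smith, Jason Jones, Jane Doe, and Jack Jones",
    "Jane Doe and Joe Smith",
);

for my $input (@inputs) {
    my @names = split_authors($input);
    print join(" | ", @names), "\n";
}
```

Note that the "and" branches require whitespace after "and", so a surname such as "Anderson" is not split. The match is case-sensitive, so a capitalized "And" would not be treated as a separator unless you add the /i flag.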

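You will want to keep \s*,\s*and\s+ at the beginning so that the other branches of the alternation don't split on the comma or the "and" separately. This ordering also appears to be guaranteed: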

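Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen.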
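To make the effect of the ordering concrete, here is a small sketch comparing the pattern above with a reordered copy of it. The reordered pattern is not something you would want to use; it is there only to show what goes wrong when the combined branch comes last.

```perl
use strict;
use warnings;

my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Combined ", and " branch first, as recommended above.
my @good = split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s);

# Same branches with the combined one moved last (illustration only):
# the plain-comma branch now wins at ", and " and swallows the space,
# so neither "and" branch can match afterwards.
my @bad = split(/\s*,\s*|\s+and\s+|\s*,\s*and\s+/, $s);

print "good: ", join(" | ", @good), "\n";
print "bad:  ", join(" | ", @bad), "\n";
```

With the combined branch last, the final element is "and Jack Jones", which is the same leftover you get from the two-branch pattern.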