Python - Extract all camel-case words in a sequence

jax*_*jax 2 python nltk

I am trying to return a list of all runs of consecutive words in a string that start with a capital letter (title case).

For example, given the string John Walker Smith is currently in New York, I would like to return the following list:

['John Walker Smith', 'New York']

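My code only works when there are exactly two title-case words in a row. How can I extend it to pick up more than two title-case words in a sequence?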

def get_composite_names(s):
    l = [x for x in s.split()]
    nouns = []
    for i in range(0,len(l)):
        if i > len(l)-2:
            break
        if l[i] == l[i].title() and l[i+1] == l[i+1].title():
                temp = l[i]+' '+l[i+1]
                nouns.append(temp)
    return nouns

cma*_*her 6

Here is one way to do this without regular expressions:

from itertools import groupby

string = "John Walker Smith is currently in New York"

groups = []

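# groupby batches consecutive words that share the same key,
# here whether the word starts with an uppercase letter
for key, group in groupby(string.split(), lambda x: x[0].isupper()):
    if key:
        groups.append(' '.join(group))

print(groups)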
# ['John Walker Smith', 'New York']
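To fit the question's function signature, the same groupby idea could be wrapped roughly as follows. This is only a sketch: the `min_words` parameter is a hypothetical addition (not part of the question or the answer) for dropping runs shorter than a given length, and the check used is `w[0].isupper()`, which is looser than the question's `w == w.title()` test (for instance, `"McDonald".title()` gives `"Mcdonald"`, so the stricter test would reject it).

```python
from itertools import groupby


def get_composite_names(s, min_words=1):
    """Return runs of consecutive capitalized words in s, joined by spaces.

    min_words is a hypothetical parameter added for illustration:
    runs with fewer words than this are skipped.
    """
    names = []
    # split() with no argument never yields empty strings, so w[0] is safe
    for is_capitalized, run in groupby(s.split(), key=lambda w: w[0].isupper()):
        run = list(run)  # materialize the group so its length can be checked
        if is_capitalized and len(run) >= min_words:
            names.append(" ".join(run))
    return names


print(get_composite_names("John Walker Smith is currently in New York"))
# ['John Walker Smith', 'New York']
```

With the default `min_words=1`, a lone capitalized word such as a sentence-initial "He" is also returned; passing `min_words=2` keeps only multi-word runs. Punctuation stays attached to words, since the input is split on whitespace only.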