Joining the elements of tuples in a list

Col*_*rdo 0 python

I have a list of tuples:

[('fruit', 'O'), ('is', 'O'), ('the', 'O'), 
 ('subject', 'O'), ('of', 'O'), ('a', 'O'), 
 ('Roald', 'PERSON'), ('Dahl', 'PERSON'), ('children', 'O'), 
 ("'s", 'O'), ('book', 'O'), ('?', 'O')]

I want to reduce this list to:

[('fruit', 'O'), ('is', 'O'), ('the', 'O'), 
 ('subject', 'O'), ('of', 'O'), ('a', 'O'), 
 ('Roald Dahl', 'PERSON'), ('children', 'O'), 
 ("'s", 'O'), ('book', 'O'), ('?', 'O')]

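That is, any consecutive tuples whose second value is not 'O' should have their first values joined with a space. This should work for a list of any length and for any number of consecutive tuples.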

My attempt:

def join_tags(list_tags):
  res = []
  last_joined = None
  last_seen = (None, None)

  for tup in list_tags:
    if tup[1] == 'O':
      res.append(tup)
      last_joined = None
    else:
      if tup[1] == last_seen[1]:
        if last_joined:
          new_tup = (last_joined[0] + ' ' + tup[0], tup[1])
          last_joined = new_tup
          res.append(new_tup)
        else:
          new_tup = (tup[0] + ' ' + tup[0], tup[1])
          res.append(new_tup)
          last_joined = new_tup
      else:
        res.append(tup)
        last_joined = None
    last_seen = tup

  return res

cs9*_*s95 6

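If you have worked with itertools before, you know it has many useful routines for operations like this. The aptly named groupby function is the one to use here.

Edit: thanks to @juanpa.arrivillaga for the improvement using operator.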

import itertools
from operator import itemgetter

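r = []
# groupby yields (key, group) for each run of consecutive tuples sharing a tag
for k, g in itertools.groupby(l, key=itemgetter(1)):
    if k == 'O':
        # untagged tuples are copied through unchanged
        r.extend(g)
    else:
        # join the first elements of the run and keep the shared tag
        r.append((' '.join([i[0] for i in g]), k))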

print(r)
[('fruit', 'O'),
 ('is', 'O'),
 ('the', 'O'),
 ('subject', 'O'),
 ('of', 'O'),
 ('a', 'O'),
 ('Roald Dahl', 'PERSON'),
 ('children', 'O'),
 ("'s", 'O'),
 ('book', 'O'),
 ('?', 'O')]

Here, l is your input list of tuples.
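
For a version that runs on its own, the loop can be wrapped in a function and applied to the list from the question. This is only a sketch: the name `join_tagged` and its `outside` parameter are made up for illustration and are not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter


def join_tagged(pairs, outside='O'):
    """Merge runs of consecutive (word, tag) pairs that share a non-`outside` tag."""
    result = []
    for tag, group in groupby(pairs, key=itemgetter(1)):
        if tag == outside:
            # Untagged words pass through unchanged.
            result.extend(group)
        else:
            # Collapse the whole run into a single (joined words, tag) entry.
            result.append((' '.join(word for word, _ in group), tag))
    return result


l = [('fruit', 'O'), ('is', 'O'), ('the', 'O'),
     ('subject', 'O'), ('of', 'O'), ('a', 'O'),
     ('Roald', 'PERSON'), ('Dahl', 'PERSON'), ('children', 'O'),
     ("'s", 'O'), ('book', 'O'), ('?', 'O')]

joined = join_tagged(l)
print(joined[6])  # ('Roald Dahl', 'PERSON')
```

Because groupby only groups adjacent items, two PERSON runs separated by an 'O' token stay separate, which matches the "consecutive" requirement in the question.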

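  • You should use `itemgetter(1)` rather than a boolean, because you actually want to reuse the group-by key; that way you don't have to materialize the group into a list. For example: `if k == 'O': r.extend(g) else: new_dat.append((' '.join([x[0] for x in g]), k))` (2 upvotes)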
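
To make the comment's point concrete, here is a small comparison. It assumes the earlier version of the answer grouped on a boolean such as `t[1] == 'O'`; that code is not shown on this page, so `by_bool` below is a guess at its shape, and the `LOCATION` tag in the sample is invented for the demonstration. With a boolean key the tag is lost, so the group has to be turned into a list to read it back, and adjacent runs with different tags collapse into one.

```python
from itertools import groupby
from operator import itemgetter


def by_bool(pairs):
    # Hypothetical boolean-key variant: the key only says "is the tag 'O'?"
    out = []
    for is_outside, group in groupby(pairs, key=lambda t: t[1] == 'O'):
        if is_outside:
            out.extend(group)
        else:
            items = list(group)  # must materialize the group to recover the tag
            out.append((' '.join(w for w, _ in items), items[0][1]))
    return out


def by_tag(pairs):
    # itemgetter(1) variant: the key is the tag itself, so it can be reused.
    out = []
    for tag, group in groupby(pairs, key=itemgetter(1)):
        if tag == 'O':
            out.extend(group)
        else:
            out.append((' '.join(w for w, _ in group), tag))
    return out


sample = [('Roald', 'PERSON'), ('Dahl', 'PERSON'),
          ('London', 'LOCATION'), ('is', 'O')]
print(by_bool(sample))  # [('Roald Dahl London', 'PERSON'), ('is', 'O')]
print(by_tag(sample))   # [('Roald Dahl', 'PERSON'), ('London', 'LOCATION'), ('is', 'O')]
```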