dat*_*den 2 python conditional split python-2.7
I have this list, made up of tag and weight strings:
lst = ['rock 101071', 'pop 69159', 'alternative 55777', 'indie 48175',
'electronic 46270', 'female vocalists 42565', 'favorites 39921',
'Love 34901', 'dance 33618', '00s 31432']
I am trying to convert it into tuples, like this:
[('rock ', '101071'), ('pop ', '69159'), ('alternative ', '55777'), ('indie ', '48175'),
('electronic ', '46270'), ('female vocalists ', '42565'), ('favorites ', '39921'),
('Love ', '34901'), ('dance ', '33618'), ('s ', '0031432')]
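Here, each string is split into a tuple so that the element at index 0 holds every word except the last, and the element at index 1 holds the last word of the string.
To achieve this, my code is: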
tags = []
weights = []
for i in lst:
    tag = ''.join([x for x in i if not x.isdigit()])
    tags.append(tag)
    weight = ''.join([x for x in i if x.isdigit()])
    weights.append(weight)
Then, if I do:
print zip(tags, weights)
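I get the desired result. Unfortunately, some tags themselves contain digits, like 00s in lst.
How can I get ('00s ', '0031432') formatted correctly?
PS: As an alternative way of splitting, i.split(" ") is not ideal, because some tags in the collection consist of several words.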
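The problem with the digit filter can be reproduced in isolation. This is a minimal sketch; `split_by_digits` is a hypothetical helper name that wraps the same character-level filter as the loop above:

```python
def split_by_digits(s):
    # Same filter as in the loop above: every digit character goes to
    # the weight, every other character goes to the tag.
    tag = ''.join(ch for ch in s if not ch.isdigit())
    weight = ''.join(ch for ch in s if ch.isdigit())
    return tag, weight

print(split_by_digits('rock 101071'))  # ('rock ', '101071')
print(split_by_digits('00s 31432'))    # ('s ', '0031432')
```

Because the filter works on single characters rather than on words, the leading 00 of the tag ends up in front of the weight.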
You can use str.rsplit() to split each string on the space character, with maxsplit set to 1. For example:
>>> lst = ['rock 101071', 'pop 69159', 'alternative 55777', 'indie 48175', 'electronic 46270', 'female vocalists 42565', 'favorites 39921', 'Love 34901', 'dance 33618', '00s 31432']
>>> [s.rsplit(' ', 1) for s in lst]
[['rock', '101071'], ['pop', '69159'], ['alternative', '55777'], ['indie', '48175'], ['electronic', '46270'], ['female vocalists', '42565'], ['favorites', '39921'], ['Love', '34901'], ['dance', '33618'], ['00s', '31432']]
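This gives a list of nested lists (which I think should be fine). But if you need nested tuples, as mentioned in the question, you can convert each value to a tuple: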
[tuple(s.rsplit(' ', 1)) for s in lst]
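If the weights are needed as numbers rather than strings, the same rsplit call can be wrapped in a small helper. This is only a sketch, assuming every entry ends with a space followed by an integer weight; `parse_entry` is a hypothetical name, not something from the question:

```python
def parse_entry(s):
    # Split once, starting from the right, so only the last word is cut
    # off; multi-word tags such as 'female vocalists' stay in one piece.
    tag, weight = s.rsplit(' ', 1)
    # Assumption: the last word is always an integer weight.
    return tag, int(weight)

lst = ['rock 101071', 'female vocalists 42565', '00s 31432']
pairs = [parse_entry(s) for s in lst]
print(pairs)  # [('rock', 101071), ('female vocalists', 42565), ('00s', 31432)]
```

Note that an entry without a space, or with a non-numeric last word, raises a ValueError here, so input that may be malformed would need a guard.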