Aak*_*tel 3 pandas pandas-groupby
We have the following data:
Name genres
A Action|Adventure|Science Fiction|Thriller
B Action|Adventure|Science Fiction|Thriller
C Adventure|Science Fiction|Thriller
I want my DataFrame to look like this:
Name genres
A Action
A Adventure
A Science Fiction
A Thriller
B Action
B Adventure
B Science Fiction
B Thriller
C Adventure
C Science Fiction
C Thriller
Here is my code:
gen = df1[df1['genres'].str.contains('|')]
gen1 = gen.copy()
gen2 = gen.copy()
gen3 = gen.copy()
gen4 = gen.copy()
gen1['genres'] = gen1['genres'].apply(lambda x: x.split("|")[0])
gen2['genres'] = gen2['genres'].apply(lambda x: x.split("|")[1])
gen3['genres'] = gen3['genres'].apply(lambda x: x.split("|")[2])
gen4['genres'] = gen4['genres'].apply(lambda x: x.split("|")[3])
I get this error:
IndexError: list index out of range
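For context, here is a minimal sketch of why this error appears, assuming `df1` holds exactly the three rows shown in the question. Row C splits into only three parts, so asking for position 3 fails. The `no_pipe` series is a hypothetical value added only to illustrate that `str.contains('|')` treats the pattern as a regular expression by default and therefore matches every row.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match df1).
df1 = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "genres": [
        "Action|Adventure|Science Fiction|Thriller",
        "Action|Adventure|Science Fiction|Thriller",
        "Adventure|Science Fiction|Thriller",
    ],
})

# Count the parts in each row after splitting on the literal "|" character.
counts = df1["genres"].str.split("|").str.len()
print(counts.tolist())  # [4, 4, 3]

# Row C has only three parts, so position 3 does not exist.
parts_c = df1.loc[2, "genres"].split("|")
try:
    parts_c[3]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # list index out of range

# Side note: "|" is an empty regex alternation, so it matches any string.
no_pipe = pd.Series(["Drama"])  # hypothetical value without a separator
print(no_pipe.str.contains("|").tolist())               # [True]
print(no_pipe.str.contains("|", regex=False).tolist())  # [False]
```

So the filter on the first line of the question's code does not exclude anything, and any fixed index larger than the shortest row's length will raise.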
Create lists of genres with split, repeat the Name values according to str.len, and finally flatten the lists with chain.from_iterable:
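from itertools import chain
import pandas as pd

# Split each genres string into a list of individual genres
genres = df['genres'].str.split('|')
df = pd.DataFrame({
    # Repeat each Name once per genre in its row
    'Name' : df['Name'].values.repeat(genres.str.len()),
    # Flatten the list of lists into one long list
    'genres' : list(chain.from_iterable(genres.tolist()))
})
print (df)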
Name genres
0 A Action
1 A Adventure
2 A Science Fiction
3 A Thriller
4 B Action
5 B Adventure
6 B Science Fiction
7 B Thriller
8 C Adventure
9 C Science Fiction
10 C Thriller
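To see how the pieces fit together, the intermediate values can be inspected one at a time. This is only an illustrative sketch on a smaller, hypothetical two-row input (the names `names`, `lengths`, `repeated`, and `flat` are not from the answer above):

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical two-row input, kept small so the intermediate values are short.
names = np.array(["A", "B"])
genres = pd.Series(["Action|Adventure", "Thriller"]).str.split("|")

# Step 1: how many genres each row contains.
lengths = genres.str.len()
print(lengths.tolist())  # [2, 1]

# Step 2: repeat each name once per genre in its row.
repeated = names.repeat(lengths)
print(repeated.tolist())  # ['A', 'A', 'B']

# Step 3: flatten the per-row lists into one list of the same total length.
flat = list(chain.from_iterable(genres.tolist()))
print(flat)  # ['Action', 'Adventure', 'Thriller']
```

Because the repeated names and the flattened genres have the same length and the same order, they line up row by row when placed in a new DataFrame.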
EDIT:
Solution for a dynamic number of columns:
print (df)
Name genres col
0 A Action|Adventure|Science Fiction|Thriller 2
1 B Action|Adventure|Science Fiction|Thriller 3
2 C Adventure|Science Fiction|Thriller 5
from itertools import chain
cols = df.columns.difference(['genres'])
genres = df['genres'].str.split('|')
df = (df.loc[df.index.repeat(genres.str.len()), cols]
        .assign(genres=list(chain.from_iterable(genres.tolist()))))
print (df)
Name col genres
0 A 2 Action
0 A 2 Adventure
0 A 2 Science Fiction
0 A 2 Thriller
1 B 3 Action
1 B 3 Adventure
1 B 3 Science Fiction
1 B 3 Thriller
2 C 5 Adventure
2 C 5 Science Fiction
2 C 5 Thriller
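As an alternative sketch, and assuming pandas 0.25 or newer (where `DataFrame.explode` is available), the same reshaping can be written without `itertools`. This is a different technique from the one above, not a restatement of it; the variable `out` is introduced here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the example above, including the extra column.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "genres": [
        "Action|Adventure|Science Fiction|Thriller",
        "Action|Adventure|Science Fiction|Thriller",
        "Adventure|Science Fiction|Thriller",
    ],
    "col": [2, 3, 5],
})

# Turn each genres string into a list, then emit one row per list element.
# All other columns are repeated automatically, however many there are.
out = df.assign(genres=df["genres"].str.split("|")).explode("genres")
print(out)
```

Unlike `df.columns.difference`, which returns the column names sorted, this keeps the original column order (Name, genres, col). The index values are repeated in the same way as in the output above; appending `.reset_index(drop=True)` gives a fresh 0..n-1 index if that is preferred.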