I have a DataFrame like this:
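import pandas as pd

df = pd.DataFrame({'name': ['toto', 'tata', 'tati'], 'choices': 0})
df['choices'] = df['choices'].astype(object)
# .at sets a single cell; chained indexing (df['choices'][0] = ...) may write to a copy
df.at[0, 'choices'] = [1, 2, 3]
df.at[1, 'choices'] = [5, 4, 3, 1]
df.at[2, 'choices'] = [6, 3, 2, 1, 5, 4]
print(df)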
              choices  name
0           [1, 2, 3]  toto
1        [5, 4, 3, 1]  tata
2  [6, 3, 2, 1, 5, 4]  tati
From this df I want to build a DataFrame like this:
    choice  rank  name
0        1     0  toto
1        2     1  toto
2        3     2  toto
3        5     0  tata
4        4     1  tata
5        3     2  tata
6        1     3  tata
7        6     0  tati
8        3     1  tati
9        2     2  tati
10       1     3  tati
11       5     4  tati
12       4     5  tati
Each new row should hold one value from a list together with that value's index within the list.
This is what I did:
import numpy as np

size = df['choices'].map(len).sum()
df2 = pd.DataFrame(index=range(size), columns=df.columns)
del df2['choices']
df2['choice'] = np.nan
df2['rank'] = np.nan
k = 0
for i in df.index:
    choices = df.at[i, 'choices']
    for rank, choice in enumerate(choices):
        # .at writes one cell directly instead of using chained assignment
        df2.at[k, 'name'] = df.at[i, 'name']
        df2.at[k, 'choice'] = choice
        df2.at[k, 'rank'] = rank
        k += 1
But I would prefer a vectorized solution. Is that possible with Python/pandas?
In [4]: s = df.choices.apply(pd.Series).stack()
In [5]: s.name = 'choices' # needs a name to join
In [6]: del df['choices']
In [7]: df1 = df.join(s.reset_index(level=1))
In [8]: df1.columns = ['name', 'rank', 'choice']
In [9]: df1.sort_values(['name', 'rank']).reset_index(drop=True)
Out[9]:
    name  rank  choice
0   tata     0       5
1   tata     1       4
2   tata     2       3
3   tata     3       1
4   tati     0       6
5   tati     1       3
6   tati     2       2
7   tati     3       1
8   tati     4       5
9   tati     5       4
10  toto     0       1
11  toto     1       2
12  toto     2       3
This is related to another solution of mine, but in your case you keep the index (as rank) instead of dropping it.
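On newer pandas versions the same reshaping can probably be written more directly. The following is only a sketch, not part of the answer above: it assumes pandas 0.25 or later, where DataFrame.explode is available, and it rebuilds the example frame so that it runs on its own. The variable name out is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})

# explode() emits one row per list element and repeats the original index label
out = df.explode('choices').rename(columns={'choices': 'choice'})

# cumcount() numbers the rows within each original index label: 0, 1, 2, ...
# to_numpy() assigns by position, avoiding alignment on the duplicated index
out['rank'] = out.groupby(level=0).cumcount().to_numpy()

# explode() leaves an object column; cast it back to integers
out['choice'] = out['choice'].astype(int)

out = out.reset_index(drop=True)[['choice', 'rank', 'name']]
print(out)
```

Unlike the sorted result above, this keeps the rows in the original order (toto, tata, tati), which matches the layout asked for in the question.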