Pandas: Melting columns containing tuples

big*_*377 5 python dataframe pandas

Consider a pandas DataFrame whose columns contain tuples of equal length.

import pandas as pd

L1 = [['ID1', ('key1a','key1b','key1c'), ('value1a','value1b','value1c')],
      ['ID2', ('key2a','key2b','key2c'), ('value2a','value2b','value2c')]]
df1 = pd.DataFrame(L1, columns=['ID','Key','Value'])

>>> df1
    ID                    Key                        Value
0  ID1  (key1a, key1b, key1c)  (value1a, value1b, value1c)
1  ID2  (key2a, key2b, key2c)  (value2a, value2b, value2c)

What is the simplest way to expand it vertically, as shown below?

    ID    Key    Value
0  ID1  key1a  value1a
1  ID1  key1b  value1b
2  ID1  key1c  value1c
3  ID2  key2a  value2a
4  ID2  key2b  value2b
5  ID2  key2c  value2c

Ale*_*der 1

rows = []
for _, row in df1.iterrows():
    # Pair each key with its value and emit one output row per pair.
    for key, val in zip(row['Key'], row['Value']):
        rows.append([row['ID'], key, val])

>>> pd.DataFrame(rows)
     0      1        2
0  ID1  key1a  value1a
1  ID1  key1b  value1b
2  ID1  key1c  value1c
3  ID2  key2a  value2a
4  ID2  key2b  value2b
5  ID2  key2c  value2c
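The loop above leaves the result columns unnamed (0, 1, 2) and goes through iterrows, which builds a Series for every row. Here is a minimal sketch of a variant that zips the three columns directly and names the output columns. The helper name `melt_tuples` is hypothetical, and the sketch assumes each row's Key and Value tuples have the same length, as in the question (zip would silently truncate to the shorter one otherwise).

```python
import pandas as pd


def melt_tuples(df):
    """Expand equal-length tuples in 'Key' and 'Value' into one row per pair."""
    rows = [
        (id_, key, val)
        # Walk the three columns in lockstep instead of calling iterrows().
        for id_, keys, vals in zip(df['ID'], df['Key'], df['Value'])
        # Pair the i-th key with the i-th value of the same input row.
        for key, val in zip(keys, vals)
    ]
    return pd.DataFrame(rows, columns=['ID', 'Key', 'Value'])


df1 = pd.DataFrame(
    [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
     ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]],
    columns=['ID', 'Key', 'Value'])

result = melt_tuples(df1)
print(result)
```

Passing `columns=` to the constructor gives the ID / Key / Value header the question asks for.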

Timings (10k rows)

df2 = pd.DataFrame({
    'ID': ['ID' + str(n) for n in range(10000)], 
    'Key': [tuple('key' + str(n) + letter for letter in ('a', 'b', 'c')) for n in range(10000)], 
    'Value': [tuple('value' + str(n) + letter for letter in ('a', 'b', 'c')) for n in range(10000)]})

%timeit df2.set_index('ID').stack().apply(lambda x: pd.Series(x)).unstack(0).T.reset_index()
1 loops, best of 3: 3.51 s per loop

%%timeit
rows = []
for _, row in df2.iterrows():
    [rows.append([row['ID'], key, val]) for key, val in zip(row['Key'], row['Value'])]
df_new = pd.DataFrame(rows)
1 loops, best of 3: 1.22 s per loop
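For comparison, newer pandas versions can do this expansion without an explicit Python loop. This is a minimal sketch, assuming pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns. It was not part of the timings above, so no speed claim is made for it here.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
     ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]],
    columns=['ID', 'Key', 'Value'])

# Explode both tuple columns together; this needs pandas >= 1.3 and
# requires the Key and Value tuples in each row to have matching lengths.
# ignore_index=True renumbers the result 0..n-1 instead of repeating 0, 1.
result = df1.explode(['Key', 'Value'], ignore_index=True)
print(result)
```

Running the same `%timeit` on `df2.explode(['Key', 'Value'], ignore_index=True)` would show how it compares with the two approaches timed above.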