big*_*377 · 5 · tags: python, dataframe, pandas
Consider a pandas DataFrame whose columns contain tuples of equal length.
import pandas as pd

L1 = [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
      ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]]
df1 = pd.DataFrame(L1, columns=['ID', 'Key', 'Value'])
>>> df1
    ID                    Key                        Value
0  ID1  (key1a, key1b, key1c)  (value1a, value1b, value1c)
1  ID2  (key2a, key2b, key2c)  (value2a, value2b, value2c)
What is the simplest way to expand it vertically, like this?
    ID    Key    Value
0  ID1  key1a  value1a
1  ID1  key1b  value1b
2  ID1  key1c  value1c
3  ID2  key2a  value2a
4  ID2  key2b  value2b
5  ID2  key2c  value2c
rows = []
for _, row in df1.iterrows():
    # Pair each key with its value and repeat the row's ID for every pair
    for key, val in zip(row['Key'], row['Value']):
        rows.append([row['ID'], key, val])
>>> pd.DataFrame(rows)
     0      1        2
0  ID1  key1a  value1a
1  ID1  key1b  value1b
2  ID1  key1c  value1c
3  ID2  key2a  value2a
4  ID2  key2b  value2b
5  ID2  key2c  value2c
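If you are on pandas 1.3 or later (an assumption about your installed version), `DataFrame.explode` accepts a list of columns and explodes them in lockstep. Tuples count as list-likes, so this covers the question directly, provided the tuples within each row have matching lengths. A minimal sketch:

```python
import pandas as pd

L1 = [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
      ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]]
df1 = pd.DataFrame(L1, columns=['ID', 'Key', 'Value'])

# Explode both tuple columns together (multi-column explode needs pandas >= 1.3).
# ignore_index=True renumbers the result 0..n-1 instead of repeating 0, 0, 0, 1, ...
df_long = df1.explode(['Key', 'Value'], ignore_index=True)
print(df_long)
```

Note that the exploded columns keep the `object` dtype, and pandas raises a `ValueError` if the tuples in one row have different lengths.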
Timings (10k rows):
df2 = pd.DataFrame({
'ID': ['ID' + str(n) for n in range(10000)],
'Key': [tuple('key' + str(n) + letter for letter in ('a', 'b', 'c')) for n in range(10000)],
'Value': [tuple('value' + str(n) + letter for letter in ('a', 'b', 'c')) for n in range(10000)]})
%timeit df2.set_index('ID').stack().apply(lambda x: pd.Series(x)).unstack(0).T.reset_index()
1 loops, best of 3: 3.51 s per loop
%%timeit
rows = []
for _, row in df2.iterrows():
    for key, val in zip(row['Key'], row['Value']):
        rows.append([row['ID'], key, val])
df_new = pd.DataFrame(rows)
1 loops, best of 3: 1.22 s per loop
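Most of the loop's cost probably comes from `iterrows`, which builds a pandas Series for every row. The sketch below swaps in a different technique that skips per-row Series entirely: `numpy.repeat` repeats each ID once per tuple element, and `itertools.chain.from_iterable` flattens each tuple column in row order. The helper name `expand_tuples` and its parameters are hypothetical, and the sketch assumes that, within a row, every tuple column has the same length. No timing is claimed here, so measure it against the approaches above on your own data.

```python
from itertools import chain

import numpy as np
import pandas as pd


def expand_tuples(df, id_col='ID', tuple_cols=('Key', 'Value')):
    """Hypothetical helper: one output row per tuple element.

    Assumes every column in tuple_cols holds tuples of the same length
    within each row.
    """
    # Number of elements per row, taken from the first tuple column
    lengths = df[tuple_cols[0]].map(len).to_numpy()
    # Repeat each ID once per element of its row's tuples
    out = {id_col: np.repeat(df[id_col].to_numpy(), lengths)}
    for col in tuple_cols:
        # Flatten this column's tuples, preserving row order
        out[col] = list(chain.from_iterable(df[col]))
    return pd.DataFrame(out)


# Same 10k-row test frame as in the timings above
df2 = pd.DataFrame({
    'ID': ['ID' + str(n) for n in range(10000)],
    'Key': [tuple('key' + str(n) + letter for letter in ('a', 'b', 'c')) for n in range(10000)],
    'Value': [tuple('value' + str(n) + letter for letter in ('a', 'b', 'c')) for n in range(10000)]})

df_new = expand_tuples(df2)
print(df_new.head())
```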