Rah*_*Dev 5 python dataframe pandas
Suppose I have two dictionaries of the following form:
{'A':[1,2,3,4,5,6,7],
 'B':[12,13,14,15,16,17,18]} - Belongs to category "M"
{'A':[8,9,10,11,12,13,14],
 'B':[18,19,20,21,22,23,24]} - Belongs to category "P"
The resulting DataFrame should then look like this:
Name . Value . Category
A    .  1    .  M
A    .  8    .  P
A    .  10   .  P
B    .  12   .  M
and so on. How can I achieve this?
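This is a more pandorable approach than the one user3483203 suggested. It avoids unnecessary iteration, is faster (on sufficiently large datasets), and is more idiomatic.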

import pandas as pd

m = {'A': [1, 2, 3, 4, 5, 6, 7],
     'B': [12, 13, 14, 15, 16, 17, 18]}

p = {'A': [8, 9, 10, 11, 12, 13, 14],
     'B': [18, 19, 20, 21, 22, 23, 24]}

# Reshape each dict from wide to long: one row per (column name, value) pair
p_df = pd.DataFrame(p).melt(value_name='value')
m_df = pd.DataFrame(m).melt(value_name='value')

# Tag every row with the category it came from
p_df['category'] = 'P'
m_df['category'] = 'M'

result = pd.concat([m_df, p_df], ignore_index=True)

For the timings, larger inputs:

m = {'A': list(range(0, 100_000)), 'B': list(range(100_000, 200_000))}
p = {'A': list(range(200_000, 300_000)), 'B': list(range(300_000, 400_000))}

Here goes:

%%timeit
p_df = pd.DataFrame(p).melt(value_name='value')
m_df = pd.DataFrame(m).melt(value_name='value')

p_df['category'] = 'P'
m_df['category'] = 'M'

result = pd.concat([m_df, p_df], ignore_index=True)

120 ms ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
categories = ['M', 'P']
dcts = [m, p]
dfs = [
    pd.DataFrame([[k, el, cat] for k, v in dct.items() for el in v])
    for dct, cat in zip(dcts, categories)
]

cols = {'columns': {0: 'Name', 1: 'Value', 2: 'Category'}}
result = pd.concat(dfs).reset_index(drop=True).rename(**cols)

207 ms ± 8.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
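
If you want the exact column names from the question (`Name`, `Value`, `Category`), `melt` accepts `var_name` and `value_name`, so no rename step is needed. Below is a minimal sketch of the same melt-then-concat idea wrapped in a helper. The helper name `melt_with_category` and the choice to pass a dict mapping category label to data are my own assumptions for illustration, not part of the question. It also assumes all lists within one dict have the same length, as in the example data.

```python
import pandas as pd

m = {'A': [1, 2, 3, 4, 5, 6, 7],
     'B': [12, 13, 14, 15, 16, 17, 18]}
p = {'A': [8, 9, 10, 11, 12, 13, 14],
     'B': [18, 19, 20, 21, 22, 23, 24]}


def melt_with_category(dcts):
    """Hypothetical helper: `dcts` maps a category label to a dict of equal-length lists."""
    frames = [
        pd.DataFrame(dct)
        .melt(var_name='Name', value_name='Value')  # wide -> long, with the requested column names
        .assign(Category=cat)                       # same label on every row of this frame
        for cat, dct in dcts.items()
    ]
    # Stack the per-category frames and renumber the index from 0
    return pd.concat(frames, ignore_index=True)


result = melt_with_category({'M': m, 'P': p})
print(result.head())
```

Rows come out grouped by category and then by name, in the insertion order of the dicts, so sort afterwards (for example `result.sort_values(['Name', 'Value'])`) if you need a different layout.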