Convert a column containing a list of dictionaries into multiple columns in a pandas DataFrame

Question

I have a pandas DataFrame like this:

pd.DataFrame({'a':[1,2], 'b':[[{'c':1,'d':5},{'c':3, 'd':7}],[{'c':10,'d':50}]]})
Out[2]: 
   a                                         b
0  1  [{u'c': 1, u'd': 5}, {u'c': 3, u'd': 7}]
1  2                    [{u'c': 10, u'd': 50}]

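I want to expand column 'b' into separate columns and repeat the value of column 'a' whenever 'b' contains more than one element, like this: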

Out[2]: 
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50

I tried using apply on each row, but without success: apparently apply maps one row to exactly one row.

Answer 1

jez*_*ael 10

You can use concat with a list comprehension:

# Parentheses are needed so the chained calls can span several lines.
df = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df['a'])
        .reset_index(level=1, drop=True)
        .reset_index())

print(df)
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50
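
To make the chain above easier to follow, here is a minimal step-by-step sketch of the same idea, using the DataFrame from the question. The intermediate names `pieces`, `stacked` and `result` are only illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2],
    'b': [[{'c': 1, 'd': 5}, {'c': 3, 'd': 7}], [{'c': 10, 'd': 50}]],
})

# One small frame per row of 'b'; each dict in the list becomes one row.
pieces = [pd.DataFrame(x) for x in df['b']]

# keys=df['a'] labels every piece with its 'a' value, which yields a
# two-level index: (value of 'a', position inside the list).
stacked = pd.concat(pieces, keys=df['a'])

# Drop the inner position level, then turn the 'a' level back into a column.
result = stacked.reset_index(level=1, drop=True).reset_index()
print(result)
```

Printing `stacked` before the two `reset_index` calls shows the two-level index, which is why the inner level is dropped first.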

Edit:

If the index is unique, you can use join to carry over all the other columns:

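# Key each expanded piece by the original row label, then drop the inner level.
df1 = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df.index)
         .reset_index(level=1, drop=True))
# join aligns on the index, so the remaining columns are repeated as needed.
df = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(df)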
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50
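
As a rough illustration of what "all columns" means here, the sketch below adds a hypothetical extra column 'e' (not part of the question) and shows that it is repeated along with 'a'. Because join matches on index labels, duplicated labels would pair each expanded row with every original row sharing that label, hence the uniqueness requirement.

```python
import pandas as pd

# Same data as in the question, plus a made-up column 'e' for illustration.
df = pd.DataFrame({
    'a': [1, 2],
    'e': ['x', 'y'],
    'b': [[{'c': 1, 'd': 5}, {'c': 3, 'd': 7}], [{'c': 10, 'd': 50}]],
})

# Label each expanded piece with the original row label instead of 'a'.
df1 = (pd.concat([pd.DataFrame(x) for x in df['b']], keys=df.index)
         .reset_index(level=1, drop=True))

# join aligns on the index, so 'a' and 'e' are both repeated per dict.
out = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(out)
```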

Here is my attempt at a simpler solution:

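import numpy as np

# Number of dicts in each row of 'b'.
l = df['b'].str.len()
# Flatten all dicts into one list and repeat each row label once per dict.
df1 = pd.DataFrame(np.concatenate(df['b']).tolist(), index=np.repeat(df.index, l))
df = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(df)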
   a   c   d
0  1   1   5
1  1   3   7
2  2  10  50
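
For anyone who wants to see the intermediate values in this variant, the following sketch splits it into named steps. The names `lengths`, `flat` and `labels` are illustrative only, and `.to_numpy()` is my own addition so that a plain array of lists is handed to NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2],
    'b': [[{'c': 1, 'd': 5}, {'c': 3, 'd': 7}], [{'c': 10, 'd': 50}]],
})

# Number of dicts in each row of 'b' (.str.len() also works on lists).
lengths = df['b'].str.len()

# All dicts from every row, flattened into a single Python list.
flat = np.concatenate(df['b'].to_numpy()).tolist()

# Each original row label repeated once per dict it contained.
labels = np.repeat(df.index, lengths)

# Build the expanded frame on those labels and join it back onto 'a'.
df1 = pd.DataFrame(flat, index=labels)
out = df.drop('b', axis=1).join(df1).reset_index(drop=True)
print(out)
```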

Archived:	8 years, 6 months ago
Viewed:	6396 times
Last activity:	6 years, 7 months ago