Pandas column containing lists of objects: split the column by key name and store the values as comma-separated strings

Nik*_*pta 0 python json list dataframe pandas

I have a dataframe with a single column:

A
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}]
[{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}]
[{"A": 28, "B": "abc"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}]
[{"A": 28, "B": "abc"},{"A": 29, "B": "xyz"}]

The output should look like this:

A              B
28,29,30       abc,def,hij
31,32          hij,abc
28             abc
28,29,30       abc,def,hij
28,29,30       abc,klm,nop
28,29          abc,xyz

How can I split the list of objects into columns by key name and store the values as comma-separated strings, as shown above?

WeN*_*Ben 5

Use stack, then groupby:

df.A.apply(pd.Series).stack().\
     apply(pd.Series).groupby(level=0).\
        agg(lambda x :','.join(x.astype(str)))
Out[457]: 
          A            B
0  28,29,30  abc,def,hij
1     31,32      hij,abc
2        28          abc
3  28,29,30  abc,def,hij
4  28,29,30  abc,klm,nop
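As an alternative sketch (not part of the answer above), the same result can be built with plain Python before handing the data back to pandas. This avoids the two `apply(pd.Series)` calls, which may be slow on large frames. The helper name `join_by_key` is hypothetical, and the sample frame is a shortened copy of the data in the question.

```python
import pandas as pd

# Shortened sample of the question's data (first three rows only).
df = pd.DataFrame({"A": [
    [{"A": 28, "B": "abc"}, {"A": 29, "B": "def"}, {"A": 30, "B": "hij"}],
    [{"A": 31, "B": "hij"}, {"A": 32, "B": "abc"}],
    [{"A": 28, "B": "abc"}],
]})


def join_by_key(records):
    """Collect the values of each key across a list of dicts and join them with commas.

    Hypothetical helper for illustration; not a pandas function.
    """
    collected = {}
    for record in records:
        for key, value in record.items():
            collected.setdefault(key, []).append(str(value))
    return {key: ",".join(values) for key, values in collected.items()}


# One dict per original row becomes one row of the result.
out = pd.DataFrame([join_by_key(records) for records in df["A"]], index=df.index)
print(out)
```

If some dicts lack a key, the corresponding joined string simply has fewer entries, so the positions in the two columns may no longer line up. Whether that is acceptable depends on the data.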

Input data:

df=pd.DataFrame({'A':[[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
[{"A": 31, "B": "hij"},{"A": 32, "B": "abc"}],
[{"A": 28, "B": "abc"}],[{"A": 28, "B": "abc"},{"A": 29, "B": "def"},{"A": 30, "B": "hij"}],
[{"A": 28, "B": "abc"},{"A": 29, "B": "klm"},{"A": 30, "B": "nop"}]]})

For your additional question about reading the data from a CSV file:

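import ast
import pandas as pd

df = pd.read_csv(r'your.csv', dtype={'A': object})
# The column is read as plain strings; parse each one back into a list of dicts
df['A'] = df['A'].apply(ast.literal_eval)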
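To see why the `ast.literal_eval` step is needed, here is a small self-contained sketch that uses an in-memory CSV instead of a file. The CSV text is made up for illustration and assumes the column was written as Python-style list literals inside a quoted field.

```python
import ast
import io

import pandas as pd

# Illustrative CSV content: each cell holds the text of a list of dicts.
csv_text = '''A
"[{'A': 28, 'B': 'abc'}, {'A': 29, 'B': 'def'}]"
"[{'A': 31, 'B': 'hij'}]"
'''

df = pd.read_csv(io.StringIO(csv_text), dtype={"A": object})
kind_before = type(df.loc[0, "A"])  # str: the cell is still plain text

# literal_eval safely evaluates Python literals (lists, dicts, numbers, strings).
df["A"] = df["A"].apply(ast.literal_eval)
kind_after = type(df.loc[0, "A"])  # list: the cell is now a list of dicts
```

Note that `ast.literal_eval` only accepts Python literals. If the cells contain strict JSON with values such as `true`, `false`, or `null`, `json.loads` would be the parser to try instead.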