pandas: list of dictionaries to separate columns

joh*_*nyb 5 python python-3.x pandas

I have a dataset like the following:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]
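
To make the examples reproducible, the table above can be built as a small DataFrame. This is a sketch that assumes the message column holds plain strings rather than real Python lists, which the table suggests but does not state.

```python
import pandas as pd

# Sample frame mirroring the table above; message is kept as a raw string.
df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': [
        '[job:  , money: none, wife: none]',
        '[group: band, wife: yes, money: 10000]',
        '[job: none, money: none, wife:  , kids: one, group: jail]',
    ],
})
```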

How can I extract the key-value pairs and expand them into separate columns of the DataFrame?

Expected output:

name    status   number    job    money    wife    group   kids 
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    none    band    none
adam    inactive 34567     none   none     none    none    one

The messages contain many different keys.

Any help would be appreciated.

jez*_*ael 5

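This is not simple.

You need to rewrite each value as a dict literal with replace (\s+ matches one or more whitespace characters), then parse it with ast.literal_eval.

The DataFrame constructor then expands the dicts into columns, pop removes the message column from df, and concat joins the new columns back: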

import ast
import pandas as pd

# Turn "[k: v, ...]" into a dict literal; empty values become "none".
# The trailing \s* in the first pattern consumes the space after the comma,
# so keys do not pick up a leading blank (which would create extra columns).
df.message = df.message.replace([r':\s+,\s*', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)

# pop removes the message column from df and returns it
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
  group   job kids  money  wife
0   NaN  none  NaN   none  none
1  band   NaN  NaN  10000   yes
2  jail  none  one   none  none

df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  none  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000   yes
2   adam  inactive   34567  jail  none  one   none  none
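
As an alternative to rewriting the string into a dict literal, each message can be split directly on commas and colons. This is only a sketch: `parse_message` is a hypothetical helper, not part of pandas, and it assumes that values never contain a comma and that a blank value should become the string none.

```python
import pandas as pd

def parse_message(text, empty='none'):
    """Parse '[k: v, k2: v2]' into a dict; blank values become `empty`."""
    pairs = {}
    # Drop surrounding whitespace and brackets, then split into "k: v" items.
    for item in text.strip().strip('[]').split(','):
        if not item.strip():
            continue  # skip empty fragments such as a trailing comma
        key, _, value = item.partition(':')
        pairs[key.strip()] = value.strip() or empty
    return pairs

# Sample messages taken from the question.
messages = pd.Series([
    '[job:  , money: none, wife: none]',
    '[group: band, wife: yes, money: 10000]',
    '[job: none, money: none, wife:  , kids: one, group: jail]',
])

# One dict per row; keys missing from a row become NaN in that column.
df1 = pd.DataFrame(messages.map(parse_message).tolist(), index=messages.index)
```

Because the keys are stripped while parsing, stray spaces around them cannot produce duplicate columns. Column order may differ between pandas versions.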

Edit:

Another solution, using yaml:

import yaml

# safe_load is used because yaml.load requires an explicit Loader on PyYAML 6+
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)

df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None

df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
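
The True and None values in this output come from YAML's type inference: a blank value is read as None, yes as a boolean, and 10000 as an integer. If plain strings are preferred, one option is PyYAML's `BaseLoader`, which returns every scalar as a string. The following is a sketch on a made-up message string; the final clean-up step, which fills blanks and missing keys with none, is an assumption about the desired result rather than part of the answer above.

```python
import pandas as pd
import yaml

text = '{job:  , wife: yes, money: 10000, kids: one}'

# safe_load applies YAML typing: blank -> None, yes -> True, digits -> int.
typed = yaml.safe_load(text)

# BaseLoader skips type inference and returns every scalar as a string.
raw = yaml.load(text, Loader=yaml.BaseLoader)

# Hypothetical clean-up: treat blank strings and missing keys alike as 'none'.
rows = [raw, {'group': 'band'}]
df1 = pd.DataFrame(rows).replace('', 'none').fillna('none')
```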