joh*_*nyb 5 python python-3.x pandas
I have a dataset like the following:
name   status    number  message
matt   active    12345   [job: , money: none, wife: none]
james  active    23456   [group: band, wife: yes, money: 10000]
adam   inactive  34567   [job: none, money: none, wife: , kids: one, group: jail]
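How can I extract the key-value pairs and turn them into a dataframe that is expanded all the way out, with one column per key?
Expected output: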
name   status    number  job   money  wife  group  kids
matt   active    12345   none  none   none  none   none
james  active    23456   none  10000  none  band   none
adam   inactive  34567   none  none   none  none   one
The message column contains many different kinds of keys.
Any help would be greatly appreciated.
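This is not easy.
The values first need to be converted into a list of dicts: rewrite each string with replace (\s+ matches one or more whitespace characters), then parse it with ast.literal_eval.
After that you can use the DataFrame constructor and concat, with pop dropping the message column from df: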
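import ast
import pandas as pd

# Rewrite each message as a dict literal; an empty value becomes "none".
# ':\s+,' leaves the space after the comma, so some keys keep a leading
# space (' money', ' kids'), which is why 'money' shows up twice below.
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)
# Expand the dicts into columns; pop() also removes 'message' from df.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)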
print (df1)
  kids  money group   job  money  wife
0  NaN   none   NaN  none    NaN  none
1  NaN    NaN  band   NaN  10000   yes
2  one    NaN  jail  none   none  none
df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number kids  money group   job  money  wife
0   matt    active   12345  NaN   none   NaN  none    NaN  none
1  james    active   23456  NaN    NaN  band   NaN  10000   yes
2   adam  inactive   34567  one    NaN  jail  none   none  none
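To see what the chain of replacements does to a single message, here is a minimal sketch in plain Python. It assumes pandas applies the list of patterns in order, which is consistent with the output shown above; `to_dict` and the adjusted first pattern are illustrative names and choices, not part of the original answer.

```python
import ast
import re

# Replacement table from the answer above, applied in order.
REPLACEMENTS = [
    (r':\s+,', '":"none","'),  # empty value -> "none"
    (r'\[', '{"'),
    (r'\]', '"}'),
    (r':\s+', '":"'),
    (r',\s+', '","'),
]

def to_dict(message, replacements=REPLACEMENTS):
    """Hypothetical helper: apply each substitution in turn, then parse."""
    for pattern, repl in replacements:
        message = re.sub(pattern, repl, message)
    return ast.literal_eval(message)

msg = '[job: , money: none, wife: none]'

original = to_dict(msg)
print(original)   # {'job': 'none', ' money': 'none', 'wife': 'none'}

# Assumed variant: the first pattern also consumes the whitespace after
# the comma, so the following key no longer starts with a space.
adjusted = [(r':\s*,\s*', '":"none","')] + REPLACEMENTS[1:]
stripped = to_dict(msg, adjusted)
print(stripped)   # {'job': 'none', 'money': 'none', 'wife': 'none'}
```

With the original table, the key after an empty value keeps a leading space (' money'), so it ends up in a separate column from 'money'. The adjusted first pattern is one possible way to avoid that.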
Edit:
Another solution uses yaml (note that it parses empty values as None and yes as True):
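import yaml

# Brackets become braces so each message parses as a YAML flow mapping.
# safe_load is used because recent PyYAML requires an explicit Loader for yaml.load.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)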
print (df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None
df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
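Both results still contain NaN where a key is missing, whereas the question asks for none in those cells. The following is a rough sketch of one way to get closer to that layout using plain string splitting. It assumes the message column holds strings and that keys and values never contain commas or colons; `parse_message` and `wanted` are hypothetical names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["matt", "james", "adam"],
    "status": ["active", "active", "inactive"],
    "number": [12345, 23456, 34567],
    "message": [
        "[job: , money: none, wife: none]",
        "[group: band, wife: yes, money: 10000]",
        "[job: none, money: none, wife: , kids: one, group: jail]",
    ],
})

def parse_message(text):
    """Hypothetical helper: turn '[k: v, k2: v2]' into a dict of strings."""
    pairs = {}
    for item in text.strip("[]").split(","):
        key, _, value = item.partition(":")
        # Strip whitespace; treat an empty value as the string 'none'.
        pairs[key.strip()] = value.strip() or "none"
    return pairs

# Column order taken from the expected output in the question.
wanted = ["job", "money", "wife", "group", "kids"]
expanded = (
    pd.DataFrame([parse_message(m) for m in df.pop("message")], index=df.index)
    .reindex(columns=wanted)   # fix the column order
    .fillna("none")            # keys missing from a row become 'none'
)
result = pd.concat([df, expanded], axis=1)
print(result)
```

All values stay strings here, so 10000 is kept as '10000' and yes as 'yes'. The values follow the messages themselves.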