Geo*_*ler 3 python json dataframe pandas
How can I easily split a JSON column in pandas like this one:
df = pd.DataFrame({
    'col1': [1, 2],
    'col2': ["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
             "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}"]})
col1 col2
0 1 {'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}
1 2 {'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}
into real columns in a simple, Pythonic way?
Desired output:
pd.DataFrame({'col1':[1,2], 'foo':[1,3], 'bar':[2,5],
'baz_foo':[2,2], 'baz_x':[1,1]})
col1 foo bar baz_foo baz_x
0 1 1 2 2 1
1 2 3 5 2 1
json_normalize is the right tool for flattening nested JSON data.
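import ast
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

# Parse each dict-literal string, then flatten nested keys using '_' as the separator.
v = json_normalize([ast.literal_eval(j) for j in df.pop('col2')], sep='_')
# Pass axis by keyword; positional use was removed in pandas 2.0.
# The column order of the result can differ between pandas versions.
pd.concat([df, v], axis=1)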
col1 bar baz_foo baz_x foo
0 1 2 2 1 1
1 2 5 2 1 3
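Note that you still have to convert each JSON string to a dictionary first. The strings here use single quotes, so they are Python dict literals rather than strict JSON, which is why ast.literal_eval is used.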
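To illustrate the conversion step, here is a minimal sketch comparing the two parsers on one of the strings from the question. The variable names (raw, record, strict) are made up for this example; the double-quoted string is an assumed strict-JSON equivalent, not data from the original post.

```python
import ast
import json

# One of the single-quoted strings from the question (a Python dict literal).
raw = "{'foo': 1, 'bar': 2, 'baz': {'foo': 2, 'x': 1}}"

# json.loads rejects it, because JSON requires double-quoted keys and strings.
try:
    json.loads(raw)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval evaluates Python literals only (dicts, lists, numbers,
# strings), so it does not execute arbitrary code the way eval would.
record = ast.literal_eval(raw)

# If the column held strict JSON instead, json.loads would be the natural choice.
strict = json.loads('{"foo": 1, "bar": 2, "baz": {"foo": 2, "x": 1}}')
```

Both routes give the same nested dictionary, so the choice depends only on how the strings in the column are quoted.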
If you need to handle NaN values in "col2", try using join at the end:
import numpy as np

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ["{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
             "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}",
             np.nan]})
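# Normalize only the rows that actually contain data.
v = json_normalize(
    [ast.literal_eval(j) for j in df['col2'].dropna()], sep='_'
)
# Give v the index labels of the non-null rows; pop also removes col2 from df.
v.index = df.index[df.pop('col2').notna()]
# A left join keeps every row of df; rows without data get NaN.
df.join(v, how='left')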
col1 bar baz_foo baz_x foo
0 1 2.0 2.0 1.0 1.0
1 2 5.0 2.0 1.0 3.0
2 3 NaN NaN NaN NaN
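The NaN-safe steps above could be wrapped in a small reusable function. This is only a sketch: expand_dict_column is a hypothetical helper name, not part of pandas, and it assumes pandas >= 1.0 (for pd.json_normalize) and that every non-null cell is a valid Python dict literal.

```python
import ast

import numpy as np
import pandas as pd


def expand_dict_column(df, column, sep="_"):
    """Return a new frame with `column` (dict-literal strings) flattened into columns.

    Hypothetical helper for illustration; unlike the pop-based snippet,
    it leaves the input frame unchanged.
    """
    out = df.drop(columns=column)
    present = df[column].dropna()
    # Parse and flatten only the rows that contain data.
    flat = pd.json_normalize([ast.literal_eval(s) for s in present], sep=sep)
    # Re-use the original index labels so the join lines up row by row.
    flat.index = present.index
    return out.join(flat, how="left")


df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [
        "{'foo':1, 'bar':2, 'baz':{'foo':2, 'x':1}}",
        "{'foo':3, 'bar':5, 'baz':{'foo':2, 'x':1}}",
        np.nan,
    ],
})
result = expand_dict_column(df, "col2")
```

Working on a copy rather than calling pop is a design choice made here so the function can be run repeatedly on the same frame.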