Saving and exporting the dtypes information of a Python pandas DataFrame

Nik*_* VJ 1 python json series dataframe pandas

I have a pandas DataFrame named df. With df.dtypes I can print the following on screen:

arrival_time      object
departure_time    object
drop_off_type      int64
extra             object
pickup_type        int64
stop_headsign     object
stop_id           object
stop_sequence      int64
trip_id           object
dtype: object

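I want to save this information so that I can compare it with other data, type-cast elsewhere, and so on. I want to save it to a local file and then restore it somewhere else, in another program. But I can't figure out how. Here are the results of the various conversions I tried.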

df.dtypes.to_dict()
{'arrival_time': dtype('O'),
 'departure_time': dtype('O'),
 'drop_off_type': dtype('int64'),
 'extra': dtype('O'),
 'pickup_type': dtype('int64'),
 'stop_headsign': dtype('O'),
 'stop_id': dtype('O'),
 'stop_sequence': dtype('int64'),
 'trip_id': dtype('O')}
----
df.dtypes.to_json()
'{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'
----
json.dumps( df.dtypes.to_dict() )
...
TypeError: dtype('O') is not JSON serializable

----
list(df.dtypes)
[dtype('O'),
 dtype('O'),
 dtype('int64'),
 dtype('O'),
 dtype('int64'),
 dtype('O'),
 dtype('O'),
 dtype('int64'),
 dtype('O')]

How can I save and export/archive the dtype information of a pandas DataFrame?

jpp*_*jpp 6

pd.DataFrame.dtypes returns a pd.Series object. This means you can manipulate it as you would any regular Series in pandas:

import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

res = df.dtypes.to_frame('dtypes').reset_index()

print(res)

  index   dtypes
0     A   object
1     B  float64
2     C    int64
3     D     bool

Output to csv / excel / pickle

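You can then use any method you would normally use to store a DataFrame, such as to_csv, to_excel, to_pickle, etc. Note that distributing a pickle is not recommended, because the format is version dependent.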
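As a rough illustration of the CSV route, here is a minimal sketch of a round trip. It assumes the dtypes are converted to plain strings before writing, and it uses an in-memory buffer in place of a real file; the names `buffer`, `restored` and `dtype_map` are made up for this example.

```python
import io

import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# One row per column: the column name and its dtype as a plain string.
res = df.dtypes.astype(str).to_frame('dtypes').reset_index()

# An in-memory buffer stands in for a file on disk; a path such as
# 'types.csv' (hypothetical name) could be passed to to_csv/read_csv instead.
buffer = io.StringIO()
res.to_csv(buffer, index=False)

# Rewind the buffer and read the table back.
buffer.seek(0)
restored = pd.read_csv(buffer)

# Rebuild a {column name: dtype string} mapping from the two CSV columns.
dtype_map = dict(zip(restored['index'], restored['dtypes']))
print(dtype_map)
```

The string reported for column A may depend on the pandas version, so the sketch does not rely on it.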

Output to json

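If you want to store and load the information easily as a dictionary, a popular format is json. As you found, you need to convert the dtypes to str first: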

import json

# first create dictionary
d = res.set_index('index')['dtypes'].astype(str).to_dict()

with open('types.json', 'w') as f:
    json.dump(d, f)

with open('types.json', 'r') as f:
    data_types = json.load(f)

print(data_types)

{'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}
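Since the question mentions type-casting elsewhere and comparing against other data, here is a minimal sketch of how the loaded dictionary might be used. The `raw` frame and the `mismatches` check are hypothetical, and the JSON string simply stands in for the contents of types.json.

```python
import json

import pandas as pd

# Stand-in for the dictionary read back from types.json with json.load.
data_types = json.loads(
    '{"A": "object", "B": "float64", "C": "int64", "D": "bool"}'
)

# Hypothetical data that arrived untyped, e.g. parsed from a text source.
raw = pd.DataFrame({'A': [''], 'B': ['1.0'], 'C': ['1'], 'D': [True]})

# astype accepts a {column: dtype} mapping, and dtype names given as
# strings are valid values in that mapping.
typed = raw.astype(data_types)

# Compare the saved dtypes with those of the converted frame.
current = typed.dtypes.astype(str).to_dict()
mismatches = {
    col: (saved, current.get(col))
    for col, saved in data_types.items()
    if current.get(col) != saved
}
print(mismatches)
```

A plain string is probably enough for simple dtypes like these. Richer ones may not survive the trip intact: the string 'category', for instance, does not record the categories themselves.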