doo*_*aba 5 python python-3.x pandas
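I'm new to Python and am initializing parameters for X number of model runs. I need to create every possible combination from N dictionaries, each of which contains nested data.
I know I need to use itertools.product somehow, but I'm stuck on how to walk through the dictionaries. Maybe I shouldn't even be using dictionaries, but something like JSON instead. I also know this will create a lot of parameters/runs.
Edit: added clarification from the comments. I want to create a function that takes n dictionaries as input, e.g. def func(*dicts), creates every possible combination of all the individual key/value pairs across all the dictionaries, and returns one big DF containing all the combinations.
My data looks like this: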
Dictionary 1
{
    "chisel": [
        {"type": "chisel"},
        {"depth": [152, 178, 203]},
        {"residue incorporation": [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]},
        {"timing": ["10-nov", "10-apr"]},
    ],
    "disc": [
        {"type": "disc"},
        {"depth": [127, 152, 178, 203]},
        {"residue incorporation": [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]},
        {"timing": ["10-nov", "10-apr"]},
    ],
    "no_till": [
        {"type": "user_defined"},
        {"depth": [0]},
        {"residue incorporation": [0.0]},
        {"timing": ["10-apr"]},
    ],
}
Dictionary 2
{
    "nh4_n": {
        "kg/ha": [110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225],
        "fertilize_on": "10-apr"
    },
    "urea_n": {
        "kg/ha": [110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225],
        "fertilize_on": "10-apr"
    }
}
Dictionary 3
{
    "maize": {
        "sow_crop": 'maize',
        "cultivar": ['B_105', 'B_110'],
        "planting_dates": [
            '20-apr', '27-apr', '4-may', '11-may', '18-may', '25-may', '1-jun', '8-jun', '15-jun'],
        "sowing_density": [8],
        "sowing_depth": [51],
        "harvest": ['maize'],
    }
}
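For example, using the three dictionaries above, I would take the "chisel" dictionary and somehow itertools.product it with each nested dictionary in Dictionary 2 (e.g. "nh4_n") and each nested dictionary in Dictionary 3 (in this case there is only one, so with each cultivar, planting date, etc.). I would also like to use the key of each key/value pair as a DF column header.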

The main problem is that the data dictionaries are not in a consistent format:
fix_list_dicts:
def fix_list_dicts(data: dict) -> dict:
    """
    Given a dict where the values are a list of dicts:
    (1) convert the value to a dict of dicts
    (2) if any second level value is a str, convert it to a list
    """
    data_new = dict()
    for k, v in data.items():
        v_new = dict()
        for x in v:
            for k1, v1 in x.items():
                if not isinstance(v1, list):
                    x[k1] = [v1]  # wrap scalars so every value is a list
            v_new.update(x)
        data_new[k] = v_new
    return data_new
add_top_key_as_value:
def add_top_key_as_value(data: dict, new_key: str) -> dict:
    """
    Given a dict of dicts, where top key is not a 2nd level value:
    (1) add new key: value pair to second level
    """
    for k, v in data.items():
        v.update({new_key: k})
        data[k] = v
    return data
str_value_to_list:
def str_value_to_list(data: dict) -> dict:
    """
    Given a dict of dicts:
    (1) Convert any second level value from str to list
    """
    for k, v in data.items():
        for k2, v2 in v.items():
            if not isinstance(v2, list):
                data[k][k2] = [v2]
    return data
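The three helpers above could also be folded into one function. This is only a sketch: the name `normalize` and its `key_name` parameter are hypothetical and not part of the answer, and it assumes every top-level value is either a list of small dicts (like Dictionary 1) or a plain dict (like Dictionaries 2 and 3).

```python
from typing import Optional


def normalize(data: dict, key_name: Optional[str] = None) -> dict:
    """Return a new dict of dicts whose second-level values are all lists.

    Hypothetical helper: `normalize` and `key_name` are illustrative names.
    """
    out = {}
    for top_key, value in data.items():
        # Accept a list of small dicts (Dictionary 1) or a plain dict.
        if isinstance(value, list):
            merged = {}
            for part in value:
                merged.update(part)
        else:
            merged = dict(value)
        # Optionally record the top-level key as a value (as done for Dictionary 2).
        if key_name is not None:
            merged[key_name] = top_key
        # Wrap every non-list value in a one-element list.
        out[top_key] = {k: v if isinstance(v, list) else [v] for k, v in merged.items()}
    return out


toy = {"no_till": [{"type": "user_defined"}, {"depth": [0]}]}
print(normalize(toy))
```

Unlike the helpers above, this version builds new dictionaries instead of modifying its input in place.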
from pprint import pprint as pp
d1 = fix_list_dicts(d1)
pp(d1)
{'chisel': {'depth': [152, 178, 203],
            'residue incorporation': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
            'timing': ['10-nov', '10-apr'],
            'type': ['chisel']},
 'disc': {'depth': [127, 152, 178, 203],
          'residue incorporation': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
          'timing': ['10-nov', '10-apr'],
          'type': ['disc']},
 'no_till': {'depth': [0],
             'residue incorporation': [0.0],
             'timing': ['10-apr'],
             'type': ['user_defined']}}
d2 = add_top_key_as_value(d2, 'fertilizer')
d2 = str_value_to_list(d2)
pp(d2)
{'nh4_n': {'fertilize_on': ['10-apr'],
           'fertilizer': ['nh4_n'],
           'kg/ha': [110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225]},
 'urea_n': {'fertilize_on': ['10-apr'],
            'fertilizer': ['urea_n'],
            'kg/ha': [110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225]}}
d3 = str_value_to_list(d3)
pp(d3)
{'maize': {'cultivar': ['B_105', 'B_110'],
           'harvest': ['maize'],
           'planting_dates': ['20-apr', '27-apr', '4-may', '11-may', '18-may', '25-may', '1-jun', '8-jun', '15-jun'],
           'sow_crop': ['maize'],
           'sowing_density': [8],
           'sowing_depth': [51]}}
import pandas as pd
combine_the_data:
def combine_the_data(data: list) -> dict:
    """
    Given a list of dicts:
    (1) convert each dict into DataFrame
    (2) set the indices to 0
    (3) add each DataFrame to df_dict
    """
    df_dict = dict()
    for i, d in enumerate(data):
        df = pd.DataFrame.from_dict(d, orient='index')
        df.index = [0 for _ in range(len(df))]
        df_dict[f'd_{i}'] = df
    return df_dict
merge_df_dict:
def merge_df_dict(data: dict) -> pd.DataFrame:
    """
    Given a dict of DataFrames
    (1) merge them on the index
    """
    df = pd.DataFrame()
    for _, v in data.items():
        df = df.merge(v, how='outer', left_index=True, right_index=True)
    return df
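Setting every index label to 0 matters because a merge on an index with duplicate labels pairs every left row with every right row that shares the label, which produces the cross product. A minimal sketch with toy frames (the frames `left` and `right` are made up for illustration):

```python
import pandas as pd

# Two toy frames whose rows all share the index label 0.
left = pd.DataFrame({"type": ["chisel", "disc"]}, index=[0, 0])
right = pd.DataFrame({"fertilizer": ["nh4_n", "urea_n"]}, index=[0, 0])

# Every left row matches every right row, so 2 x 2 = 4 rows come back.
crossed = left.merge(right, how="outer", left_index=True, right_index=True)
print(crossed)
```

In pandas 1.2 and later, `left.merge(right, how="cross")` should give the same pairing without the index trick.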
data = [d1, d2, d3]
df_dict = combine_the_data(data)
df_dict['d_0']
df_dict['d_1']
df_dict['d_2']
df = merge_df_dict(df_dict)
Use pd.DataFrame.explode on all the lists:
pandas has many useful features, but explode is one of the best. Not on pandas v0.25 yet? Then get it!
df.reset_index(drop=True, inplace=True)  # the DataFrame must have a unique 0...x index
for col in df.columns:
    df = df.explode(col).reset_index(drop=True)
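To see what the loop does, here is a small sketch on a one-row toy frame (the values are borrowed from Dictionary 1; the frame itself is made up). Each pass turns one column of lists into one row per element, so the row count multiplies by the list length each time.

```python
import pandas as pd

# One row whose cells are lists, like the merged frame before exploding.
df = pd.DataFrame({
    "type": [["chisel"]],
    "depth": [[152, 178]],
    "timing": [["10-nov", "10-apr"]],
})

for col in df.columns:
    # explode repeats the index label for each list element,
    # so reset the index to keep it unique before the next pass.
    df = df.explode(col).reset_index(drop=True)

print(df)  # 1 * 2 * 2 = 4 rows
```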
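Given:
len(kg/ha) = 24
len(cultivar) = 2
len(planting_dates) = 9
number of rows for user_defined = 2
Total combinations for user_defined = 864
I didn't calculate the other two types by hand, but since user_defined has the correct number of combinations, I expect the others do as well.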
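The same product of list lengths can be worked out for the other two types. A small check, using only the list lengths visible in the dictionaries from the question (the variable names are illustrative):

```python
from math import prod  # Python 3.8+

# kg/ha * cultivar * planting_dates * number of fertilizers
shared = 24 * 2 * 9 * 2

# (depth, residue incorporation, timing) list lengths per tillage type
tillage = {
    "chisel": (3, 7, 2),
    "disc": (4, 7, 2),
    "user_defined": (1, 1, 1),
}

expected = {name: prod(lengths) * shared for name, lengths in tillage.items()}
print(expected)  # {'chisel': 36288, 'disc': 48384, 'user_defined': 864}
```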
df.type.value_counts()
disc 48384
chisel 36288
user_defined 864
Name: type, dtype: int64
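Since the question mentions itertools.product, the same table could probably also be built without merge and explode. The following is only a sketch: `all_combinations` is a hypothetical name, and it assumes every input has already been normalized to a dict of dicts whose values are lists, with no key repeated across the input dictionaries.

```python
from itertools import product

import pandas as pd


def all_combinations(*dicts: dict) -> pd.DataFrame:
    """Cross every inner dict of every input dict, then expand the value lists.

    Hypothetical helper; assumes normalized input (dict of dicts of lists).
    """
    rows = []
    # Pick one inner dict from each input dictionary (e.g. chisel x nh4_n).
    for inner_dicts in product(*(d.values() for d in dicts)):
        merged = {}
        for inner in inner_dicts:
            merged.update(inner)
        keys = list(merged)
        # One row per combination of the merged dict's list elements.
        for values in product(*(merged[k] for k in keys)):
            rows.append(dict(zip(keys, values)))
    return pd.DataFrame(rows)


# Toy input, cut down from the dictionaries in the question.
d1 = {"chisel": {"type": ["chisel"], "depth": [152, 178]},
      "no_till": {"type": ["user_defined"], "depth": [0]}}
d2 = {"nh4_n": {"fertilizer": ["nh4_n"], "kg/ha": [110, 115]}}

df = all_combinations(d1, d2)
print(df.shape)  # (6, 4)
```

The keys of the inner dictionaries become the column headers, as the question asks. Every row is held in a Python list before the DataFrame is built, so memory use grows with the number of combinations.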