Indexing a pandas DataFrame into Elasticsearch with null values but without NaN

Question

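I'm indexing data from a pandas DataFrame into Elasticsearch. I've set a `null_value` for some ES fields but not for others. How can I drop the missing values in columns that have no `null_value`, while keeping the missing values (set to `None`) in columns that do have one?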

ES mapping:

    "properties": {
        "sa_start_date": {"type": "date", "null_value": "1970-01-01T00:00:00+00:00"},
        "location_name": {"type": "text"}
    }


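Instead of hardcoding `cols_with_null_value`, one option is to derive it from the mapping itself, so the list stays in sync when fields change. A minimal sketch, assuming the mapping is available locally as a Python dict (for example, the body you pass when creating the index); fetching it from the cluster is not shown:

```python
# Assumption: the mapping from the question, held locally as a Python dict.
mapping = {
    "properties": {
        "sa_start_date": {"type": "date", "null_value": "1970-01-01T00:00:00+00:00"},
        "location_name": {"type": "text"},
    }
}

# Keep every field whose mapping defines a null_value.
cols_with_null_value = [
    name for name, spec in mapping["properties"].items() if "null_value" in spec
]
print(cols_with_null_value)  # ['sa_start_date']
```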
Code:

    import numpy as np
    import pandas as pd

    cols_with_null_value = ['sa_start_date']
    orig = [{
        'meter_id': 'M1',
        'sa_start_date': '',
        'location_name': ''
    }, {
        'meter_id': 'M1',
        'sa_start_date': '',
        'location_name': 'a'
    }]
    df = pd.DataFrame.from_dict(orig)

    # Parse the whole column at once; empty strings become NaT
    df['sa_start_date'] = pd.to_datetime(df['sa_start_date'], utc=True, errors='coerce')
    df.replace({'': np.nan}, inplace=True)


df:
       meter_id sa_start_date location_name
    0       M1           NaT           NaN
    1       M1           NaT             a


The dicts I need for Elasticsearch indexing:

    {"meter_id": "M1", "sa_start_date": None}
    {"meter_id": "M1", "sa_start_date": None, "location_name": "a"}


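Note that the location_name cell containing NaN should not be indexed, but the sa_start_date cell containing NaT should be (as None). I've tried many things, each more ridiculous than the last; none of them is worth showing. Any ideas appreciated!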

I tried this, but the None values get dropped along with the NaN ones:

    df[cols_with_null_value] = df[cols_with_null_value].replace({np.nan: None})

df:

       meter_id sa_start_date location_name
    0       M1          None           NaN
    1       M1          None             a

    for row in df.iterrows():
        ser = row[1]
        ser.dropna(inplace=True)

        lc = {k: v for k, v in dict(row[1]).items()}
        print('lc:', lc)

Output:

    lc: {'meter_id': 'M1'}
    lc: {'meter_id': 'M1', 'location_name': 'a'}


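A minimal illustration of why this attempt loses sa_start_date: pandas treats `None` and `NaN` alike as missing, so `dropna()` on a row Series removes both, not only the NaN entry. The one-row Series below is a stand-in for a row from `df.iterrows()`:

```python
import numpy as np
import pandas as pd

# Stand-in for one row after the replace({np.nan: None}) step
row = pd.Series({"meter_id": "M1", "sa_start_date": None, "location_name": np.nan})

# pandas considers both None and NaN "missing"...
print(pd.isna(None), pd.isna(np.nan))  # True True

# ...so dropna() drops sa_start_date together with location_name.
print(row.dropna().to_dict())  # {'meter_id': 'M1'}
```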
Answer 1

ber*_*nie 6

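Don't use `.dropna()` here. On a DataFrame it drops whole rows or whole columns, and on a single row (a Series) it drops every missing entry, `None` included, because pandas treats `None` as missing just like `NaN`. You want to keep everything except the empty location name.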

You can do it like this:

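    df.replace({'': None}, inplace=True)  # replace with None instead of np.nan

    # Assumes df is built from orig without the pd.to_datetime step; otherwise
    # sa_start_date holds NaT rather than None.
    for idx, row in df.iterrows():
        lc = {k: v for k, v in row.items() if not (k == 'location_name' and v is None)}
        print(lc)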


Result:

    {'meter_id': 'M1', 'sa_start_date': None}
    {'meter_id': 'M1', 'sa_start_date': None, 'location_name': 'a'}

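A variation on the same idea that does not hardcode location_name: build each document from `df.to_dict("records")` and decide per field. Non-missing values are kept, missing values become `None` only for columns in `cols_with_null_value`, and all other missing values are dropped. This is a sketch, not the answerer's code; `to_es_doc` is a hypothetical helper name. It runs on the question's pipeline including the `pd.to_datetime` step, since `pd.isna` also recognizes `NaT`:

```python
import numpy as np
import pandas as pd

cols_with_null_value = ["sa_start_date"]
orig = [
    {"meter_id": "M1", "sa_start_date": "", "location_name": ""},
    {"meter_id": "M1", "sa_start_date": "", "location_name": "a"},
]
df = pd.DataFrame(orig)
df["sa_start_date"] = pd.to_datetime(df["sa_start_date"], utc=True, errors="coerce")
df = df.replace({"": np.nan})


def to_es_doc(record, keep_null_cols):
    """Drop missing values, except in keep_null_cols where they become None."""
    doc = {}
    for key, value in record.items():
        if pd.isna(value):  # True for NaN, NaT and None
            if key in keep_null_cols:
                doc[key] = None  # sent as JSON null
            # any other missing value: leave the field out entirely
        else:
            doc[key] = value
    return doc


docs = [to_es_doc(rec, set(cols_with_null_value)) for rec in df.to_dict("records")]
for doc in docs:
    print(doc)
```

Because only the columns in `cols_with_null_value` keep their nulls, adding another text field without a `null_value` requires no code change.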

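To send these dicts to the cluster, one common pattern is a generator of action dicts for `elasticsearch.helpers.bulk`, where each action names the target index in `_index` and carries the document in `_source`. The `None` in sa_start_date goes out as JSON `null`, which Elasticsearch replaces with the mapping's `null_value` at index time. A sketch only: the index name "meters" and the cluster URL are hypothetical, and the bulk call is commented out because it needs a running cluster:

```python
def to_bulk_actions(docs, index_name):
    """Wrap plain documents in bulk action dicts (_index + _source)."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


docs = [
    {"meter_id": "M1", "sa_start_date": None},
    {"meter_id": "M1", "sa_start_date": None, "location_name": "a"},
]
actions = list(to_bulk_actions(docs, "meters"))  # "meters" is a hypothetical index
print(actions[0])

# With a running cluster (not executed here; URL is hypothetical):
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch("http://localhost:9200"), actions)
```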
Archived: 5 years, 12 months ago
Views: 1936
Last recorded: 5 years, 12 months ago