Return an empty column if the column does not exist in the DataFrame [pandas]

Muh*_*han 1 python dataframe pandas

I created a DataFrame df as follows:

import pandas as pd

Type = ['A', 'B', 'C', 'D']
Size = [72, 23, 66, 12]
df = pd.DataFrame({'Type': Type, 'Size': Size})

I can extract any existing column with:

df_count = df['Size']

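However, if the DataFrame is very large, I may not know whether a given column exists in df. If I request a column that is missing, for example df['Shape'], like this: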

df_null = df['Shape']

it raises a KeyError. Instead, I would like df_null to be an empty column named "Shape".

roo*_*oot 8

Use DataFrame.get with a pattern like the following:

In [3]: df.get('Size', pd.Series(index=df.index, name='Size'))
Out[3]:
0    72
1    23
2    66
3    12
Name: Size, dtype: int64

In [4]: df.get('Shape', pd.Series(index=df.index, name='Shape'))
Out[4]:
0   NaN
1   NaN
2   NaN
3   NaN
Name: Shape, dtype: float64

Or generalize it by wrapping the pattern in a function:

In [5]: get_column = lambda df, col: df.get(col, pd.Series(index=df.index, name=col))

In [6]: get_column(df, 'Size')
Out[6]:
0    72
1    23
2    66
3    12
Name: Size, dtype: int64

In [7]: get_column(df, 'Shape')
Out[7]:
0   NaN
1   NaN
2   NaN
3   NaN
Name: Shape, dtype: float64
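
As a minimal sketch, the same helper can be written as a named function instead of a lambda. The explicit dtype='float64' is an addition not present in the snippets above: depending on the pandas version, a Series built with only an index may default to object dtype rather than float64, so spelling the dtype out keeps the NaN fallback consistent.

```python
import pandas as pd

df = pd.DataFrame({'Type': ['A', 'B', 'C', 'D'], 'Size': [72, 23, 66, 12]})


def get_column(frame, col):
    """Return frame[col] if it exists, otherwise an all-NaN Series named col."""
    # The fallback shares the frame's index so it aligns with existing columns.
    # dtype='float64' is an assumption made here to avoid relying on the
    # version-dependent default dtype of an empty Series.
    fallback = pd.Series(index=frame.index, name=col, dtype='float64')
    return frame.get(col, fallback)


size = get_column(df, 'Size')    # existing column, returned unchanged
shape = get_column(df, 'Shape')  # missing column, all-NaN placeholder
```

Note that the fallback Series is constructed on every call, even when the column exists; for a very large frame it may be worth checking `col in frame` first.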

Another option is to use reindex followed by squeeze:

In [8]: df.reindex(columns=['Size']).squeeze()
Out[8]:
0    72
1    23
2    66
3    12
Name: Size, dtype: int64

In [9]: df.reindex(columns=['Shape']).squeeze()
Out[9]:
0   NaN
1   NaN
2   NaN
3   NaN
Name: Shape, dtype: float64
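
Two points about the reindex approach may be worth illustrating. The sketch below assumes the same df as in the question; the names wanted, subset, one_row, and as_series are hypothetical. First, reindex accepts several column labels at once, so existing and missing columns can be requested together. Second, a bare squeeze() collapses a 1x1 frame to a scalar, so on a single-row frame squeeze(axis=1) is the safer call if a Series is always expected.

```python
import pandas as pd

df = pd.DataFrame({'Type': ['A', 'B', 'C', 'D'], 'Size': [72, 23, 66, 12]})

# Request an existing and a missing column in one call;
# the missing one is filled with NaN.
wanted = ['Size', 'Shape']
subset = df.reindex(columns=wanted)

# With a single row, squeeze() alone would return a scalar.
# Restricting the squeeze to the column axis keeps a length-1 Series.
one_row = df.head(1)
as_series = one_row.reindex(columns=['Shape']).squeeze(axis=1)
```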