Assigning column names while creating a DataFrame results in NaN values

kav*_*kav 6 python dataframe pandas

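I have a list of dicts that I am converting to a DataFrame. When I pass the columns argument, the output values are all NaN.

# This code does not produce the desired output

import pandas as pd

l = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]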
pd.DataFrame(l, columns=['c', 'd'])

    c   d
0   NaN NaN
1   NaN NaN
# This code produces the desired output

l = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
df = pd.DataFrame(l)
df.columns = ['c', 'd']
df

    c   d
0   1   2
1   3   4

Why does this happen?

jez*_*ael 8

Because when you pass a list of dictionaries to the DataFrame constructor, the column names are created from the dictionary keys:

l = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
print(pd.DataFrame(l))
   a  b
0  1  2
1  3  4

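The columns parameter does not rename anything; it selects columns by name from the dictionary keys. Names that match a key keep that key's values, names that match no key become columns filled with missing values, and the resulting columns follow the order of the list you passed: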

# reordering works because keys a and b exist in at least one dictionary
print(pd.DataFrame(l, columns=['b', 'a']))
   b  a
0  2  1
1  4  3

# a is selected; d is filled with missing values because no dictionary has that key
print(pd.DataFrame(l, columns=['a', 'd']))
   a   d
0  1 NaN
1  3 NaN

# b is selected; c is filled with missing values because no dictionary has that key
print(pd.DataFrame(l, columns=['c', 'b']))
    c  b
0 NaN  2
1 NaN  4

# a and b are selected; c and d are filled with missing values because no dictionary has those keys
print(pd.DataFrame(l, columns=['c', 'd', 'a', 'b']))
    c   d  a  b
0 NaN NaN  1  2
1 NaN NaN  3  4
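One way to picture this selection behaviour is to build the frame from the keys first and then reindex its columns. The sketch below is only an analogy for the visible result, not a claim about how the constructor is implemented internally; the variable names are made up for illustration.

```python
import pandas as pd

l = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]

# Build the frame from the dictionary keys, then select and order columns by name.
full = pd.DataFrame(l)
selected = full.reindex(columns=['c', 'b'])

# Passing columns= to the constructor gives the same visible result:
# 'b' keeps its values, 'c' matches no key and is filled with NaN.
via_constructor = pd.DataFrame(l, columns=['c', 'b'])

print(selected)
print(via_constructor)
```

In both frames the label 'c' is looked up among the existing keys, never assigned to an existing column.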

So if you want different column names, you need to rename the columns or assign new names, as in your second code sample.
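A minimal sketch of the renaming route, assuming the goal is to turn a and b into c and d as in the question. Both calls are standard pandas methods; the variable names are chosen here for illustration only.

```python
import pandas as pd

l = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]

# Option 1: map each old name to its new name explicitly.
renamed = pd.DataFrame(l).rename(columns={'a': 'c', 'b': 'd'})

# Option 2: replace all column labels by position, without mutating in place.
relabeled = pd.DataFrame(l).set_axis(['c', 'd'], axis=1)

print(renamed)
print(relabeled)
```

The explicit mapping in rename is usually safer when the key order is not guaranteed, since positional relabeling depends on the order in which the columns were created.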