Pandas DataFrame: default data type for the values 1, 2, 3 and NaN

Question

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Output:

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

The values are converted to floats.

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print(df['one'])

Output:

a    1
b    2
c    3
Name: one, dtype: int64

But now the values are int64.

The difference is that the first example contains a NaN value.
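The difference can be isolated in a small sketch, assuming a reasonably recent pandas: the NaN in the first example appears because column 'one' has no label 'd', and building the DataFrame aligns both Series on the union of their indexes. Reindexing an integer Series onto a missing label does the same thing explicitly, while reindexing onto labels it already has keeps int64. The variable names here are illustrative only.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Same labels: no missing values are introduced, so the dtype stays int64.
same = one.reindex(['a', 'b', 'c'])
print(same.dtype)

# New label 'd': pandas must fill it with NaN, which an int64 column cannot hold,
# so the whole Series is upcast to float64.
extended = one.reindex(['a', 'b', 'c', 'd'])
print(extended.dtype)

# The DataFrame constructor aligns both Series on the union of their indexes,
# which is the same reindex as above.
df = pd.DataFrame({'one': one,
                   'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})
print(df['one'].dtype)
print(df['two'].dtype)
```

Column 'two' already covers every label, so it is not touched and keeps its integer dtype.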

What is the rule behind how the data type is set in the examples above?

Thanks!

Answer 1

raf*_*elc 5

NaN is of type float, so pandas infers the whole column, ints included, as float.

This is easy to check:

>>> import numpy as np
>>> type(np.nan)
<class 'float'>
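The same upcasting already happens in NumPy, which pandas builds on. A small sketch, assuming only that NumPy is installed (variable names are illustrative): a NaN mixed into integers makes NumPy choose float64 for the whole array, and an existing int64 array has no way to store NaN at all.

```python
import numpy as np

# np.nan is an ordinary Python float.
print(type(np.nan))

# Mixing it with integers makes NumPy pick float64 for the whole array.
mixed = np.array([1, 2, 3, np.nan])
print(mixed.dtype)

# An int64 array has no representation for "missing", so assigning NaN fails.
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[0] = np.nan
except ValueError as exc:
    print("ValueError:", exc)
```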

I would recommend this interesting read.

Answer 2

Joe*_*ant 3

pandas inherited many bad decisions from numpy.

References:

Pandas Gotchas - Integer NA

Numpy or Pandas, keeping array type as integer while having a NaN value

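If you look at type(df.iloc[3,0]), you can see that the nan is of type numpy.float64, which forces the whole column to be coerced to float. Basically, pandas is rubbish at handling nullable integers, and you just have to treat them as floats. If performance isn't a concern, you can also use the object dtype to hold the integers.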
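A short sketch of both points, assuming the question's data (the names one_obj and df_obj are hypothetical): the missing cell comes back as numpy.float64, while building column 'one' with dtype=object keeps the original Python ints next to the missing value.

```python
import numpy as np
import pandas as pd

two = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Default behaviour: the int column is upcast so that row 'd' can hold NaN.
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': two})
print(type(df.iloc[3, 0]))

# Object dtype: each cell holds an arbitrary Python object, so the ints stay ints
# and alignment fills the missing row with NaN without upcasting the others.
one_obj = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype=object)
df_obj = pd.DataFrame({'one': one_obj, 'two': two})
print(df_obj['one'].dtype)
print(type(df_obj.loc['a', 'one']))
print(df_obj['one'].isna().tolist())
```

The trade-off is the one the answer mentions: an object column stores references to Python objects, so operations on it cannot use NumPy's fast typed loops.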

Archived:	7 years, 4 months ago
Views:	2750
Last recorded:	7 years, 4 months ago