Numpy repeat converts `nan` to `str`

Bha*_*avi -1 python numpy pandas numpy-ndarray

This numpy behavior seems a little weird.

>>> type(np.array([1, np.nan]).repeat(2)[2])
<class 'numpy.float64'>

But when I make the first element a string:

>>> type(np.array(["a", np.nan]).repeat(2)[2])
<class 'numpy.str_'>

How do I fix it?

hpa*_*ulj 5

Maybe this way of viewing the arrays will make the difference clearer:

In the first case, np.nan is a float, so all elements are floats:

In [310]: np.array([1, np.nan]).repeat(2)                                            
Out[310]: array([ 1.,  1., nan, nan])
In [311]: _.dtype                                                                    
Out[311]: dtype('float64')

In the second case, there is a string, which can't be converted to a float, so the dtype of the whole array is string, including np.nan, which is now 'nan':

In [312]: np.array(["a", np.nan]).repeat(2)                                          
Out[312]: array(['a', 'a', 'nan', 'nan'], dtype='<U3')
In [313]: _.dtype                                                                    
Out[313]: dtype('<U3')

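repeat has nothing to do with this. It comes from how np.array constructs an array from a list: it picks the best common dtype. Asking for a float dtype explicitly fails, because the string can't be converted: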

In [321]: np.array(["a", np.nan],dtype=float)                                        
--------------------------------------------------------------------------- 
ValueError: could not convert string to float: 'a'
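
As for a fix, here is a minimal sketch of one option, assuming an object array is acceptable for your use case. Passing dtype=object asks np.array to store the original Python objects instead of converting them to a common dtype, so np.nan stays a float. The variable names are just for illustration.

```python
import numpy as np

# dtype=object keeps each element as the original Python object,
# so np.nan is not converted to the string 'nan'.
arr = np.array(["a", np.nan], dtype=object)
repeated = arr.repeat(2)

print(repeated.dtype)      # object
print(type(repeated[0]))   # <class 'str'>
print(type(repeated[2]))   # <class 'float'>
```

Keep in mind that object arrays give up most of NumPy's fast vectorized operations, so this is mainly useful when you really need mixed types in one array.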