DataFrame correlation produces NaN although its values are all integers

use*_*975 6 python nan series correlation pandas

I have a DataFrame df:

import pandas as pd

df = pd.read_csv(loggerfile, header=2)  # read_csv already returns a DataFrame

values = df.to_numpy()  # as_matrix() was deprecated and removed in pandas 1.0

df2 = pd.DataFrame.from_records(values, index=datetimeIdx, columns=Columns)

Edit:

I now read the data as suggested:

df2 = pd.read_csv(loggerfile, header = None, skiprows = [0,1,2])

Sample of df2:

                         0              1       2   3   4   5   6   7   8   \
0  2014-03-19T12:44:32.695Z  1395233072695  703425   0   2   1  13   5  21   
1  2014-03-19T12:44:32.727Z  1395233072727  703425   0   2   1  13   5  21   

   9   10  11   12  13   14  15  16  
0  25   0  25  209   0  145   0   0  
1  25   0  25  209   0  146   0   0

All columns are of type int64 (except the first one):

print(df2.dtypes)

0     object
1      int64
2      int64
3      int64
4      int64
5      int64
6      int64
7      int64
8      int64
9      int64
10     int64
11     int64
12     int64
13     int64
14     int64
15     int64
16     int64
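A minimal sketch of letting read_csv build the timestamp index directly, instead of constructing datetimeIdx by hand. It assumes the logger file is comma-separated with three header lines to skip; the in-memory string below is a hypothetical stand-in for loggerfile, built from the two sample rows above. With column 0 moved into the index, every remaining column is int64. This also matters because, as far as I know, pandas 2.0 and later no longer skip non-numeric columns in DataFrame.corr() by default (numeric_only defaults to False), so the string column 0 would likely make corr() raise instead of being silently ignored.

```python
import io

import pandas as pd

# Hypothetical stand-in for loggerfile: three made-up header lines followed by
# the two sample rows from the question, assumed to be comma-separated.
raw = """header line 1
header line 2
header line 3
2014-03-19T12:44:32.695Z,1395233072695,703425,0,2,1,13,5,21,25,0,25,209,0,145,0,0
2014-03-19T12:44:32.727Z,1395233072727,703425,0,2,1,13,5,21,25,0,25,209,0,146,0,0
"""

df2 = pd.read_csv(
    io.StringIO(raw),
    header=None,          # the data rows carry no column names
    skiprows=[0, 1, 2],   # skip the three header lines
    index_col=0,          # use the timestamp column as the index...
    parse_dates=True,     # ...and parse that index into datetimes
)

print(df2.index.dtype)
print(df2.dtypes)
```

The remaining columns keep their integer labels 1 to 16, so they line up with the labels in the correlation output below.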

But in my correlation matrix, some columns come out as NaN:

df2.corr()

     1          2    3          4           5   6   7            8           ...    
1    1.000000   NaN  0.018752   -0.550307   NaN NaN 0.075191     0.775725
2    NaN        NaN  NaN         NaN        NaN NaN NaN          NaN
3    0.018752   NaN  1.000000   -0.067293   NaN NaN -0.579651    0.004593 
...
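The NaN pattern above can be reproduced on a tiny hypothetical frame in which one column never changes. That column's whole row and whole column of the correlation matrix, including its own diagonal entry, come out as NaN, just like column 2 in the output above.

```python
import pandas as pd

# Hypothetical toy data: column "b" holds the same value in every row,
# like column 2 (always 703425) in the question's sample.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 5, 5, 5],   # constant column
    "c": [2, 1, 4, 3],
})

corr = df.corr()
print(corr)
# Row "b" and column "b" (including corr.loc["b", "b"]) are NaN;
# corr.loc["a", "c"] is about 0.6.
```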

Kar*_* D. 14

These columns currently have no variation in their values, yes.

As Joris pointed out, NaN is what you would expect if the values do not vary. To see why, look at the correlation formula:

cor(i,j) = cov(i,j)/[stdev(i)*stdev(j)]
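The formula can be checked by hand. This sketch uses a hypothetical pearson helper (not a pandas function) with NumPy's population covariance and standard deviations; the 1/n factors cancel in the ratio, so the result matches the usual Pearson coefficient. For a constant input both the covariance and the standard deviation are zero, so the division is 0/0, which yields nan.

```python
import numpy as np


def pearson(x, y):
    """cor(x, y) = cov(x, y) / (stdev(x) * stdev(y)), written out directly."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance (divides by n); x.std()/y.std() below also use ddof=0,
    # so the 1/n factors cancel in the ratio.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    denom = x.std() * y.std()
    # A constant x or y makes both cov and denom exactly 0.0, so this is 0/0 -> nan;
    # errstate only silences NumPy's "invalid value" RuntimeWarning.
    with np.errstate(invalid="ignore"):
        return cov / denom


print(pearson([1, 2, 3, 4], [2, 1, 4, 3]))  # ≈ 0.6
print(pearson([1, 2, 3, 4], [5, 5, 5, 5]))  # nan
```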

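If the values of the i-th or j-th variable do not vary, the corresponding standard deviation is zero, and so is the denominator of the fraction. The covariance is zero as well, so the expression becomes 0/0, and the correlation is therefore NaN.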
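One way to avoid the NaN rows is to find the constant columns and drop them (or set them aside) before calling corr(). A minimal sketch, assuming a column counts as constant when it has at most one distinct non-null value; constant_columns and drop_constant_columns are hypothetical helper names. Counting distinct values with nunique() sidesteps the floating-point question of whether a computed standard deviation is exactly zero.

```python
import pandas as pd


def constant_columns(df):
    """Labels of columns with at most one distinct non-null value (hypothetical helper)."""
    counts = df.nunique()  # distinct non-null values per column (NaN is not counted)
    return counts[counts <= 1].index.tolist()


def drop_constant_columns(df):
    """Return a copy of df without its constant (or all-NaN) columns."""
    return df.drop(columns=constant_columns(df))


# Hypothetical toy data with one constant column, "b".
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 5, 5, 5], "c": [2, 1, 4, 3]})
print(constant_columns(df))  # ['b']
print(drop_constant_columns(df).corr())
```

Dropping such a column costs nothing for the correlation analysis: a variable that never changes cannot co-vary with anything.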