SciPy PearsonR ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Axe*_*son · score 2 · tags: numpy, scipy, pandas, scikit-learn

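I'm having some trouble using SciPy's pearsonr method. I tried to keep it as simple as possible (note the gorgeous N^2 loop), but I still run into this problem, and I don't fully understand where I'm going wrong. My arrays are selected correctly and have the same dimensions.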

The code I'm running is:

import numpy as np
from scipy import stats
from sklearn.preprocessing import LabelBinarizer, Binarizer
from sklearn.feature_extraction.text import CountVectorizer

# ny_raw is a pandas DataFrame (defined elsewhere) with 'clusterid' and 'text' columns
ny_cluster = LabelBinarizer().fit_transform(ny_raw.clusterid.values)  # dense 0/1 indicator array
ny_vocab = Binarizer().fit_transform(CountVectorizer().fit_transform(ny_raw.text.values))  # sparse 0/1 document-term matrix

# Pearson correlation for every (word, cluster) pair of indicator columns
ny_vc_phi = np.zeros((ny_vocab.shape[1], ny_cluster.shape[1]))
for i in xrange(ny_vc_phi.shape[0]):
    for j in xrange(ny_vc_phi.shape[1]):
        ny_vc_phi[i,j] = stats.pearsonr(ny_vocab[:,i].todense(), ny_cluster[:,j])[0]

Which produces the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/TweetClusters/TweetsLocationBayesClf/<ipython-input-29-ff1c3ac4156d> in <module>()
      3 for i in xrange(ny_vc_phi.shape[0]):
      4     for j in xrange(ny_vc_phi.shape[1]):
----> 5         ny_vc_phi[i,j] = stats.pearsonr(ny_vocab[:,i].todense(), ny_cluster[:,j])[0]
      6 

/usr/lib/python2.7/dist-packages/scipy/stats/stats.pyc in pearsonr(x, y)
   2201     # Presumably, if abs(r) > 1, then it is only some small artifact of floating
   2202     # point arithmetic.
-> 2203     r = max(min(r, 1.0), -1.0)
   2204     df = n-2
   2205     if abs(r) == 1.0:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

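I really don't understand where this choice is being made. Of course, it doesn't help that I don't know how the r variable is computed. Could it be that I messed up my inputs?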
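Regarding where the "choice" happens, the sketch below is a minimal NumPy-only illustration, not SciPy's code. `naive_pearson` is a hypothetical helper that computes r the textbook way (subtract the means, then sum the products), which is assumed here to be close enough to what runs before the clipping line in the traceback. Passing a column of shape (n, 1) together with a 1-D vector of shape (n,) makes r an array rather than a number, and the clipping step then raises this same ValueError.

```python
import numpy as np


def naive_pearson(x, y):
    """Textbook Pearson r (hypothetical helper, not SciPy's implementation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm = x - x.mean()  # deviations from the scalar mean of x
    ym = y - y.mean()  # deviations from the scalar mean of y
    # With two 1-D inputs these sums are scalars. With an (n, 1) column and
    # an (n,) vector, xm * ym broadcasts to an (n, n) array instead.
    r_num = (xm * ym).sum(axis=0)
    r_den = np.sqrt((xm ** 2).sum(axis=0) * (ym ** 2).sum(axis=0))
    return r_num / r_den


word = np.array([1, 0, 1, 1, 0])     # 1-D indicator vector, shape (5,)
cluster = np.array([1, 0, 1, 0, 0])  # 1-D indicator vector, shape (5,)
column = word.reshape(-1, 1)         # same data as a column, shape (5, 1)

r_ok = naive_pearson(word, cluster)     # a single number
r_bad = naive_pearson(column, cluster)  # an array with 5 elements
print(np.shape(r_ok), np.shape(r_bad))  # () (5,)

# Clipping r the way the traceback shows works for a single number...
clipped = max(min(r_ok, 1.0), -1.0)

# ...but min() must evaluate `1.0 < r_bad`, which yields a boolean array,
# and taking the truth value of that array raises the ValueError.
try:
    max(min(r_bad, 1.0), -1.0)
except ValueError as exc:
    print(exc)
```

With two 1-D inputs the same function returns a single number, so it is the shapes of the inputs, not their values, that trigger the error.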

Answer by War*_*ser · score 7

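Check that the arguments to pearsonr are one-dimensional arrays. That is, both ny_vocab[:,i].todense() and ny_cluster[:,j] should be 1-D. todense() returns a 2-D numpy.matrix of shape (n, 1); combined with the 1-D ny_cluster[:,j] of shape (n,), the intermediate products inside pearsonr (in the SciPy version shown in the traceback) broadcast to an (n, n) array, so r comes out as an array instead of a single number and the comparison in max(min(r, 1.0), -1.0) fails. Note also that ravel() on a numpy.matrix still returns a 2-D (1, n) matrix, so convert the sparse column with toarray(), which returns a plain ndarray, before flattening it. Try:

    ny_vc_phi[i,j] = stats.pearsonr(ny_vocab[:,i].toarray().ravel(), ny_cluster[:,j].ravel())[0]

(The toarray() and ravel() calls ensure that each argument passed to pearsonr is a 1-D array.)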
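The shape issue can be checked without the original data. In this minimal sketch, a small `np.matrix` column is an assumed stand-in for the output of `ny_vocab[:, i].todense()`, and `np.corrcoef` is swapped in as a NumPy-only way to compute the same Pearson r that `stats.pearsonr` reports.

```python
import numpy as np

# Assumed stand-in for ny_vocab[:, i].todense(): a 2-D column, shape (5, 1).
col = np.matrix([[1], [0], [1], [1], [0]])
cluster = np.array([1, 0, 1, 0, 0])  # already 1-D, shape (5,)

# ravel() on a matrix keeps it 2-D: the result is a (1, n) matrix.
print(col.ravel().shape)  # (1, 5)

# Converting to a plain ndarray first gives a true 1-D vector.
flat = np.asarray(col).ravel()
print(flat.shape)  # (5,)

# np.corrcoef is used only as a NumPy-only check of Pearson r;
# with two 1-D inputs, the off-diagonal entry is a single number.
r = np.corrcoef(flat, cluster)[0, 1]
print(r)
```

`np.asarray(col).ravel()` and the matrix attribute `col.A1` both give a flat ndarray; for a SciPy sparse column, `.toarray().ravel()` does the same without going through `np.matrix`.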