Is there a way to get the correlation between string data and a numerical value in pandas?

Question

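I'm trying to compute a correlation in pandas and it's giving me some difficulty. Essentially, I want to answer the following question: given a data frame of sentences and values, which word correlates best with a higher value? And which correlates worst?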

A simple example:

Sentence      | Score
"hello there" | 100
"hello kid"   | 95
"there kid"   | 5

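Here I'd expect to see a high correlation between the word "hello" and the score. Hopefully that makes sense. If this is possible in pandas, I'd really appreciate knowing how!

If anything is unclear, please let me know.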

Answer 1

ili*_*eev 5

I'm not sure pandas is the tool you're looking for, but yes, you can:

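import pandas as pd

df = pd.DataFrame([["hello there", 100],
                   ["hello kid",    95],
                   ["there kid",     5]],
                  columns=['Sentence', 'Score'])

# Build one 0/1 indicator column per word, then correlate each column with the score.
# Dividing by the max only rescales the score; Pearson correlation is unaffected by it.
s_corr = df.Sentence.str.get_dummies(sep=' ').corrwith(df.Score / df.Score.max())
print(s_corr)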

which returns:

hello    0.998906
kid     -0.539949
there   -0.458957
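To make the one-liner easier to follow, here is a rough sketch that splits it into its two steps on the same toy data. The variable names `dummies`, `corr_raw`, and `corr_scaled` are only illustrative and do not appear in the answer; the sketch also checks that scaling the score by its maximum leaves the result unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    [["hello there", 100], ["hello kid", 95], ["there kid", 5]],
    columns=["Sentence", "Score"],
)

# Step 1: one 0/1 indicator column per distinct word in the sentences.
dummies = df["Sentence"].str.get_dummies(sep=" ")
print(dummies)

# Step 2: Pearson correlation of each indicator column with the score.
corr_raw = dummies.corrwith(df["Score"])

# Same computation with the score scaled by its maximum, as in the answer.
corr_scaled = dummies.corrwith(df["Score"] / df["Score"].max())

print(corr_raw)
print((corr_raw - corr_scaled).abs().max())  # effectively zero
```

With only three rows these correlations are illustrative rather than statistically meaningful.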

For details, see the pandas documentation for `Series.str.get_dummies` and `DataFrame.corrwith`.
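To answer the original question directly (which word correlates best and which worst), one possible follow-up is to take the labels of the largest and smallest correlations. This is a minimal sketch that assumes "best" means the highest Pearson correlation and "worst" the lowest; `best_word` and `worst_word` are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["hello there", 100], ["hello kid", 95], ["there kid", 5]],
    columns=["Sentence", "Score"],
)

# Correlation of each word's 0/1 indicator column with the score.
s_corr = df["Sentence"].str.get_dummies(sep=" ").corrwith(df["Score"])

# Index labels (words) of the highest and lowest correlation.
best_word = s_corr.idxmax()
worst_word = s_corr.idxmin()

print(best_word, worst_word)
```

On this toy data the sketch picks "hello" as the best word and "kid" as the worst.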
