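I have a Pandas DataFrame and I want to find all the unique values in it, regardless of row or column. If I have a 10 x 10 DataFrame that contains, say, 84 unique values, I need the values themselves, not just their count.
I could iterate over the rows of the DataFrame, building a set and adding each row's values to it. However, I suspect that would be inefficient (I can't prove it). Is there an efficient way to do this? Is there a built-in function?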
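One possible approach, sketched here under assumptions: flatten the frame's values into a one-dimensional array and pass that to `pd.unique`. The small `df` below is a hypothetical stand-in for the 10 x 10 frame in the question.

```python
import pandas as pd

# Hypothetical 3 x 3 frame standing in for the 10 x 10 one in the question.
df = pd.DataFrame({"a": [1, 2, 2], "b": [3, 1, 4], "c": [4, 4, 5]})

# Flatten all cells into one 1-D array, then keep each distinct value once.
# pd.unique preserves order of first appearance and does not sort.
unique_values = pd.unique(df.to_numpy().ravel())

print(sorted(unique_values.tolist()))
```

If sorted output is preferred, `numpy.unique` on the same flattened array should work as well. Note that `pd.unique` treats NaN as a value, so it may need to be filtered out separately.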
Is there a simple way to take a pandas DataFrame (df) like this:
field_1 field_2 field_3 field_4
cat 15,263 2.52 00:03:00
dog 1,652 3.71 00:03:47
test 312 3.27 00:03:41
book 300 3.46 00:02:40
and convert it to XML:
<item>
<field name="field_1">cat</field>
<field name="field_2">15263</field>
<field name="field_3">2.52</field>
...
<item>
<field name="field_1">dog</field>
and so on...
Thanks in advance for your help.
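A minimal sketch of one way to produce that layout, using the standard library's `xml.etree.ElementTree`. The function name `frame_to_xml` and the wrapping `<items>` root element are assumptions made for illustration; the question shows no root element. The sample frame stores `field_2` as plain integers, so the thousands separators from the table are assumed to be already removed.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def frame_to_xml(df, root_tag="items", row_tag="item"):
    """Serialize each row as <item> with one <field name="..."> per column.

    Hypothetical helper: root_tag is an assumption, not part of the question.
    """
    root = ET.Element(root_tag)
    for _, row in df.iterrows():
        item = ET.SubElement(root, row_tag)
        for col, value in row.items():
            field = ET.SubElement(item, "field", name=str(col))
            field.text = str(value)  # ElementTree escapes special characters
    return ET.tostring(root, encoding="unicode")


# Two rows from the question's table, with field_2 already numeric.
df = pd.DataFrame({
    "field_1": ["cat", "dog"],
    "field_2": [15263, 1652],
    "field_3": [2.52, 3.71],
    "field_4": ["00:03:00", "00:03:47"],
})

xml_text = frame_to_xml(df)
print(xml_text)
```

Recent pandas versions also provide `DataFrame.to_xml`, but its default output uses one element per column rather than `<field name="...">`, so a small helper like this may be simpler when this exact shape is required.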
We have a DataFrame that looks like this:
> df.loc[:2, :10]
0 1 2 3 4 5 6 7 8 9 10
0 NaN NaN NaN NaN 6 5 NaN NaN 4 NaN 5
1 NaN NaN NaN NaN 8 NaN NaN 7 NaN NaN 5
2 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN
We just want the counts of all the unique values in the DataFrame. A simple solution is
df.stack().value_counts()
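However:
1. It looks like stack returns a copy rather than a view, which is prohibitive memory-wise in our case. Is that correct?
2. I want to group the DataFrame by rows and then get a separate histogram for each group. If we ignore the memory issue with stack and use it for now, how do we do the grouping correctly?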
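On the second point, one possible approach is sketched below. It relies on the fact that `stack()` returns a Series indexed by (row label, column label), so grouping on index level 0 groups by original row. The attempt shown next passes a 4-element array as the grouper, which does not line up with the stacked values. The names `stacked`, `per_row` and `hist` are illustrative, and this sketch does not address the memory concern.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame([[np.nan, 1, np.nan, 2, 3],
                  [np.nan, 1, 1, 1, 3],
                  [np.nan, 1, np.nan, 2, 3],
                  [np.nan, 2, 2, 2, 3]])

# The stacked Series has a two-level index: (row label, column label).
stacked = d.stack()

# Group on the first index level (the original row) and count values per row.
# value_counts skips NaN by default.
per_row = stacked.groupby(level=0).value_counts()

# Pivot to one row per original row and one column per distinct value.
hist = per_row.unstack(fill_value=0)
print(hist)
```

To group several rows together instead of one row at a time, a function or mapping from row label to group key can be passed to `groupby` together with `level=0`.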
import pandas as pd
from numpy import nan, arange
d = pd.DataFrame([[nan, 1, nan, 2, 3],
                  [nan, 1, 1, 1, 3],
                  [nan, 1, nan, 2, 3],
                  [nan, 2, 2, 2, 3]])
len(d.stack())  # 14
d.stack().groupby(arange(4))  # fails: 4 group keys for 14 stacked values
AssertionError: …
I have some monthly data that I am trying to summarize with Pandas, and I need to count the number of unique entries that occur each month. Here is some sample code showing what I am trying to do:
import pandas as pd
mnths = ['JAN','FEB','MAR','APR']
custs = ['A','B','C']
testFrame = pd.DataFrame(index=custs, columns=mnths)
# Assign with .loc[row, column]; chained indexing such as
# testFrame['JAN']['A'] = ... may write to a temporary copy instead.
testFrame.loc['A', 'JAN'] = 'purchased Prod'
testFrame.loc['B', 'JAN'] = 'No Data'
testFrame.loc['C', 'JAN'] = 'Purchased Competitor'
testFrame.loc['A', 'FEB'] = 'purchased Prod'
testFrame.loc['B', 'FEB'] = 'purchased Prod'
testFrame.loc['C', 'FEB'] = 'purchased Prod'
testFrame.loc['A', 'MAR'] = 'No Data'
testFrame.loc['B', 'MAR'] = 'No Data'
testFrame.loc['C', 'MAR'] = 'Purchased Competitor'
testFrame.loc['A', 'APR'] = 'Purchased Competitor'
testFrame.loc['B', 'APR'] = 'purchased Prod'
testFrame.loc['C', 'APR'] = 'Purchased Competitor'
# Distinct entries across the whole frame
uniqueValues = pd.Series(testFrame.values.ravel()).unique()
#CODE TO GET COUNT OF ENTRIES IN testFrame BY UNIQUE VALUE
Desired output:
JAN FEB …
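A sketch of one way to get per-month counts for each distinct entry: apply `value_counts` to every column and fill the gaps with zero. The desired output is cut off in the source, so the layout here (one row per distinct entry, one column per month) is a guess based on the visible column headers. The frame is rebuilt from a dict for brevity, and `counts` is an illustrative name.

```python
import pandas as pd

# Same data as the question's testFrame, built in one step.
testFrame = pd.DataFrame(
    {
        "JAN": ["purchased Prod", "No Data", "Purchased Competitor"],
        "FEB": ["purchased Prod", "purchased Prod", "purchased Prod"],
        "MAR": ["No Data", "No Data", "Purchased Competitor"],
        "APR": ["Purchased Competitor", "purchased Prod", "Purchased Competitor"],
    },
    index=["A", "B", "C"],
)

# Count each distinct entry within every month column. Entries that do not
# occur in a given month come back as NaN, so replace those with 0.
counts = testFrame.apply(pd.Series.value_counts).fillna(0).astype(int)
print(counts)
```

Because the comparison is case-sensitive, entries that differ only in capitalization would be counted as separate values.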