Count by unique pair of columns in pandas

bar*_*bug 30 python pandas

I am trying to figure out how to count the number of rows for each unique pair of columns (ip, useragent). For example:

d = pd.DataFrame({'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'], 'useragent': ['a', 'a', 'b', 'b']})

     ip              useragent
0    192.168.0.1     a
1    192.168.0.1     a
2    192.168.0.1     b
3    192.168.0.2     b

should produce:

ip           useragent  
192.168.0.1  a           2
192.168.0.1  b           1
192.168.0.2  b           1

Any ideas?

Mat*_*ohn 49

If you group by both columns with groupby and take the size of each group, you get what you want.

d.groupby(['ip', 'useragent']).size()

Produces:

ip          useragent               
192.168.0.1 a           2
            b           1
192.168.0.2 b           1

  • Got it: `d.groupby(['ip','useragent']).size()` did it :) (9 upvotes)
  • For me, this just gives 'AttributeError: 'DataFrame' object has no attribute 'size''. (2 upvotes)
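
For reference, here is a self-contained sketch of the groupby approach using the sample data from the question. The variable names `counts`, `pair_count`, and `vc` are illustrative only. The last step uses `DataFrame.value_counts`, which assumes pandas 1.1 or later and sorts by count rather than by key.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Count rows per unique (ip, useragent) pair. The result is a Series
# indexed by a MultiIndex built from the two grouping columns.
counts = d.groupby(['ip', 'useragent']).size()

# Look up the count for a single pair by its (ip, useragent) tuple.
pair_count = counts.loc[('192.168.0.1', 'a')]

# Assumption: pandas >= 1.1. value_counts on a column subset gives the
# same counts, ordered from most to least frequent.
vc = d.value_counts(['ip', 'useragent'])
```

Both `counts` and `vc` hold the same numbers; they differ only in ordering.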

Mar*_*hke 7

print(d.groupby(['ip', 'useragent']).size().reset_index().rename(columns={0:''}))

Gives:

            ip useragent   
0  192.168.0.1         a  2
1  192.168.0.1         b  1
2  192.168.0.2         b  1
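
As a possible variation on the rename step above, `Series.reset_index` accepts a `name` argument, so the count column can be labelled directly. The column name `count` and the variable `flat` are arbitrary choices for this sketch, not part of the original answer.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# size() returns an unnamed Series; reset_index(name=...) turns the
# MultiIndex levels into columns and names the values column in one step.
flat = d.groupby(['ip', 'useragent']).size().reset_index(name='count')
print(flat)
```

The result is an ordinary DataFrame with columns ip, useragent and count, which is easier to merge or filter than a MultiIndex Series.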

Another nice option might be pandas.crosstab:

print(pd.crosstab(d.ip, d.useragent) )
print('\nsome cosmetics:')
print(pd.crosstab(d.ip, d.useragent).reset_index().rename_axis('',axis='columns') )

Gives:

useragent    a  b
ip               
192.168.0.1  2  1
192.168.0.2  0  1

some cosmetics:
            ip  a  b
0  192.168.0.1  2  1
1  192.168.0.2  0  1
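
Note that the crosstab output also contains pairs that never occur (the 0 for 192.168.0.2 / a). If the one-row-per-pair layout from the question is wanted, one way to get back to it is to stack the table and drop the zeros. This is only a sketch; `wide`, `long_counts`, and `observed` are made-up names.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Wide table: one row per ip, one column per useragent.
wide = pd.crosstab(d['ip'], d['useragent'])

# stack() moves the useragent columns into a second index level,
# giving one value per (ip, useragent) combination, zeros included.
long_counts = wide.stack()

# Keep only the pairs that actually appear in the data.
observed = long_counts[long_counts > 0]
print(observed)
```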