Use*_*YmY 5 python indexing group-by pandas
I have a DataFrame df:
domain orgid
csyunshu.com 108299
dshu.com 108299
bbbdshu.com 108299
cwakwakmrg.com 121303
ckonkatsunet.com 121303
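I want to add a new column that gives each domain a sequential numeric ID within its orgid, so the numbering restarts at 1 for every orgid: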
domain orgid domainid
csyunshu.com 108299 1
dshu.com 108299 2
bbbdshu.com 108299 3
cwakwakmrg.com 121303 1
ckonkatsunet.com 121303 2
I have tried this line, but it does not give the result I want:
df.groupby('orgid').count['domain'].reset_index()
Can anyone help?
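You can call rank on the groupby object and pass the parameter method='first', which numbers the rows within each group in their order of appearance: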
In [61]:
df['domainId'] = df.groupby('orgid')['orgid'].rank(method='first')
df
Out[61]:
domain orgid domainId
0 csyunshu.com 108299 1
1 dshu.com 108299 2
2 bbbdshu.com 108299 3
3 cwakwakmrg.com 121303 1
4 ckonkatsunet.com 121303 2
If you want to overwrite the domain column instead, you can do the following:
df['domain'] = df.groupby('orgid')['orgid'].rank(method='first')
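For anyone who wants to reproduce this, here is a minimal self-contained sketch of the rank approach. It rebuilds the sample frame from the question by hand. Note that rank returns floats, so the sketch casts the result to int. It also shows groupby(...).cumcount() + 1 as an alternative; that variant is not part of the answer above, and the column name domainid_alt is only a placeholder for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# All values inside one orgid group are tied, so method='first' breaks the
# ties by order of appearance, giving 1, 2, 3, ... per group.
ranked = df.groupby("orgid")["orgid"].rank(method="first")

# rank() yields float64; cast to int for a clean ID column.
df["domainid"] = ranked.astype(int)

# Alternative (an assumption, not from the answer): cumcount() numbers the
# rows of each group starting at 0, so add 1 to start at 1.
df["domainid_alt"] = df.groupby("orgid").cumcount() + 1

print(df)
```

Both columns should hold the same per-orgid numbering. cumcount may be the more direct choice when the IDs only need to follow row order, since it returns integers and does not depend on tie-breaking.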