I have a df:
domain orgid
csyunshu.com 108299
dshu.com 108299
bbbdshu.com 108299
cwakwakmrg.com 121303
ckonkatsunet.com 121303
I want to add a new column that numbers the domains within each orgid, starting from 1:
domain orgid domainid
csyunshu.com 108299 1
dshu.com 108299 2
bbbdshu.com 108299 3
cwakwakmrg.com 121303 1
ckonkatsunet.com 121303 2
I have tried this line, but it does not give the result I want:
df.groupby('orgid').count['domain'].reset_index()
Can anyone help?
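One way this might be approached is `groupby(...).cumcount()`, which numbers the rows inside each group from 0 in their original order; adding 1 gives the 1-based IDs shown above. This is only a sketch that rebuilds the sample frame by hand, not necessarily the only or best approach.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# cumcount() counts rows within each orgid group starting at 0,
# so add 1 to get IDs that start at 1
df["domainid"] = df.groupby("orgid").cumcount() + 1

print(df)
```

The attempted line fails earlier than the logic: `count` is a method, so `count['domain']` subscripts the method object instead of calling it. Even when called, `count()` returns one row per group rather than a per-row number.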
Suppose I have a list of events that occur on different keys.
import pandas

data = [
{"key": "A", "event": "created"},
{"key": "A", "event": "updated"},
{"key": "A", "event": "updated"},
{"key": "A", "event": "updated"},
{"key": "B", "event": "created"},
{"key": "B", "event": "updated"},
{"key": "B", "event": "updated"},
{"key": "C", "event": "created"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
{"key": "C", "event": "updated"},
]
df = pandas.DataFrame(data)
I want to index my DataFrame first on the key, then on an enumeration. It looks like a simple unstack operation, but I can't find how to do it properly.
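To make the target concrete, here is a minimal sketch under the assumption that the enumeration is meant to restart at 0 within each key. The level name `n` is hypothetical, and the data is a shortened version of the list above.

```python
import pandas as pd

# Shortened version of the event list
data = [
    {"key": "A", "event": "created"},
    {"key": "A", "event": "updated"},
    {"key": "B", "event": "created"},
    {"key": "B", "event": "updated"},
    {"key": "B", "event": "updated"},
]
df = pd.DataFrame(data)

# Per-key running counter: 0, 1, ... restarting for every key.
# "n" is an illustrative name, not something from the question.
df["n"] = df.groupby("key").cumcount()

# Two-level index: key first, then the per-key counter
indexed = df.set_index(["key", "n"])
print(indexed)
```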
The best I can do is
df.set_index("key", append=True).swaplevel(0, 1)
event
key
A 0 created
1 updated
2 updated
3 updated
B 4 created
5 …
I have a dataframe df like this, but larger.
ID_0 ID_1 location
0 a b 1
1 a c 1
2 a b 0
3 d c 0
4 a c 0
5 a c 1
I want to add a column that identifies each combination of the first two columns. For example:
ID_0 ID_1 location group_ID
0 a b 1 0
1 a c 1 1
2 a b 0 0
3 d c 0 2
4 a c 0 1
5 a c 1 1
This new column comes from mapping "ab" to 0, "ac" to 1, and "dc" to 2.
I think the first step is
grouped = df.groupby(['ID_0', 'ID_1'])
But I'm not sure where to go from there.
How do you create this new column in pandas?
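One possible continuation from the `groupby` step is `ngroup()`, which labels every row with the integer number of its group. This is a sketch on the sample frame, rebuilt by hand. By default the groups are numbered in sorted key order, which here happens to match the desired mapping; passing `sort=False` to `groupby` would number them by first appearance instead.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "ID_0": ["a", "a", "a", "d", "a", "a"],
    "ID_1": ["b", "c", "b", "c", "c", "c"],
    "location": [1, 1, 0, 0, 0, 1],
})

# ngroup() assigns one integer per (ID_0, ID_1) combination;
# with the default sort=True: (a, b) -> 0, (a, c) -> 1, (d, c) -> 2
df["group_ID"] = df.groupby(["ID_0", "ID_1"]).ngroup()

print(df)
```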