eme*_*hex 7 python r python-3.x pandas tidyverse
My data looks like this:
library("tidyverse")
df <- tibble(user = c(1, 1, 2, 3, 3, 3), x = c("a", "b", "a", "a", "c", "d"), y = 1)
df
# user x y
# 1 1 a 1
# 2 1 b 1
# 3 2 a 1
# 4 3 a 1
# 5 3 c 1
# 6 3 d 1
In Python:
import pandas as pd
df = pd.DataFrame({'user':[1, 1, 2, 3, 3, 3], 'x':['a', 'b', 'a', 'a', 'c', 'd'], 'y':1})
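I want to "complete" the data frame so that every user has a record for every possible x, with y filled with a default value of 0 for the combinations that are missing.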
This is fairly trivial in R (tidyverse/tidyr):
df %>%
complete(nesting(user), x = c("a", "b", "c", "d"), fill = list(y = 0))
# user x y
# 1 1 a 1
# 2 1 b 1
# 3 1 c 0
# 4 1 d 0
# 5 2 a 1
# 6 2 b 0
# 7 2 c 0
# 8 2 d 0
# 9 3 a 1
# 10 3 b 0
# 11 3 c 1
# 12 3 d 1
Is there an equivalent of complete in pandas/Python?
You can use reindex with an index built by MultiIndex.from_product:
df = df.set_index(['user','x'])
mux = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]],names=['user','x'])
df = df.reindex(mux, fill_value=0).reset_index()
print (df)
user x y
0 1 a 1
1 1 b 1
2 1 c 0
3 1 d 0
4 2 a 1
5 2 b 0
6 2 c 0
7 2 d 0
8 3 a 1
9 3 b 0
10 3 c 1
11 3 d 1
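Note that the reindex snippet above takes the x values from the levels observed in the data, whereas the R call lists them explicitly. If the full set of x values has to be supplied by hand, a minimal sketch might look like the following. The list `all_x` (including the unobserved value 'e') and the name `completed` are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Hypothetical: the full set of x values, listed by hand as in the R call.
# 'e' never occurs in the data, so it would be missed by index.levels.
all_x = ['a', 'b', 'c', 'd', 'e']

# Every (user, x) pair: observed users crossed with the explicit x values.
mux = pd.MultiIndex.from_product([df['user'].unique(), all_x],
                                 names=['user', 'x'])

# Pairs missing from the data get y = 0.
completed = df.set_index(['user', 'x']).reindex(mux, fill_value=0).reset_index()
print(completed)
```

Here every user gets five rows, one per entry in `all_x`, and the rows for 'e' all carry y = 0.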
Alternatively, use set_index with unstack and stack:
df = df.set_index(['user','x'])['y'].unstack(fill_value=0).stack().reset_index(name='y')
print (df)
user x y
0 1 a 1
1 1 b 1
2 1 c 0
3 1 d 0
4 2 a 1
5 2 b 0
6 2 c 0
7 2 d 0
8 3 a 1
9 3 b 0
10 3 c 1
11 3 d 1
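Both approaches above put (user, x) into the index, so they expect each pair to occur at most once; with duplicate pairs, reindex and unstack raise an error. As a rough sketch of a different technique (not from the answer itself), the same result can be obtained by building the grid of combinations as a frame and left-merging the data onto it. The names `grid` and `completed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Every (user, x) combination, built from the observed values.
grid = pd.MultiIndex.from_product(
    [df['user'].unique(), df['x'].unique()], names=['user', 'x']
).to_frame(index=False)

# Left-join the original rows onto the grid; unmatched pairs get NaN in y,
# which is then replaced by the default 0.
completed = grid.merge(df, on=['user', 'x'], how='left').fillna({'y': 0})

# The NaN step turns y into float, so cast it back to int.
completed['y'] = completed['y'].astype(int)
print(completed)
```

The merge keeps the row order of `grid`, so for this data the output matches the tables shown above.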