Dro*_*ror 5 python dataframe pandas
I start with this DataFrame:
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
# Parse the date strings into datetime64 values
df["date"] = pd.to_datetime(df.date)
| | cat | user | date | value |
|---|---|---|---|---|
| 0 | a | aa | 2020-12-20 00:00:00 | 10 |
| 1 | a | ab | 2020-12-26 00:00:00 | 11 |
| 2 | a | aa | 2020-12-22 00:00:00 | 10 |
| 3 | b | bb | 2020-12-25 00:00:00 | 111 |
| 4 | c | bb | 2020-12-20 00:00:00 | 20 |
| 5 | d | dd | 2020-12-05 00:00:00 | 1111 |
Next, I run the following aggregation:
gb = (
    df.set_index("date")
    .groupby("cat")
    .resample("W")
    .agg(
        {"value": "sum", "user": ["nunique", lambda x: x.unique()]}
    )
    .rename({"<lambda>": "unqiue_users"}, axis=1)
)
This produces a table with a MultiIndex in the columns:
                value    user
                  sum nunique unqiue_users
cat date
a   2020-12-20     10       1           aa
    2020-12-27     21       2     [aa, ab]
b   2020-12-27    111       1           bb
c   2020-12-20     20       1           bb
d   2020-12-06   1111       1           dd
Finally, I am trying to run an aggregation on that last result, for example:
gb.groupby(level=0)[["value", "sum"]].mean()
I can't figure out how to "access" the columns that have a MultiIndex. Any ideas?
To select a column from a MultiIndex in the columns, use a tuple; here the tuple is wrapped in a one-element list:
print(gb.groupby(level=0)[[("value", "sum")]].mean())
      value
        sum
cat
a      15.5
b     111.0
c      20.0
d    1111.0
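The tuple matters because each label in a column MultiIndex is itself a tuple, so a list such as ["value", "sum"] is read as two separate labels rather than one column. A minimal sketch of the selection forms, assuming a simplified stand-in for the aggregation above (no lambda column; the name `weekly` is illustrative and not from the original post):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
df["date"] = pd.to_datetime(df["date"])

# Assumption: a simplified version of the question's aggregation.
# Passing lists of functions in the dict yields MultiIndex columns.
weekly = (
    df.set_index("date")
    .groupby(["cat", pd.Grouper(freq="W")])
    .agg({"value": ["sum"], "user": ["nunique"]})
)

# Each column label is a (top level, second level) tuple.
labels = weekly.columns.tolist()

# A bare tuple selects one column as a Series.
as_series = weekly[("value", "sum")]
# A list of tuples keeps a DataFrame with MultiIndex columns.
as_frame = weekly[[("value", "sum")]]
# A top-level label alone returns every column stored under it.
under_value = weekly["value"]

print(labels)
print(type(as_series).__name__, type(as_frame).__name__)
print(under_value.columns.tolist())
```

Which form to use depends on whether the next step expects a Series or a DataFrame.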
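Or, for a simpler solution, use mean with the level argument (this works on pandas versions before 2.0, where the argument was removed):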
print(gb[[("value", "sum")]].mean(level=0))
      value
        sum
cat
a      15.5
b     111.0
c      20.0
d    1111.0
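On pandas 2.0 and later, where mean no longer accepts level, the same per-level mean can be written with groupby(level=0). A minimal sketch, assuming a hand-built stand-in for gb that holds only the weekly sums shown in the question (the construction below is illustrative, not the original aggregation):

```python
import pandas as pd

# Hypothetical stand-in for `gb`: (cat, date) rows and MultiIndex columns,
# filled with the weekly sums from the question's output.
rows = pd.MultiIndex.from_tuples(
    [
        ("a", pd.Timestamp("2020-12-20")),
        ("a", pd.Timestamp("2020-12-27")),
        ("b", pd.Timestamp("2020-12-27")),
        ("c", pd.Timestamp("2020-12-20")),
        ("d", pd.Timestamp("2020-12-06")),
    ],
    names=["cat", "date"],
)
cols = pd.MultiIndex.from_tuples([("value", "sum")])
gb = pd.DataFrame([[10], [21], [111], [20], [1111]], index=rows, columns=cols)

# Group on the first index level ("cat"), then average within each group.
frame_mean = gb[[("value", "sum")]].groupby(level=0).mean()   # DataFrame
series_mean = gb[("value", "sum")].groupby(level=0).mean()    # Series

print(frame_mean)
print(series_mean)
```

The list-of-tuples selection keeps the two-level column header in the result, while the bare tuple gives a plain Series indexed by cat.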
To select a Series instead, omit the nested list:
print(gb[("value", "sum")].mean(level=0))
cat
a      15.5
b     111.0
c      20.0
d    1111.0
Name: (value, sum), dtype: float64
Your solution can also be changed to avoid a MultiIndex in the columns, using named aggregation:
gb = (
    df.set_index("date")
    .groupby(["cat", pd.Grouper(freq='W')])
    .agg(val = ("value", "sum"),
         nuniq = ("user", "nunique"),
         unqiue_users = ("user", lambda x: x.unique()))
)
print(gb)
                 val  nuniq unqiue_users
cat date
a   2020-12-20    10      1           aa
    2020-12-27    21      2     [ab, aa]
b   2020-12-27   111      1           bb
c   2020-12-20    20      1           bb
d   2020-12-06  1111      1           dd
print(gb['val'].mean(level=0))
cat
a      15.5
b     111.0
c      20.0
d    1111.0
Name: val, dtype: float64
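If the aggregation itself cannot be changed, an existing column MultiIndex can also be flattened afterwards by joining each label tuple into one string. A minimal sketch, assuming joined names such as value_sum are acceptable and using a simplified aggregation without the lambda column (the names `weekly` and `flat` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
df["date"] = pd.to_datetime(df["date"])

# Assumption: simplified aggregation that still yields MultiIndex columns.
weekly = (
    df.set_index("date")
    .groupby(["cat", pd.Grouper(freq="W")])
    .agg({"value": ["sum"], "user": ["nunique"]})
)

# Join each (top, second) label into a single string column name.
flat = weekly.copy()
flat.columns = ["_".join(label) for label in flat.columns]

# Plain string labels now work for column selection.
per_cat_mean = flat["value_sum"].groupby(level=0).mean()

print(flat.columns.tolist())
print(per_cat_mean)
```

Named aggregation, as in the answer above, avoids this extra step when the aggregation code is under your control.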