Dro*_*ror 5 python data-analysis business-intelligence pandas
Suppose I have the following DataFrame:
rng = pd.date_range('1/1/2011', periods=72, freq='H')
np.random.seed(10)
n = 10
df = pd.DataFrame(
{
"datetime": np.random.choice(rng,n),
"cat": np.random.choice(['a','b','b'], n),
"val": np.random.randint(0,5, size=n)
}
)
If I now groupby:
gb = df.groupby(['cat','datetime']).sum()
I get the total for each cat per hour:
                         val
cat datetime
a   2011-01-01 00:00:00    1
    2011-01-01 09:00:00    3
    2011-01-02 16:00:00    1
    2011-01-03 16:00:00    1
b   2011-01-01 08:00:00    4
    2011-01-01 15:00:00    3
    2011-01-01 16:00:00    3
    2011-01-02 04:00:00    4
    2011-01-02 05:00:00    1
    2011-01-02 12:00:00    4
However, I would like to get something like this instead:
                val
cat datetime
a   2011-01-01    4
    2011-01-02    1
    2011-01-03    1
b   2011-01-01   10
    2011-01-02    9
I can get the desired result by adding another column, date:
df['date'] = df['datetime'].dt.date  # pd.datetime is no longer available in current pandas
and then doing a similar groupby: df.groupby(['cat','date']).sum(). But I am interested in whether there is a more pythonic way to do this. Also, I might want to look at the month or year level. So, what is the right way?
You can try set_index and then groupby by cat and date:
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=72, freq='H')
np.random.seed(10)
n = 10
df = pd.DataFrame(
{
"datetime": np.random.choice(rng,n),
"cat": np.random.choice(['a','b','b'], n),
"val": np.random.randint(0,5, size=n)
}
)
print(df)
cat datetime val
0 a 2011-01-01 09:00:00 3
1 b 2011-01-01 15:00:00 3
2 a 2011-01-03 16:00:00 1
3 b 2011-01-02 04:00:00 4
4 b 2011-01-02 05:00:00 1
5 b 2011-01-01 08:00:00 4
6 a 2011-01-01 00:00:00 1
7 a 2011-01-02 16:00:00 1
8 b 2011-01-02 12:00:00 4
9 b 2011-01-01 16:00:00 3
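df = df.set_index('datetime')
# The function is applied to the index; calling date() on each Timestamp groups by calendar day
gb = df.groupby(['cat', lambda ts: ts.date()]).sum()
print(gb)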
                val
cat
a   2011-01-01    4
    2011-01-02    1
    2011-01-03    1
b   2011-01-01   10
    2011-01-02    9
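
The question also asks about the month or year level. One possible way to cover day, month and year with the same code is to convert the datetime column to periods with `Series.dt.to_period` and group on the result. The following is only a sketch: `sum_by_period` is a hypothetical helper name, and the small hand-written frame is illustrative data rather than the random sample above.

```python
import pandas as pd

# Illustrative, hand-written data (an assumption, not the random sample above)
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2011-01-01 00:00",
                "2011-01-01 09:00",
                "2011-01-02 16:00",
                "2011-02-03 16:00",
                "2011-01-01 08:00",
                "2011-01-01 15:00",
                "2011-02-02 04:00",
            ]
        ),
        "cat": ["a", "a", "a", "a", "b", "b", "b"],
        "val": [1, 3, 1, 1, 4, 3, 4],
    }
)


def sum_by_period(frame, freq):
    """Sum `val` per `cat` and per calendar period ('D', 'M' or 'Y')."""
    # to_period truncates each timestamp to the day, month or year it falls in
    periods = frame["datetime"].dt.to_period(freq).rename("period")
    # Group on the existing column plus the derived period Series
    return frame.groupby(["cat", periods])["val"].sum()


daily = sum_by_period(df, "D")
monthly = sum_by_period(df, "M")
yearly = sum_by_period(df, "Y")

print(daily)
print(monthly)
print(yearly)
```

This leaves the original frame untouched (no extra column and no set_index), and only the frequency string changes between granularities.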