Muh*_*han 2 python dataframe pandas
Consider the following pandas DataFrame "df" and Python list "my_list".
df =
timestamp address type
1 1 A
2 9 B
3 3 A
4 6 B
5 6 B
6 2 B
7 3 A
8 2 B
9 1 B
10 3 A
11 3 A
12 3 A
my_list =
[1, 2, 3]
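What I want is to group the DataFrame rows into 3-second bins by timestamp and, within each bin, count the unique addresses per "type", considering only rows whose address is in "my_list".
The expected output should look like this: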
timestamp A B
1 2 0 # One "B" ignored because address=9 is not in my_list
4 0 1 # Two "B" ignored because address=6 is not in my_list
7 1 2 # Two "B" with unique addresses, and one "A"
10 1 0 # Three rows with type="A", but the addresses are the same
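Note that the timestamp values are originally in timestamp format, so df.groupby and pd.TimeGrouper can be used to group the rows into 3-second bins.
Only pandas-based (Python) answers are appreciated.
Apologies for any confusion. I tried to keep it simple.
- Khan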
Use:
#integer-divide the default RangeIndex by 3 so every three consecutive rows share one bin label
df.index = df.index // 3
#filter rows by condition
df1 = df[df['address'].isin(my_list)]
#get unique numbers and reshape
df1 = df1['address'].groupby([df1.index, df1['type']]).nunique().unstack(fill_value=0)
#add timestamps
df1.index = df['timestamp'].groupby(df.index).first()
print (df1)
type A B
timestamp
1 2 0
4 0 1
7 1 2
10 1 0
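The snippet above relies on `df` having the default 0..n-1 index and on every bin keeping at least one row after filtering, since the bin labels are assigned back by position. As a minimal sketch of a variant that avoids both assumptions, the bin label can be computed from the `timestamp` column itself. The name `bin_start` and the formula that maps 1..3 to 1, 4..6 to 4, and so on are illustrative choices that assume integer timestamps starting at 1, as in the sample data.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "timestamp": range(1, 13),
    "address": [1, 9, 3, 6, 6, 2, 3, 2, 1, 3, 3, 3],
    "type": list("ABABBBABBAAA"),
})
my_list = [1, 2, 3]

# Label each row with the first timestamp of its 3-wide bin
# (assumes integer timestamps starting at 1): 1..3 -> 1, 4..6 -> 4, ...
bin_start = (df["timestamp"] - 1) // 3 * 3 + 1

result = (
    df[df["address"].isin(my_list)]      # keep only addresses in my_list
    .assign(timestamp=bin_start)         # aligned on the index, so filtering is safe
    .groupby(["timestamp", "type"])["address"]
    .nunique()                           # unique addresses per (bin, type)
    .unstack(fill_value=0)               # one column per type, 0 where absent
)
print(result)
```

Because the label travels with each row, a bin whose rows are all filtered out simply does not appear in the result instead of causing a length mismatch.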
Setup:
print (df)
timestamp address type
0 1 1 A
1 2 9 B
2 3 3 A
3 4 6 B
4 5 6 B
5 6 2 B
6 7 3 A
7 8 2 B
8 9 1 B
9 10 3 A
10 11 3 A
11 12 3 A
The solution is simpler with datetimes:
#sample datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='D',
origin=pd.Timestamp('2017-01-01'))
print (df)
timestamp address type
0 2017-01-02 1 A
1 2017-01-03 9 B
2 2017-01-04 3 A
3 2017-01-05 6 B
4 2017-01-06 6 B
5 2017-01-07 2 B
6 2017-01-08 3 A
7 2017-01-09 2 B
8 2017-01-10 1 B
9 2017-01-11 3 A
10 2017-01-12 3 A
11 2017-01-13 3 A
df1 = df[df['address'].isin(my_list)]
df1 = (df1.groupby([pd.Grouper(freq='3D', key='timestamp'), 'type'])['address']
.nunique()
.unstack(fill_value=0) )
print (df1)
type A B
timestamp
2017-01-02 2 0
2017-01-05 0 1
2017-01-08 1 2
2017-01-11 1 0
One-line solution:
df1 = (df.query("address in @my_list")
.groupby([pd.Grouper(freq='3D', key='timestamp'), 'type'])['address']
.nunique()
.unstack(fill_value=0))
print (df1)
type A B
timestamp
2017-01-02 2 0
2017-01-05 0 1
2017-01-08 1 2
2017-01-11 1 0
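The question asks for 3-second bins, while the samples above use days for illustration. The following is a sketch of the same approach with seconds, assuming the integer timestamps are seconds counted from an arbitrary start (2017-01-01 here, as in the sample above) and assuming pandas 1.1 or later, where `pd.Grouper` accepts an `origin` argument. `pd.TimeGrouper`, mentioned in the question, was removed in pandas 1.0, and `pd.Grouper` is its replacement. Passing an explicit `origin` matters for sub-daily frequencies: by default the bins are anchored at midnight, which would give bins starting at seconds 0, 3, 6 rather than 1, 4, 7.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "timestamp": range(1, 13),
    "address": [1, 9, 3, 6, 6, 2, 3, 2, 1, 3, 3, 3],
    "type": list("ABABBBABBAAA"),
})
my_list = [1, 2, 3]

# Assumption: the integers are seconds after an arbitrary start date.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s",
                                 origin=pd.Timestamp("2017-01-01"))

# Anchor the bins at the earliest timestamp of the unfiltered frame,
# so the bin edges do not move if the first row is filtered out.
grouper = pd.Grouper(key="timestamp", freq="3s", origin=df["timestamp"].min())

result = (
    df[df["address"].isin(my_list)]
    .groupby([grouper, "type"])["address"]
    .nunique()
    .unstack(fill_value=0)
)
print(result)
```

With the sample data this yields bins labelled 00:00:01, 00:00:04, 00:00:07 and 00:00:10 on 2017-01-01, with the same counts as in the expected output.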