men*_*h84 3 python dataframe pandas
I have a dataframe (call it txn_df) containing monetary transaction records. Here are the columns that matter for this question:
txn_year txn_month custid withdraw deposit
2011 4 123 0.0 100.0
2011 5 123 0.0 0.0
2011 6 123 0.0 0.0
2011 7 123 50.1 0.0
2011 8 123 0.0 0.0
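Also assume that we have multiple customers here. A value of 0.0 in withdraw and deposit means that no transaction took place. What I want to do is generate a new column indicating how many months have passed since a transaction last occurred. Something like: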
txn_year txn_month custid withdraw deposit num_months_since_last_txn
2011 4 123 0.0 100.0 0
2011 5 123 0.0 0.0 1
2011 6 123 0.0 0.0 2
2011 7 123 50.1 0.0 3
2011 8 123 0.0 0.0 1
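The only approach I have come up with so far is to generate a new column has_txn (1/0 or True/False) that is set when either withdraw or deposit has a value > 0.0, but I can't get any further from there.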
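For reference, the has_txn flag described above could be built roughly as follows. This is only a sketch: the frame is rebuilt by hand from the sample rows in the question, and has_txn is the column name proposed above, not something that exists in the real data.

```python
import pandas as pd

# Sample rows copied from the question
txn_df = pd.DataFrame({
    "txn_year": [2011] * 5,
    "txn_month": [4, 5, 6, 7, 8],
    "custid": [123] * 5,
    "withdraw": [0.0, 0.0, 0.0, 50.1, 0.0],
    "deposit": [100.0, 0.0, 0.0, 0.0, 0.0],
})

# True where either amount is greater than zero, i.e. a transaction occurred
txn_df["has_txn"] = txn_df[["withdraw", "deposit"]].gt(0).any(axis=1)

print(txn_df["has_txn"].tolist())
```

Use .astype(int) on the result if a 1/0 column is preferred over True/False.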
One way to solve this:
df['series'] = df[['withdraw','deposit']].ne(0).sum(axis=1)
m = df['series']>=1
As @Chris A commented,
m = df[['withdraw','deposit']].gt(0).any(axis=1) #replacement for above snippet,
df['num_months_since_last_txn'] = df.groupby(m.cumsum()).cumcount()
df.loc[df['num_months_since_last_txn']==0,'num_months_since_last_txn']=(df['num_months_since_last_txn']+1).shift(1).fillna(0)
print(df)
Output:
txn_year txn_month custid withdraw deposit
0 2011 4 123 0.0 100.0
1 2011 5 123 0.0 0.0
2 2011 6 123 0.0 0.0
3 2011 7 123 50.1 0.0
4 2011 8 123 0.0 0.0
txn_year txn_month custid withdraw deposit num_months_since_last_txn
0 2011 4 123 0.0 100.0 0.0
1 2011 5 123 0.0 0.0 1.0
2 2011 6 123 0.0 0.0 2.0
3 2011 7 123 50.1 0.0 3.0
4 2011 8 123 0.0 0.0 1.0
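Explanation:
Use ne and sum to get a binary indicator of whether a row contains a transaction. Use groupby, cumsum and cumcount to create a series that counts 0, 1, 2, ... n within each run. Use .loc to replace the 0 values with the previous row's count plus one.
Note: I may have made this more complicated than necessary, but it should give you an idea of how to approach the problem.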
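To make the steps above easier to follow, here is a sketch that keeps every intermediate series in its own column. The names group_id, count, carried and steps are illustrative only. It also uses mask and shift(fill_value=0) instead of the .loc assignment, which should give the same numbers while keeping an integer dtype.

```python
import pandas as pd

# Only the two amount columns from the sample data are needed here
df = pd.DataFrame({
    "withdraw": [0.0, 0.0, 0.0, 50.1, 0.0],
    "deposit": [100.0, 0.0, 0.0, 0.0, 0.0],
})

# Step 1: True on rows where a transaction occurred
m = df[["withdraw", "deposit"]].gt(0).any(axis=1)

# Step 2: every transaction starts a new run
group_id = m.cumsum()

# Step 3: position of each row inside its run (0, 1, 2, ...)
count = df.groupby(group_id).cumcount()

# Step 4: previous row's count plus one, with 0 for the very first row
carried = (count + 1).shift(1, fill_value=0)

# Step 5: rows that start a run take the carried value instead of 0
result = count.mask(count.eq(0), carried)

steps = pd.DataFrame({
    "m": m, "group_id": group_id, "count": count,
    "carried": carried, "result": result,
})
print(steps)
```

The result column should match num_months_since_last_txn in the output above, apart from being integers rather than floats.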
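A solution that takes the customer ID into account:
# sort by year as well as month so that data spanning several years stays in order
df=df.sort_values(by=['custid','txn_year','txn_month'])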
mask=~df.duplicated(subset=['custid'],keep='first')
m = df[['withdraw','deposit']].gt(0).any(axis=1)
df['num_months_since_last_txn'] = df.groupby(m.cumsum()).cumcount()
df.loc[df['num_months_since_last_txn']==0,'num_months_since_last_txn']=(df['num_months_since_last_txn']+1).shift(1)
df.loc[mask,'num_months_since_last_txn']=0
Sample input:
txn_year txn_month custid withdraw deposit
0 2011 4 123 0.0 100.0
1 2011 5 123 0.0 0.0
2 2011 4 1245 0.0 100.0
3 2011 5 1245 0.0 0.0
4 2011 6 123 0.0 0.0
5 2011 7 1245 50.1 0.0
6 2011 7 123 50.1 0.0
7 2011 8 123 0.0 0.0
8 2011 6 1245 0.0 0.0
9 2011 8 1245 0.0 0.0
Sample output:
txn_year txn_month custid withdraw deposit num_months_since_last_txn
0 2011 4 123 0.0 100.0 0.0
1 2011 5 123 0.0 0.0 1.0
4 2011 6 123 0.0 0.0 2.0
6 2011 7 123 50.1 0.0 3.0
7 2011 8 123 0.0 0.0 1.0
2 2011 4 1245 0.0 100.0 0.0
3 2011 5 1245 0.0 0.0 1.0
8 2011 6 1245 0.0 0.0 2.0
5 2011 7 1245 50.1 0.0 3.0
9 2011 8 1245 0.0 0.0 1.0
Explanation of the customer ID version:
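The customer ID snippet above computes m.cumsum() over the whole frame, so it relies on every customer's first row being a transaction, as it is in the sample input. Below is a sketch of a variant that restarts the counter for each customer explicitly. It is not the code from the answer: run_id, count and carried are illustrative names, and it assumes one row per customer per month.

```python
import pandas as pd

# Sample input from above, rebuilt by hand
df = pd.DataFrame({
    "txn_year": [2011] * 10,
    "txn_month": [4, 5, 4, 5, 6, 7, 7, 8, 6, 8],
    "custid": [123, 123, 1245, 1245, 123, 1245, 123, 123, 1245, 1245],
    "withdraw": [0.0, 0.0, 0.0, 0.0, 0.0, 50.1, 50.1, 0.0, 0.0, 0.0],
    "deposit": [100.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})

df = df.sort_values(["custid", "txn_year", "txn_month"])

# True on rows where a transaction occurred
m = df[["withdraw", "deposit"]].gt(0).any(axis=1)

# Number the runs separately within each customer
run_id = m.astype(int).groupby(df["custid"]).cumsum()

# Position of each row inside its (customer, run) group
count = df.groupby([df["custid"], run_id]).cumcount()

# Previous row's count plus one, shifted within each customer;
# a customer's first row has no previous row and gets 0
carried = (count + 1).groupby(df["custid"]).shift(1).fillna(0).astype(int)

# Transaction rows report the months since the previous transaction
df["num_months_since_last_txn"] = count.mask(m, carried)

print(df)
```

On the sample input this should reproduce the sample output above with integer values. For a customer whose first rows contain no transaction, the counter runs from that customer's first record, which may or may not be the desired behaviour.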