说我有下面的DataFrame,它具有0/1项,具体取决于在某个月内是否发生了什么。
Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))
df = pd.DataFrame({'R': Y},index = X)
R
2010-01-01 0
2010-02-01 0
2010-03-01 1
2010-04-01 1
2010-05-01 0
2010-06-01 0
2010-07-01 0
2010-08-01 0
2010-09-01 1
2010-10-01 1
2010-11-01 1
Run Code Online (Sandbox Code Playgroud)
我想要创建一个第二列,该列列出直到下一次出现1为止的月数。
也就是说,我需要:
R F
2010-01-01 0 2
2010-02-01 0 1
2010-03-01 1 0
2010-04-01 1 0
2010-05-01 0 4
2010-06-01 0 3
2010-07-01 0 2
2010-08-01 0 1
2010-09-01 1 0
2010-10-01 1 0
2010-11-01 1 0
Run Code Online (Sandbox Code Playgroud)
我已经尝试过的:我还没走很远,但是我能够填补第一部分
A = list(df.index)
T = df[df['R']==1]
a = df.index[0]
b = T.index[0]
c = A.index(b) - A.index(a)
df.loc[a:b, 'F'] = np.linspace(c,0,c+1)
R F
2010-01-01 0 2.0
2010-02-01 0 1.0
2010-03-01 1 0.0
2010-04-01 1 NaN
2010-05-01 0 NaN
2010-06-01 0 NaN
2010-07-01 0 NaN
2010-08-01 0 NaN
2010-09-01 1 NaN
2010-10-01 1 NaN
2010-11-01 1 NaN
Run Code Online (Sandbox Code Playgroud)
编辑可能最好提供一个跨多年的原始示例。
Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))
df = pd.DataFrame({'R': Y},index = X)
Run Code Online (Sandbox Code Playgroud)
这是我的方式
s=df.R.cumsum()
df.loc[df.R==0,'F']=s.groupby(s).cumcount(ascending=False)+1
df.F.fillna(0,inplace=True)
df
Out[12]:
R F
2010-01-01 0 2.0
2010-02-01 0 1.0
2010-03-01 1 0.0
2010-04-01 1 0.0
2010-05-01 0 4.0
2010-06-01 0 3.0
2010-07-01 0 2.0
2010-08-01 0 1.0
2010-09-01 1 0.0
2010-10-01 1 0.0
2010-11-01 1 0.0
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
168 次 |
| 最近记录: |