Create groups in time series data based on potential start and end boolean columns (vectorized solution)

Xau*_*ume 8 python vectorization pandas

My dataframe is structured as follows:

   group  maybe_start  maybe_end
0    ABC        False      False
1    ABC         True      False
2    ABC        False      False
3    ABC        False      False
4    ABC         True      False
5    ABC        False      False
6    ABC        False       True
7    ABC        False      False
8    DEF        False      False
9    DEF        False      False
10   DEF         True      False
11   DEF        False      False
12   DEF        False       True
13   DEF        False      False
14   DEF        False      False
15   DEF        False       True
16   DEF         True      False
17   DEF        False      False
18   DEF        False       True

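I need to create a separate column, say group2, that records the groups defined by the start and end moments. Each group should start at the first True value in the maybe_start column that appears after the previous maybe_end==True, and end at the first occurrence of maybe_end==True after that start. In other words, we start a new group2 value at maybe_start==True (row 1 in this example), and every following row gets the same group2 value until maybe_end==True appears (row 6 here). All of this needs to be done within a groupby, where the groups are created based on the group column. The expected output should therefore look like this: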

   group  maybe_start  maybe_end  group2
0    ABC        False      False     NaN
1    ABC         True      False     1.0
2    ABC        False      False     1.0
3    ABC        False      False     1.0
4    ABC         True      False     1.0
5    ABC        False      False     1.0
6    ABC        False       True     1.0
7    ABC        False      False     NaN
0    DEF        False      False     NaN
1    DEF        False      False     NaN
2    DEF         True      False     1.0
3    DEF        False      False     1.0
4    DEF        False       True     1.0
5    DEF        False      False     NaN
6    DEF        False      False     NaN
7    DEF        False       True     NaN
8    DEF         True      False     2.0
9    DEF        False      False     2.0
10   DEF        False       True     2.0 

How can I achieve this in a vectorized way in Pandas?

And*_*ely 1

You can try:

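import numpy as np


def fn(x):
    # g is the current group number; state is True while a group is open
    out, g, state = [], 1, False
    for start, end in zip(x.maybe_start, x.maybe_end):
        if not state and start:
            # first start after the previous end: open a new group
            out.append(g)
            state = True
        elif state and end:
            # first end after the start: close the group (end row included)
            out.append(g)
            state = False
            g += 1
        elif state:
            # inside an open group; further starts are ignored
            out.append(g)
        else:
            # outside any group
            out.append(np.nan)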

    x['group2'] = out
    return x


out = df.groupby('group', group_keys=False).apply(fn)
print(out)

Prints:

   group  maybe_start  maybe_end  group2
0    ABC        False      False     NaN
1    ABC         True      False     1.0
2    ABC        False      False     1.0
3    ABC        False      False     1.0
4    ABC         True      False     1.0
5    ABC        False      False     1.0
6    ABC        False       True     1.0
7    ABC        False      False     NaN
8    DEF        False      False     NaN
9    DEF        False      False     NaN
10   DEF         True      False     1.0
11   DEF        False      False     1.0
12   DEF        False       True     1.0
13   DEF        False      False     NaN
14   DEF        False      False     NaN
15   DEF        False       True     NaN
16   DEF         True      False     2.0
17   DEF        False      False     2.0
18   DEF        False       True     2.0
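Since the question asks for a vectorized approach, the same state logic can probably also be written without a Python loop. The following is only a sketch, and it rests on one assumption that the question does not state: no row has both maybe_start and maybe_end set to True. Under that assumption, "being inside a group" after a given row simply means that the most recent flag seen so far within the group column was a start, so a per-group forward fill recovers the state. The helper names (signal, state_after, state_before, in_run, starts, run_id) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "group": ["ABC"] * 8 + ["DEF"] * 11,
    "maybe_start": False,
    "maybe_end": False,
})
df.loc[[1, 4, 10, 16], "maybe_start"] = True
df.loc[[6, 12, 15, 18], "maybe_end"] = True

# Event signal: 1.0 on start rows, 0.0 on end rows, NaN elsewhere.
# Assumption: a row never has both flags set.
signal = pd.Series(np.nan, index=df.index)
signal.loc[df["maybe_end"]] = 0.0
signal.loc[df["maybe_start"]] = 1.0

# State after each row = last event seen so far within the same group.
state_after = signal.groupby(df["group"]).ffill().fillna(0.0)
# State before each row = state after the previous row of the same group.
state_before = state_after.groupby(df["group"]).shift(fill_value=0.0)

# A row belongs to a run if the run is open after it (start or middle row)
# or was open just before it (the closing end row).
in_run = (state_after == 1) | (state_before == 1)

# A new run begins where the state flips from closed to open.
starts = (state_after == 1) & (state_before == 0)
run_id = starts.astype(int).groupby(df["group"]).cumsum()

# Keep the run number only on rows that are inside a run.
df["group2"] = run_id.where(in_run)
print(df)
```

On the sample data this gives the same group2 column as the loop version. If rows with both flags set can occur, the loop version and this sketch may disagree, so that case would need an explicit rule first.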