Pandas - Check the values of adjacent columns

swi*_*fty 2 python pandas

I have a df that tracks the status of issues, from "Open" through "In Progress" to "Closed", as shown below:

        T1          T2           T3     T4      T5 
1      Open        In Progress Closed
2      In Progress Closed
3      Open        In Progress Open    Closed
4      Open        In Progress Closed  Open   Closed
5      Open        In Progress Closed

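Basically, I want to find all issues that were reopened. A reopened issue shows up as any row that has a Closed value followed by a later transition. For example, index 4 has Closed in T3, but T4 then contains a value, which indicates that the issue was reopened.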

The output would be:

        T1          T2           T3     T4      T5       Reopened
1      Open        In Progress Closed                       0
2      In Progress Closed                                   0  
3      Open        In Progress Open    Closed               0
4      Open        In Progress Closed  Open   Closed        1
5      Open        In Progress Closed                       0

In the real df, the columns range from T1 to T25, and there are 50k rows.

So basically I need to check each column for Closed, and then check whether the next column is not empty.
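
The rule described above can be written out literally as a row-wise loop. This is only a minimal sketch for small inputs: the `was_reopened` helper and the hand-built sample `df` are assumptions made for illustration, and a Python-level loop like this is likely to be slow on 50k rows by 25 columns.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table above; blank cells are NaN.
df = pd.DataFrame(
    [
        ["Open", "In Progress", "Closed", np.nan, np.nan],
        ["In Progress", "Closed", np.nan, np.nan, np.nan],
        ["Open", "In Progress", "Open", "Closed", np.nan],
        ["Open", "In Progress", "Closed", "Open", "Closed"],
        ["Open", "In Progress", "Closed", np.nan, np.nan],
    ],
    index=[1, 2, 3, 4, 5],
    columns=["T1", "T2", "T3", "T4", "T5"],
)


def was_reopened(row):
    """Hypothetical helper: 1 if a 'Closed' is directly followed by a non-empty value."""
    values = row.tolist()
    # Pair each cell with the cell in the next column of the same row.
    for current, following in zip(values, values[1:]):
        if current == "Closed" and pd.notna(following):
            return 1
    return 0


reopened = df.apply(was_reopened, axis=1)
print(reopened.tolist())  # [0, 0, 0, 1, 0]
```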

Thanks

jez*_*ael 5

I think you need:

df['Reopened'] = ((df == 'Open') & (df.shift(axis=1) == 'Closed')).any(axis=1).astype(int)
print(df)
            T1           T2      T3      T4      T5  Reopened
1         Open  In Progress  Closed     NaN     NaN         0
2  In Progress       Closed     NaN     NaN     NaN         0
3         Open  In Progress    Open  Closed     NaN         0
4         Open  In Progress  Closed    Open  Closed         1
5         Open  In Progress  Closed     NaN     NaN         0

Details

Compare every value of df with Open:

print(df == 'Open')
      T1     T2     T3     T4     T5
1   True  False  False  False  False
2  False  False  False  False  False
3   True  False   True  False  False
4   True  False  False   True  False
5   True  False  False  False  False

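Shift the DataFrame one column to the right and check it for Closed, so that each cell is compared with the value in the previous column: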

print(df.shift(axis=1))
    T1           T2           T3      T4      T5
1  NaN         Open  In Progress  Closed     NaN
2  NaN  In Progress       Closed     NaN     NaN
3  NaN         Open  In Progress    Open  Closed
4  NaN         Open  In Progress  Closed    Open
5  NaN         Open  In Progress  Closed     NaN

print(df.shift(axis=1) == 'Closed')
      T1     T2     T3     T4     T5
1  False  False  False   True  False
2  False  False   True  False  False
3  False  False  False  False   True
4  False  False  False   True  False
5  False  False  False   True  False
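
To see why the shifted frame lines up with the original, it can help to put one row next to its shifted copy. A small sketch, assuming row 4 from the question rebuilt by hand (the names `row`, `previous`, and `comparison` are illustrative only):

```python
import pandas as pd

# Row 4 from the question, rebuilt by hand for illustration.
row = pd.DataFrame(
    [["Open", "In Progress", "Closed", "Open", "Closed"]],
    index=[4],
    columns=["T1", "T2", "T3", "T4", "T5"],
)

# Shifting along axis=1 moves every value one column to the right,
# so each column now holds what the previous column held.
previous = row.shift(axis=1)

# One line per column: the current status and the status just before it.
comparison = pd.DataFrame({"current": row.loc[4], "previous": previous.loc[4]})
print(comparison)
```

At T4 the current value is Open while the previous value is Closed, which is exactly the combination the two masks look for.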

Then chain the two masks together with & (element-wise AND), and use any to get True for each row that contains at least one True:

print((df == 'Open') & (df.shift(axis=1) == 'Closed'))
      T1     T2     T3     T4     T5
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False   True  False
5  False  False  False  False  False

print(((df == 'Open') & (df.shift(axis=1) == 'Closed')).any(axis=1))
1    False
2    False
3    False
4     True
5    False
dtype: bool

Finally, convert the boolean mask to integers with astype and assign it to a new column:

df['Reopened'] = ((df == 'Open') & (df.shift(axis=1) == 'Closed')).any(axis=1).astype(int)
print(df)
            T1           T2      T3      T4      T5  Reopened
1         Open  In Progress  Closed     NaN     NaN         0
2  In Progress       Closed     NaN     NaN     NaN         0
3         Open  In Progress    Open  Closed     NaN         0
4         Open  In Progress  Closed    Open  Closed         1
5         Open  In Progress  Closed     NaN     NaN         0
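
Note that the mask above only flags an Open that directly follows a Closed, while the question asks for any non-empty value after a Closed. A possible variant, offered as a sketch rather than as part of the original answer, swaps the `== 'Open'` test for `notna()`. The sixth row below is hypothetical and was added only to show where the two checks differ; selecting the status columns with `filter` assumes they are all named `T` followed by digits.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["Open", "In Progress", "Closed", np.nan, np.nan],
        ["In Progress", "Closed", np.nan, np.nan, np.nan],
        ["Open", "In Progress", "Open", "Closed", np.nan],
        ["Open", "In Progress", "Closed", "Open", "Closed"],
        ["Open", "In Progress", "Closed", np.nan, np.nan],
        # Hypothetical row (not in the question): reopened straight into "In Progress".
        ["Open", "Closed", "In Progress", "Closed", np.nan],
    ],
    index=[1, 2, 3, 4, 5, 6],
    columns=["T1", "T2", "T3", "T4", "T5"],
)

# Work only on the status columns (assumed to be named T1, T2, ...),
# so re-running the code is not affected by the new Reopened column.
states = df.filter(regex=r"^T\d+$")

# True where the previous column in the same row is "Closed".
closed_before = states.shift(axis=1) == "Closed"

# The check from the answer: an "Open" right after a "Closed".
open_only = ((states == "Open") & closed_before).any(axis=1).astype(int)

# The variant: any non-empty value right after a "Closed".
df["Reopened"] = (states.notna() & closed_before).any(axis=1).astype(int)

print(open_only.tolist())        # [0, 0, 0, 1, 0, 0]
print(df["Reopened"].tolist())   # [0, 0, 0, 1, 0, 1]
```

On the five rows from the question both checks give the same result; they differ only when an issue is reopened into a status other than Open.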