I have a df that tracks the status of issues, from "Open" through "In Progress" to "Closed", as shown below:
   T1           T2           T3      T4      T5
1  Open         In Progress  Closed
2  In Progress  Closed
3  Open         In Progress  Open    Closed
4  Open         In Progress  Closed  Open    Closed
5  Open         In Progress  Closed
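For anyone who wants to reproduce the example, here is a minimal sketch that rebuilds the sample frame. It assumes the empty cells are missing values (NaN) and that the index runs from 1 to 5, which matches the printed output in the answer but is not stated explicitly in the question.

```python
import numpy as np
import pandas as pd

# Column names and index are taken from the sample table;
# empty cells are assumed to be NaN.
cols = ["T1", "T2", "T3", "T4", "T5"]
rows = [
    ["Open", "In Progress", "Closed", np.nan, np.nan],
    ["In Progress", "Closed", np.nan, np.nan, np.nan],
    ["Open", "In Progress", "Open", "Closed", np.nan],
    ["Open", "In Progress", "Closed", "Open", "Closed"],
    ["Open", "In Progress", "Closed", np.nan, np.nan],
]
df = pd.DataFrame(rows, columns=cols, index=range(1, 6))
print(df)
```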
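Basically, I want to find all issues that were reopened. A reopened issue is any row that has a Closed value followed by a later transition. For example, index 4 has Closed in T3, but T4 then contains a value, which indicates that the issue was reopened.
The output would be: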
   T1           T2           T3      T4      T5      Reopened
1  Open         In Progress  Closed                  0
2  In Progress  Closed                               0
3  Open         In Progress  Open    Closed          0
4  Open         In Progress  Closed  Open    Closed  1
5  Open         In Progress  Closed                  0
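In the real df, the columns range from T1 to T25 and there are 50k rows.
So basically I need to check every column for Closed, and then check whether the next column is not empty.
Thanks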
I think you need:
df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0
Details:
Check each value of df for Open:
print ((df == 'Open'))
T1 T2 T3 T4 T5
1 True False False False False
2 False False False False False
3 True False True False False
4 True False False True False
5 True False False False False
Check for Closed using the DataFrame shifted one column to the right:
print (df.shift(axis=1))
T1 T2 T3 T4 T5
1 NaN Open In Progress Closed NaN
2 NaN In Progress Closed NaN NaN
3 NaN Open In Progress Open Closed
4 NaN Open In Progress Closed Open
5 NaN Open In Progress Closed NaN
print ((df.shift(axis=1)) == 'Closed')
T1 T2 T3 T4 T5
1 False False False True False
2 False False True False False
3 False False False False True
4 False False False True False
5 False False False True False
Then chain the two masks together with & (element-wise AND) and use any to get True for each row that contains at least one True:
print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')))
T1 T2 T3 T4 T5
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False True False
5 False False False False False
print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1))
1 False
2 False
3 False
4 True
5 False
dtype: bool
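The same "Open directly after Closed" test can also be written without shift by comparing adjacent column slices of the underlying array. This is only a sketch of an alternative, not part of the original answer. It assumes the status columns are contiguous and ordered in time, and the names cols, prev_closed and now_open are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells assumed to be NaN.
cols = ["T1", "T2", "T3", "T4", "T5"]
rows = [
    ["Open", "In Progress", "Closed", np.nan, np.nan],
    ["In Progress", "Closed", np.nan, np.nan, np.nan],
    ["Open", "In Progress", "Open", "Closed", np.nan],
    ["Open", "In Progress", "Closed", "Open", "Closed"],
    ["Open", "In Progress", "Closed", np.nan, np.nan],
]
df = pd.DataFrame(rows, columns=cols, index=range(1, 6))

# Work on a plain object array of the status columns only.
vals = df[cols].to_numpy(dtype=object)

# Columns T1..T4 are the "previous" state, T2..T5 the "current" state.
prev_closed = vals[:, :-1] == "Closed"
now_open = vals[:, 1:] == "Open"

# A row is flagged if any adjacent pair is Closed followed by Open.
reopened = pd.Series((prev_closed & now_open).any(axis=1), index=df.index)
print(reopened)
```

Selecting the status columns explicitly also keeps a previously added Reopened column out of the comparison.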
Finally, convert the boolean mask to integers with astype and assign it to a new column:
df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0
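Note that the mask above flags only Open directly after Closed, while the question describes the condition as Closed followed by any non-empty value. If other statuses can follow Closed in the real data, a variant along these lines may be closer to that wording. It is a sketch under that assumption; flag_reopened and status_cols are hypothetical names, not part of the original answer.

```python
import numpy as np
import pandas as pd

def flag_reopened(df, status_cols):
    """Return 1 for rows where a Closed value is followed by any non-null value."""
    states = df[status_cols]
    closed = states.eq("Closed")
    # shift(-1, axis=1) moves each value one column to the left, so each
    # cell lines up with the value in the next column; the last column
    # becomes NaN.
    has_next = states.shift(-1, axis=1).notna()
    return (closed & has_next).any(axis=1).astype(int)

# Sample data rebuilt from the question; empty cells assumed to be NaN.
cols = ["T1", "T2", "T3", "T4", "T5"]
rows = [
    ["Open", "In Progress", "Closed", np.nan, np.nan],
    ["In Progress", "Closed", np.nan, np.nan, np.nan],
    ["Open", "In Progress", "Open", "Closed", np.nan],
    ["Open", "In Progress", "Closed", "Open", "Closed"],
    ["Open", "In Progress", "Closed", np.nan, np.nan],
]
df = pd.DataFrame(rows, columns=cols, index=range(1, 6))

df["Reopened"] = flag_reopened(df, cols)
print(df)
```

On the sample data both versions give the same result; they differ only for rows where Closed is followed by something other than Open.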