La-*_*-Lo 2 python indexing numpy dataframe pandas
Suppose I have a dataframe like this:
Time A B C D
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 NaN NaN 12087.71 12087.91
2019-06-17 08:47:00 NaN 12088.21 12084.21 12085.21
2019-06-17 08:48:00 NaN 12090.21 NaN NaN
2019-06-17 08:49:00 NaN 12090.21 NaN NaN
2019-06-17 08:50:00 NaN NaN 12504.11 NaN
2019-06-17 08:51:00 NaN NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:53:00 12503.61 12503.61 12503.61 12503.61
2019-06-17 08:54:00 12503.61 12503.61 12503.11 12503.11
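How can I efficiently find the length of the longest uninterrupted sequence of NaNs in the whole df? (It is 6 in the example.)
Edit: I forgot to stress the word "efficiently", since the df is about 1 million rows long.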
Let's try apply with a user-defined function, which in turn uses cumsum() to identify the blocks:
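def max_na(s):
    # True where the value is NaN
    isna = s.isna()
    # The counter increases at every non-NaN value, so each NaN run shares one block id
    blocks = (~isna).cumsum()
    # Count the NaNs per block and keep the longest run
    return isna.groupby(blocks).sum().max()

df.apply(max_na).max()
# 6.0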
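Since the question stresses efficiency on roughly a million rows, here is a minimal sketch of a NumPy-only variant that avoids the per-column groupby. It is not part of the answer above: the function name `longest_nan_run` and the padding trick are my own assumptions, and I have not benchmarked it against the groupby version. Like `max_na`, it counts runs within each column separately.

```python
import numpy as np
import pandas as pd


def longest_nan_run(df: pd.DataFrame) -> int:
    """Length of the longest run of consecutive NaNs within any single column."""
    mask = df.isna().to_numpy()
    # Pad with one all-zero row on top and bottom so every run has a start and an end.
    padded = np.zeros((mask.shape[0] + 2, mask.shape[1]), dtype=np.int8)
    padded[1:-1] = mask
    # Flatten column by column; the padding rows keep runs from spilling across columns.
    flat = padded.ravel(order="F")
    edges = np.diff(flat)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(edges == -1)    # 1 -> 0 transitions
    return int((ends - starts).max()) if starts.size else 0


# The example data from the question (Time column omitted).
nan = np.nan
df = pd.DataFrame({
    "A": [12089.89, nan, nan, nan, nan, nan, nan, 12504.11, 12503.61, 12503.61],
    "B": [12089.89, nan, 12088.21, 12090.21, 12090.21, nan, nan, nan, 12503.61, 12503.61],
    "C": [12087.71, 12087.71, 12084.21, nan, nan, 12504.11, 12503.11, 12503.11, 12503.61, 12503.11],
    "D": [12087.71, 12087.91, 12085.21, nan, nan, nan, 12503.11, 12503.11, 12503.61, 12503.11],
})

print(longest_nan_run(df))  # 6
```

The idea is the same as with cumsum(): find where NaN runs begin and end. Here the run boundaries come from a single diff over the flattened boolean mask, so the work is a handful of array operations regardless of the number of columns.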