Suppose we have the following pandas DataFrame:
In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame([0, 1, 0, 0, 1, 1, 0, 1, 1, 1], columns=['in'])
df
Out[1]:
in
0 0
1 1
2 0
3 0
4 1
5 1
6 0
7 1
8 1
9 1
How can I count consecutive ones in pandas in a vectorized way? I would like to get a result like this:
in out
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 0 0
7 1 1
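8 1 2 …

I have a pandas DataFrame with time as the index (1-minute frequency) and several columns of data. Sometimes the data contains NaN. When it does, I only want to interpolate if the gap is no longer than 5 minutes, which in this case means at most 5 consecutive NaNs. The data might look like this (a few test cases that illustrate the problem):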
import numpy as np
import pandas as pd
from datetime import datetime, timedelta  # timedelta is used to build the index below
start = datetime(2014,2,21,14,50)
data = pd.DataFrame(index=[start + timedelta(minutes=1*x) for x in range(0, 8)],
data={'a': [123.5, np.nan, 136.3, 164.3, 213.0, 164.3, 213.0, 221.1],
'b': [433.5, 523.2, 536.3, 464.3, 413.0, 164.3, 213.0, 221.1],
'c': [123.5, 132.3, 136.3, 164.3] + [np.nan]*4,
'd': [np.nan]*8,
'e': [np.nan]*7 + [2330.3],
'f': [np.nan]*4 + [2763.0, 2142.3, 2127.3, 2330.3],
'g': [2330.3] + [np.nan]*7,
'h': [2330.3] + [np.nan]*6 + [2777.7]})
It looks like this:
In [147]: data
Out[147]: …
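Both questions above come down to labelling runs of identical values, so one possible approach can be sketched for each. This is only a sketch under stated assumptions, not an accepted answer: the helper names `count_consecutive_ones` and `interpolate_short_gaps` and the `max_gap` parameter are hypothetical, the index is assumed to be a regular 1-minute `DatetimeIndex` (so a gap of N minutes is N consecutive NaNs), and gaps at the very start or end of a column are assumed to stay unfilled because there is no value on one side to interpolate from.

```python
import numpy as np
import pandas as pd


def count_consecutive_ones(s: pd.Series) -> pd.Series:
    """Running count of consecutive 1s that resets to 0 at every 0."""
    # A new run starts wherever the value differs from the previous row.
    run_id = s.ne(s.shift()).cumsum()
    # Inside a run of 1s the cumulative sum counts 1, 2, 3, ...;
    # inside a run of 0s it stays at 0.
    return s.groupby(run_id).cumsum()


def interpolate_short_gaps(s: pd.Series, max_gap: int = 5) -> pd.Series:
    """Interpolate NaN runs of at most `max_gap` rows (assumption: interior gaps only)."""
    is_na = s.isna()
    # Label each run of consecutive NaN / non-NaN rows.
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    # Length of the NaN run each row belongs to (0 for rows that hold a value).
    run_len = is_na.groupby(run_id).transform("sum")
    # Interpolate every interior gap, then keep the result only for short gaps.
    filled = s.interpolate(method="time", limit_area="inside")
    return filled.where(run_len <= max_gap, s)


# First question: the 'in' column from the example above.
df = pd.DataFrame([0, 1, 0, 0, 1, 1, 0, 1, 1, 1], columns=["in"])
df["out"] = count_consecutive_ones(df["in"])
# df["out"] is 0, 1, 0, 0, 1, 2, 0, 1, 2, 3

# Second question: three of the test columns from the example above.
index = pd.date_range("2014-02-21 14:50", periods=8, freq="min")
gaps = pd.DataFrame(
    {
        "a": [123.5, np.nan, 136.3, 164.3, 213.0, 164.3, 213.0, 221.1],
        "c": [123.5, 132.3, 136.3, 164.3] + [np.nan] * 4,
        "h": [2330.3] + [np.nan] * 6 + [2777.7],
    },
    index=index,
)
result = gaps.apply(interpolate_short_gaps)
```

With these inputs, the single NaN in column `a` is filled, the trailing NaNs in `c` stay as they are, and the 6-row gap in `h` is left untouched because it exceeds `max_gap`. Passing `limit=5` to `interpolate` alone would not give this behaviour, since it fills the first 5 values of a longer gap instead of skipping the gap entirely.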