Looping through a pandas DataFrame in chunks

Rad*_*duS 4 python dataframe pandas

Given the following DataFrame:

      open    high     low   close    volume
0     74.090  74.144  74.089  74.136  0.000012
1     74.110  74.143  74.009  74.072  0.000419
2     74.074  74.190  74.063  74.081  0.000223
3     74.100  74.244  74.085  74.182  0.000429
4     74.194  74.222  74.164  74.199  0.000090
5     74.198  74.265  74.181  74.213  0.000071
6     74.223  74.244  74.120  74.174  0.000124
7     74.181  74.229  74.132  74.161  0.000087
8     74.164  74.337  74.126  74.324  0.000299
9     74.303  74.407  74.302  74.400  0.000185
10    74.408  74.440  74.373  74.409  0.000163
11    74.437  74.438  74.399  74.418  0.000208
12    74.428  74.464  74.385  74.385  0.000231

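How can I efficiently loop through the entire DataFrame and, for each row, get (in a new DataFrame) the previous five rows, including the current row?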
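For reference, a plain-pandas baseline for "five rows up to and including the current row" can be sketched as below. This is not from the thread; it uses only the first seven rows and two columns of the data above, and the names `window`, `chunks` and `long_df` are illustrative. It loops in Python, so it is the slow version the answer below tries to avoid.

```python
import pandas as pd

# First seven rows of the question's data (open and close only, for brevity)
df = pd.DataFrame({
    'open':  [74.090, 74.110, 74.074, 74.100, 74.194, 74.198, 74.223],
    'close': [74.136, 74.072, 74.081, 74.182, 74.199, 74.213, 74.174],
})

window = 5
# One small frame per row that has a full window: rows i-4 .. i (inclusive),
# renumbered 0..4 so the position inside the window becomes the inner label
chunks = {
    df.index[i]: df.iloc[i - window + 1 : i + 1].reset_index(drop=True)
    for i in range(window - 1, len(df))
}
# Stack into one long frame indexed by (current row, position in window)
long_df = pd.concat(chunks, names=['major', 'minor'])
print(long_df)
```

Rows 0 to 3 get no entry because they do not have four predecessors.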

piR*_*red 6

If you want efficiency, use NumPy strides.
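To illustrate what a strided window view is, here is a small sketch on a toy integer array (the array, `window` and `w` are made up for illustration). It orders the axes as (window, position, column); the answer's code below puts the column axis first instead, because `pd.Panel` expected items first. Note that `as_strided` does no bounds checking, so the shape must be computed exactly.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

v = np.arange(12, dtype=np.int64).reshape(6, 2)  # toy data: 6 rows, 2 columns
sr, sc = v.strides  # bytes to step one row (16) and one column (8)

window = 3
n_windows = v.shape[0] - window + 1
# Shape (n_windows, window, n_cols). Stepping along either of the first two
# axes moves one row in memory, so w[i, j] is v[i + j]
w = as_strided(v, shape=(n_windows, window, v.shape[1]), strides=(sr, sr, sc))

print(w[0])  # rows 0..2 of v
print(w[3])  # rows 3..5 of v
# No data was copied: w is a view on v's buffer
print(np.shares_memory(w, v))
```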

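import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided as stride

v = df.values       # 2-D float array, shape (n_rows, n_cols)
sr, sc = v.strides  # bytes per row step, bytes per column step

# data[c, i, j] == v[i + j, c]: column c of the 5-row window ending at row i + 4
data = stride(v, (v.shape[1], v.shape[0] - 4, 5), (sc, sr, sr))

# pd.Panel was removed in pandas 1.0. On older pandas this was:
#   pn5 = pd.Panel(data, df.columns, df.index[4:], pd.RangeIndex(5))
#   df5 = pn5.to_frame()
# The same long-format frame can be built directly:
idx = pd.MultiIndex.from_product([df.index[4:], pd.RangeIndex(5)], names=['major', 'minor'])
df5 = pd.DataFrame(data.transpose(1, 2, 0).reshape(-1, v.shape[1]), index=idx, columns=df.columns)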
df5.head(10)

               open    high     low   close    volume
major minor                                          
4     0      74.090  74.144  74.089  74.136  0.000012
      1      74.110  74.143  74.009  74.072  0.000419
      2      74.074  74.190  74.063  74.081  0.000223
      3      74.100  74.244  74.085  74.182  0.000429
      4      74.194  74.222  74.164  74.199  0.000090
5     0      74.110  74.143  74.009  74.072  0.000419
      1      74.074  74.190  74.063  74.081  0.000223
      2      74.100  74.244  74.085  74.182  0.000429
      3      74.194  74.222  74.164  74.199  0.000090
      4      74.198  74.265  74.181  74.213  0.000071
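On NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of view with bounds checking and returns it read-only, which avoids the risk of a wrong shape in `as_strided`. The sketch below is an alternative to the answer's code, not the answer's own method; the toy frame stands in for `df` and its values are illustrative only.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Toy frame standing in for the question's df (values are illustrative only)
df = pd.DataFrame(np.arange(14.0).reshape(7, 2), columns=['open', 'close'])

window = 5
v = df.to_numpy()
# Windows along axis 0 come out with shape (n_windows, n_cols, window);
# move the window axis forward to get (n_windows, window, n_cols)
w = sliding_window_view(v, window, axis=0).transpose(0, 2, 1)

idx = pd.MultiIndex.from_product(
    [df.index[window - 1:], pd.RangeIndex(window)], names=['major', 'minor']
)
df5 = pd.DataFrame(w.reshape(-1, v.shape[1]), index=idx, columns=df.columns)
print(df5.head(10))
```

The final `reshape` copies the windows into a new array, so memory grows by roughly a factor of `window`; keep working on `w` directly if that matters.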

Example processing

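def process(df):
    # df is one group (all rows sharing one 'major' label); df.name is that label.
    # Drop the 'major' level, then keep the last two rows of the window.
    return df.loc[df.name].tail(2)

print(df5.groupby(level=0).apply(process))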

               open    high     low   close    volume
major minor                                          
4     3      74.100  74.244  74.085  74.182  0.000429
      4      74.194  74.222  74.164  74.199  0.000090
5     3      74.194  74.222  74.164  74.199  0.000090
      4      74.198  74.265  74.181  74.213  0.000071
6     3      74.198  74.265  74.181  74.213  0.000071
      4      74.223  74.244  74.120  74.174  0.000124
7     3      74.223  74.244  74.120  74.174  0.000124
      4      74.181  74.229  74.132  74.161  0.000087
8     3      74.181  74.229  74.132  74.161  0.000087
      4      74.164  74.337  74.126  74.324  0.000299
9     3      74.164  74.337  74.126  74.324  0.000299
      4      74.303  74.407  74.302  74.400  0.000185
10    3      74.303  74.407  74.302  74.400  0.000185
      4      74.408  74.440  74.373  74.409  0.000163
11    3      74.408  74.440  74.373  74.409  0.000163
      4      74.437  74.438  74.399  74.418  0.000208
12    3      74.437  74.438  74.399  74.418  0.000208
      4      74.428  74.464  74.385  74.385  0.000231
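When the per-window processing is a simple selection like `tail(2)`, it can also be done as one slice on the 3-D window array instead of a Python call per group. The sketch below assumes a toy frame and the illustrative names `window` and `keep`; it reproduces the `(major, minor)` labels that `groupby(...).apply(process)` gives.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame(np.arange(14.0).reshape(7, 2), columns=['open', 'close'])
window, keep = 5, 2

# w[i, j, c] == df.iloc[i + j, c], as in the strided construction
w = sliding_window_view(df.to_numpy(), window, axis=0).transpose(0, 2, 1)

# The last `keep` rows of every window are a slice on axis 1,
# so no per-group Python function call is needed
last = w[:, -keep:, :]
idx = pd.MultiIndex.from_product(
    [df.index[window - 1:], pd.RangeIndex(window - keep, window)],
    names=['major', 'minor'],
)
out = pd.DataFrame(last.reshape(-1, df.shape[1]), index=idx, columns=df.columns)
print(out)
```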

Setup

df = pd.DataFrame([
        [74.09, 74.14399999999999, 74.089, 74.13600000000001, 1.2e-05],
        [74.11, 74.143, 74.009, 74.072, 0.00041900000000000005],
        [74.074, 74.19, 74.063, 74.081, 0.000223],
        [74.1, 74.244, 74.085, 74.182, 0.000429],
        [74.194, 74.222, 74.164, 74.199, 9e-05],
        [74.19800000000001, 74.265, 74.181, 74.21300000000001, 7.099999999999999e-05],
        [74.223, 74.244, 74.12, 74.17399999999999, 0.000124],
        [74.181, 74.229, 74.132, 74.161, 8.7e-05],
        [74.164, 74.337, 74.126, 74.324, 0.000299],
        [74.303, 74.407, 74.30199999999999, 74.4, 0.000185],
        [74.408, 74.44, 74.373, 74.40899999999999, 0.00016299999999999998],
        [74.437, 74.438, 74.399, 74.418, 0.00020800000000000001],
        [74.428, 74.464, 74.385, 74.385, 0.000231]
    ], columns=['open', 'high', 'low', 'close', 'volume'])