Rad*_*duS 4 python dataframe pandas
Given the following DataFrame:
open high low close volume
0 74.090 74.144 74.089 74.136 0.000012
1 74.110 74.143 74.009 74.072 0.000419
2 74.074 74.190 74.063 74.081 0.000223
3 74.100 74.244 74.085 74.182 0.000429
4 74.194 74.222 74.164 74.199 0.000090
5 74.198 74.265 74.181 74.213 0.000071
6 74.223 74.244 74.120 74.174 0.000124
7 74.181 74.229 74.132 74.161 0.000087
8 74.164 74.337 74.126 74.324 0.000299
9 74.303 74.407 74.302 74.400 0.000185
10 74.408 74.440 74.373 74.409 0.000163
11 74.437 74.438 74.399 74.418 0.000208
12 74.428 74.464 74.385 74.385 0.000231
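How can I efficiently loop over the whole DataFrame and get, in a new DataFrame, the five rows ending at each row (the current row plus the four before it)?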
If you want efficiency, use numpy.
strides
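import pandas as pd
import numpy as np
from numpy.lib.stride_tricks import as_strided as stride

# df is the DataFrame defined at the end of this answer
v = df.values
# byte strides along the row axis (sr) and the column axis (sc)
sr, sc = v.strides
# view of shape (columns, windows, 5): data[c, w, k] == v[w + k, c]
data = stride(v, (v.shape[1], v.shape[0] - 4, 5), (sc, sr, sr))
# note: pd.Panel was deprecated in pandas 0.20 and removed in 0.25
pn5 = pd.Panel(data, df.columns, df.index[4:], pd.RangeIndex(5))
df5 = pn5.to_frame()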
df5.head(10)
open high low close volume
major minor
4 0 74.090 74.144 74.089 74.136 0.000012
1 74.110 74.143 74.009 74.072 0.000419
2 74.074 74.190 74.063 74.081 0.000223
3 74.100 74.244 74.085 74.182 0.000429
4 74.194 74.222 74.164 74.199 0.000090
5 0 74.110 74.143 74.009 74.072 0.000419
1 74.074 74.190 74.063 74.081 0.000223
2 74.100 74.244 74.085 74.182 0.000429
3 74.194 74.222 74.164 74.199 0.000090
4 74.198 74.265 74.181 74.213 0.000071
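Since `pd.Panel` is no longer available in current pandas, the same stacked layout can probably be rebuilt without it. The following is only a sketch, not the original answer's method: it assumes NumPy 1.20 or later (for `sliding_window_view`), and the helper name `rolling_frames` and the small `demo` frame are made up for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_frames(df, window=5):
    """Stack every `window`-row slice of df into one MultiIndexed frame.

    Hypothetical helper: the outer index level ("major") is the label of the
    last row in each window, the inner level ("minor") is the position
    inside the window.
    """
    v = df.to_numpy()
    # The window axis is appended last: shape (n_windows, n_cols, window)
    w = sliding_window_view(v, window, axis=0)
    # Reorder to (n_windows, window, n_cols), then flatten the first two axes
    stacked = w.transpose(0, 2, 1).reshape(-1, v.shape[1])
    idx = pd.MultiIndex.from_product(
        [df.index[window - 1:], range(window)], names=["major", "minor"]
    )
    return pd.DataFrame(stacked, index=idx, columns=df.columns)


# Illustrative data only (not the frame from the question)
demo = pd.DataFrame({"a": range(7), "b": range(10, 17)})
out = rolling_frames(demo, window=5)
print(out.head(10))
```

The `reshape` makes a copy, so unlike the strided view this uses memory proportional to `window` times the size of the original data.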
Example processing
def process(df):
    # df.name is the group key (the 'major' label); keep the last two rows of that window
    return df.loc[df.name].tail(2)

print(df5.groupby(level=0).apply(process))
open high low close volume
major minor
4 3 74.100 74.244 74.085 74.182 0.000429
4 74.194 74.222 74.164 74.199 0.000090
5 3 74.194 74.222 74.164 74.199 0.000090
4 74.198 74.265 74.181 74.213 0.000071
6 3 74.198 74.265 74.181 74.213 0.000071
4 74.223 74.244 74.120 74.174 0.000124
7 3 74.223 74.244 74.120 74.174 0.000124
4 74.181 74.229 74.132 74.161 0.000087
8 3 74.181 74.229 74.132 74.161 0.000087
4 74.164 74.337 74.126 74.324 0.000299
9 3 74.164 74.337 74.126 74.324 0.000299
4 74.303 74.407 74.302 74.400 0.000185
10 3 74.303 74.407 74.302 74.400 0.000185
4 74.408 74.440 74.373 74.409 0.000163
11 3 74.408 74.440 74.373 74.409 0.000163
4 74.437 74.438 74.399 74.418 0.000208
12 3 74.437 74.438 74.399 74.418 0.000208
4 74.428 74.464 74.385 74.385 0.000231
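If the per-window processing reduces each window to a few numbers, it may be possible to skip `groupby(...).apply` and work on the windowed array directly. This is a sketch under assumptions: NumPy 1.20 or later, a made-up `demo` frame, and an arbitrary choice of statistics (highest high and lowest low per window) that is not part of the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative data only (not the frame from the question)
demo = pd.DataFrame({
    "high": [3.0, 5.0, 4.0, 6.0, 2.0, 7.0],
    "low": [1.0, 2.0, 0.5, 3.0, 1.5, 4.0],
})
window = 5

# Shape (n_windows, n_cols, window); column order follows demo.columns
w = sliding_window_view(demo.to_numpy(), window, axis=0)

# Reduce along the window axis, one result row per window
summary = pd.DataFrame(
    {
        "window_high": w[:, 0, :].max(axis=1),  # column 0 is "high"
        "window_low": w[:, 1, :].min(axis=1),   # column 1 is "low"
    },
    index=demo.index[window - 1:],  # label each window by its last row
)
print(summary)
```

For reductions like these, `demo.rolling(window)` would likely give the same numbers. The array form is mainly useful when the per-window logic needs all rows of the window at once.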
Setup
df = pd.DataFrame([
[74.09, 74.14399999999999, 74.089, 74.13600000000001, 1.2e-05],
[74.11, 74.143, 74.009, 74.072, 0.00041900000000000005],
[74.074, 74.19, 74.063, 74.081, 0.000223],
[74.1, 74.244, 74.085, 74.182, 0.000429],
[74.194, 74.222, 74.164, 74.199, 9e-05],
[74.19800000000001, 74.265, 74.181, 74.21300000000001, 7.099999999999999e-05],
[74.223, 74.244, 74.12, 74.17399999999999, 0.000124],
[74.181, 74.229, 74.132, 74.161, 8.7e-05],
[74.164, 74.337, 74.126, 74.324, 0.000299],
[74.303, 74.407, 74.30199999999999, 74.4, 0.000185],
[74.408, 74.44, 74.373, 74.40899999999999, 0.00016299999999999998],
[74.437, 74.438, 74.399, 74.418, 0.00020800000000000001],
[74.428, 74.464, 74.385, 74.385, 0.000231]
], columns=['open', 'high', 'low', 'close', 'volume'])