Recurrent neural network for a time series with multiple variables - TensorFlow

Question


Lui*_*que 6 python neural-network reshape deep-learning

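I am using past demand with 3 variables to predict future demand, but every time I run the code, building my Y axis (the target array) raises an error.

If I use only one variable for the Y axis, there is no error.

Example: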

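import numpy as np

# .copy() avoids pandas' SettingWithCopyWarning when adding columns
demandaY = bike_data[['cnt']].copy()
n_steps = 20

# cnt1..cnt20: the next 20 values of cnt for every row
for time_step in range(1, n_steps+1):
    demandaY['cnt'+str(time_step)] = demandaY[['cnt']].shift(-time_step).values

# columns 1..20 -> shape (N, 20), which can be reshaped to (N, 20, 1)
y = demandaY.iloc[:, 1:].values
y = np.reshape(y, (y.shape[0], n_steps, 1))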


Dataset

Script

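import numpy as np

features = ['cnt','temp','hum']
demanda = bike_data[features].copy()
n_steps = 20

# shifted (future) copies of each variable: cnt1..cnt20, temp1..temp20, hum1..hum20
for var_col in features:
    for time_step in range(1, n_steps+1):
        demanda[var_col+str(time_step)] = demanda[[var_col]].shift(-time_step).values

demanda.dropna(inplace=True)
demanda.head()

n_var = len(features)
# drop cnt20, temp20 and hum20 so each variable keeps n_steps columns
columns = list(filter(lambda col: not(col.endswith("%d" % n_steps)), demanda.columns))

# note: the columns are ordered cnt, temp, hum, cnt1..cnt19, temp1..temp19, hum1..hum19,
# i.e. grouped by variable rather than by time step
X = demanda[columns].iloc[:, :(n_steps*n_var)].values
X = np.reshape(X, (X.shape[0], n_steps, n_var))

y = demanda.iloc[:, 0].values                 # a single column -> shape (N,)
y = np.reshape(y, (y.shape[0], n_steps, 1))   # raises the ValueError below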


Output

ValueError: cannot reshape array of size 17379 into shape (17379,20,1)
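The reshape fails because `demanda.iloc[:, 0]` selects only the `cnt` column, so `y` holds N values and cannot be reshaped into N x 20 x 1. A minimal sketch of one way around it, assuming the target is the next 20 values of `cnt` (the same idea as the single-variable example above). The random DataFrame is a hypothetical stand-in for `bike_data`, not the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bike_data: 100 rows of random values
rng = np.random.default_rng(0)
bike_data = pd.DataFrame({
    "cnt": rng.integers(0, 500, 100).astype(float),
    "temp": rng.random(100),
    "hum": rng.random(100),
})

features = ["cnt", "temp", "hum"]
n_steps = 20
demanda = bike_data[features].copy()

# Same shifted columns as in the question: cnt1..cnt20, temp1..temp20, hum1..hum20
for var_col in features:
    for time_step in range(1, n_steps + 1):
        demanda[var_col + str(time_step)] = demanda[var_col].shift(-time_step)
demanda = demanda.dropna()  # the last 20 rows contain NaN and are dropped

# One column -> shape (N,), which cannot become (N, 20, 1)
print(demanda.iloc[:, 0].values.shape)   # (80,)

# The 20 future cnt columns -> shape (N, 20), which can become (N, 20, 1)
target_cols = ["cnt" + str(t) for t in range(1, n_steps + 1)]
y = demanda[target_cols].values
y = np.reshape(y, (y.shape[0], n_steps, 1))
print(y.shape)                           # (80, 20, 1)
```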


GitHub: repository

Answer 1

Sid*_*don 2

I'm not sure whether the OP still wants an answer, but I'll post the answer I linked in the comments, with some modifications.


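Time series datasets come in different forms. Let's consider a dataset with features X and labels Y. Depending on the problem, the Y samples may be derived from X, or Y may be a separate target variable that you want to predict.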


import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    # each sample: look_back consecutive rows of X plus one row of Y
    dataX, dataY = [], []
    # the upper bound keeps Y[i + look_back + label_lag] inside the array for any label_lag
    for i in range(0, len(X) - look_back - label_lag, stride):
        a = X[i:(i + look_back)]
        dataX.append(a)
        b = Y[i + look_back + label_lag]
        dataY.append(b)
    return np.array(dataX), np.array(dataY)

print(features.values.shape, labels.shape)
# (619, 4), (619, 1)

x, y = create_dataset(X=features.values, Y=labels.values, look_back=10, stride=1)
print(x.shape, y.shape)
# (610, 10, 4), (610, 1)


Usage of the other parameters:


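label_lag: if the last row of an X sample is at time t, the corresponding Y sample is at time t + label_lag + 1. The default value (-1) therefore puts the last row of X and the Y sample at the same index t.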


Indices of the last X row and of the Y value for the sample at position 1:


# with label_lag = -1
np.where(x[1, -1] == features.values)[0], np.where(y[1] == labels.values)[0]
# (10, 10, 10, 10), (10)

# with label_lag = 0
np.where(x[1, -1] == features.values)[0], np.where(y[1] == labels.values)[0]
# (10, 10, 10, 10), (11)
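The same check can be made easier to read with data where every value equals its row index, so the printed values are the indices themselves. A small sketch, assuming hypothetical `np.arange` data in place of the real features and labels; the loop bound in `create_dataset` is written so that `label_lag=0` does not index past the end of Y:

```python
import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    dataX, dataY = [], []
    # stop early enough that Y[i + look_back + label_lag] stays in range
    for i in range(0, len(X) - look_back - label_lag, stride):
        dataX.append(X[i:(i + look_back)])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Hypothetical data: each value equals its row index
features = np.arange(30).reshape(-1, 1)
labels = np.arange(30).reshape(-1, 1)

for lag in (-1, 0):
    x, y = create_dataset(features, labels, look_back=10, label_lag=lag)
    # label_lag, index of the last X row of sample 1, index of its Y value
    print(lag, x[1, -1, 0], y[1, 0])
# -1 10 10
# 0 10 11
```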


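look_back: the number of past rows of the dataset included in each sample at the current time step t. A look_back of 10 means a single sample contains rows t-9 to t (10 rows).
stride: the index gap between two consecutive samples. With stride=2, if the first X sample contains rows 0 to 9 (X[0:10]), the second sample contains rows 2 to 11 (X[2:12]).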
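To see look_back and stride together, a short sketch on hypothetical data (20 rows whose values equal their row index) with look_back=10 and stride=2; the window start indices advance by 2 and each window holds 10 rows:

```python
import numpy as np

def create_dataset(X, Y, look_back=10, label_lag=-1, stride=1):
    dataX, dataY = [], []
    for i in range(0, len(X) - look_back - label_lag, stride):
        dataX.append(X[i:(i + look_back)])
        dataY.append(Y[i + look_back + label_lag])
    return np.array(dataX), np.array(dataY)

# Hypothetical data: each value equals its row index
features = np.arange(20).reshape(-1, 1)
labels = np.arange(20).reshape(-1, 1)

x, y = create_dataset(features, labels, look_back=10, stride=2)
print(x.shape)                 # (6, 10, 1)
print(x[:, 0, 0].tolist())     # [0, 2, 4, 6, 8, 10]  first row of each window
print(x[1, :, 0].tolist())     # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```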


Depending on your problem, you can also look back over Y, i.e. Y can be multi-dimensional as well. In that case the only change is b = Y[i:(i+look_back+label_lag)].
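For the OP's case (predicting the next 20 steps), one possible multi-dimensional Y is a window of future rows. The sketch below is a variant of our own, not the slice given above: it assumes each Y sample should be the `horizon` rows that immediately follow the X window, and both `create_dataset_multistep` and `horizon` are hypothetical names. The resulting Y has shape (N, horizon, 1), the same layout the question tries to build with (N, 20, 1):

```python
import numpy as np

def create_dataset_multistep(X, Y, look_back=10, horizon=3, stride=1):
    """Hypothetical variant: pair each X window with the next `horizon` rows of Y."""
    dataX, dataY = [], []
    # the last window must leave room for `horizon` rows after it
    for i in range(0, len(X) - look_back - horizon + 1, stride):
        dataX.append(X[i:(i + look_back)])
        dataY.append(Y[(i + look_back):(i + look_back + horizon)])
    return np.array(dataX), np.array(dataY)

# Hypothetical data: each value equals its row index
features = np.arange(30).reshape(-1, 1)
labels = np.arange(30).reshape(-1, 1)

x, y = create_dataset_multistep(features, labels, look_back=10, horizon=3)
print(x.shape, y.shape)        # (18, 10, 1) (18, 3, 1)
print(y[0, :, 0].tolist())     # [10, 11, 12]
```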


The same can be achieved with TimeseriesGenerator from keras:


from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

TimeseriesGenerator(features.values, labels.values, length=10, batch_size=64, stride=1)


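Here length is the same as look_back. By default there is a gap of 1 between features and labels, i.e. a sample in X comes from indices t-9 to t and the corresponding sample in Y is at index t+1. If you want both to share the same index, shift the labels by one step (e.g. labels.shift(1) in pandas) before passing them to the generator.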
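The effect of that shift can be checked without TensorFlow. A minimal sketch, assuming the generator pairs `data[i - length:i]` with `targets[i]` starting at `i = length` (how Keras's `TimeseriesGenerator` indexes with the default `start_index=0` and `sampling_rate=1`); `first_pair` is a hypothetical helper that mimics only that first pair, and the data is hypothetical:

```python
import numpy as np
import pandas as pd

length = 10
# Hypothetical data: each value equals its row index
features = pd.DataFrame({"f": np.arange(30)})
labels = pd.Series(np.arange(30), name="y")

def first_pair(data, targets, length):
    """Mimic the first (sample, target) pair: data[i - length:i] with targets[i], i = length."""
    i = length
    return data[i - length:i], targets[i]

x0, y0 = first_pair(features.values, labels.values, length)
print(x0[-1, 0], y0)   # 9 10   -> the label is one step after the last input row

shifted = labels.shift(1)   # shifted[i] == labels[i - 1]; shifted[0] is NaN and never used
x0, y0 = first_pair(features.values, shifted.values, length)
print(x0[-1, 0], y0)   # 9 9.0  -> the label now has the same index as the last input row
```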


Archived:	6 years, 2 months ago
Views:	160
Last activity:	6 years, 2 months ago