获取 ValueError :“只能使用 MultiIndex 进行元组索引”

Nim*_*lex 6 python deep-learning keras rnn

我正在尝试实现一个简单的 RNN 来预测整数序列中的下一个整数。所以,我有一个如下的数据集:

Id  Sequence
1   1,0,0,2,24,552,21280,103760,70299264,5792853248,587159944704
2   1,1,5,11,35,93,269,747,2115,5933,16717,47003,132291,372157,1047181,2946251,8289731,23323853,65624397,184640891,519507267,1461688413,4112616845,11571284395,32557042499,91602704493,257733967693
4   0,1,101,2,15,102,73,3,40,16,47,103,51,74,116,4,57,41,125,17,12,48,9,104,30,52,141,75,107,117,69,5,148,58,88,42,33,126,152,18,160,13,38,49,55,10,28,105,146,31,158
5   1,4,14,23,42,33,35,34,63,66,87,116,84,101,126,164,128,102,135,143,149,155,203,224,186,204,210,237,261,218,219,286,257,266,361,355,336,302,374,339,371,398,340,409,348,388,494,436,407,406
6   1,1,2,5,4,2,6,13,11,4,10,10,12,6,8,29,16,11,18,20,12,10,22,26,29,12,38,30,28,8,30,61,20,16,24,55,36,18,24,52,40,12,42,50,44,22,46,58,55,29,32,60,52,38,40,78,36,28,58,40,60,30,66,125,48,20,66,80,44,24
9   0,31,59,90,120,151,181,212,243,273,304,334,365,396,424,455,485,516,546,577,608,638,669,699,730,761,789,820,850,881,911,942,973,1003,1034,1064,1095,1126,1155,1186,1216,1247,1277,1308,1339,1369,1400,1430
10  1,1,2,5,13,36,111,347,1134,3832,13126,46281,165283,598401,2202404,8168642,30653724,116082962,442503542,1701654889,6580937039,25603715395,100223117080,394001755683,1556876401398,6178202068457,24608353860698,98421159688268,394901524823138,1589722790850089
12  0,0,0,0,112,40286,5485032,534844548,45066853496,3538771308282,267882021563464,19861835713621616,1453175611052688600,105278656040052332838,7564280930105061931496
Run Code Online (Sandbox Code Playgroud)

到目前为止我的代码是:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.preprocessing.sequence import pad_sequences

def stoarray(data = [], sep = ','):
    return data.map(lambda x: np.array(x.split(sep), dtype=float))

def create_dataset(dataset, window_size=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-window_size-1):
        a = dataset[i:(i+window_size), 0]
        dataX.append(a)
        dataY.append(dataset[i + window_size, 0]) #gives the ValueError : Can only tuple index with multi index
    return np.array(dataX), np.array(dataY)

# fix random seed for reproducibility
np.random.seed(7)

# loading data
colna = ['id', 'seq']
train_data = pd.read_csv('G:/Python/integer_sequencing/testfile.csv', header=1)
train_data.columns = colna
dataset = train_data['seq']
#print(dataset)
window_size = 1
X_train, Y_train = create_dataset(dataset, window_size)

print(X_train.head(5))
print(Y_train.head(5))
Run Code Online (Sandbox Code Playgroud)

我试图用 X_train 作为输入分割每个序列,除了最后一项和 Y_train 被视为输出将只包含最后一位数字之外,它包含完整的集合。但是当我运行代码时,我得到了 ValueError : Can only tuple-index with a MultiIndex。任何人都可以解释它对我的代码意味着什么以及我必须做什么来解决它。

回溯调用: 在此处输入图片说明

PS - 我是堆栈溢出和深度学习的新手,所以如果您能提出建议并帮助我格式化我的问题,我将不胜感激。

fur*_*ras 7

你有问题i:(i+window_size), 0in dataset[i:(i+window_size), 0]

在您的代码中dataset意味着train_data['seq']这是单列 - 一维Series- 但您i:(i+window_size), 0在二维中使用like DataFrame

您只能使用单个整数 likedataset[0]或 slicedataset[i:(i+window_size)]