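I am trying to implement a custom data generator that reads data in chunks from .csv files using pandas.read_csv. I tested it with model.predict_generator, but it returns fewer predictions than expected (in my case, 248192 instead of 253457).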
Custom generator
class TestDataGenerator:
    def __init__(self, directory, batch_size=1024):
        self.directory = directory
        self.batch_size = batch_size
        self.chunk_size = 10000
        self.samples = 0

    def _to_movie_id(self, ids):
        ids = ast.literal_eval(ids)
        if ids == []:
            return [EMB_MATRIX_SIZE-1]
        else:
            return [movie2idx[str(movie_id)] for movie_id in ids]

    def generate(self):
        csv_files = glob.glob(self.directory + '/*.csv')
        while True:
            for file in csv_files:
                df = pd.read_csv(file, chunksize=self.chunk_size)
                for df_chunk in df:
                    chunk_steps = math.ceil(len(df_chunk) / self.batch_size)
                    for i in range(chunk_steps):
                        batch = df_chunk[i * …
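One way to check where the missing rows might go is to count batches by hand. The sketch below is only an illustration under several assumptions that the post does not confirm: all 253457 rows sit in a single .csv file, each batch takes up to batch_size rows from the current chunk, and the steps value passed to model.predict_generator was computed as ceil(253457 / 1024). The helper names count_batches and rows_consumed are hypothetical and are not part of the original code.

```python
import math


def count_batches(total_rows, chunk_size, batch_size):
    """Batches yielded in one pass when every chunk is batched separately."""
    full_chunks, remainder = divmod(total_rows, chunk_size)
    batches = full_chunks * math.ceil(chunk_size / batch_size)
    if remainder:
        batches += math.ceil(remainder / batch_size)
    return batches


def rows_consumed(total_rows, chunk_size, batch_size, steps):
    """Rows covered by the first `steps` batches of one pass over the data."""
    consumed = 0
    taken = 0
    for start in range(0, total_rows, chunk_size):
        chunk_len = min(chunk_size, total_rows - start)
        for offset in range(0, chunk_len, batch_size):
            if taken == steps:
                return consumed
            # The last batch of a chunk is shorter than batch_size.
            consumed += min(batch_size, chunk_len - offset)
            taken += 1
    return consumed


# Values taken from the post; the single-file layout is an assumption.
TOTAL_ROWS = 253457
CHUNK_SIZE = 10000
BATCH_SIZE = 1024

# Assumed step count: total rows divided by batch size, rounded up.
naive_steps = math.ceil(TOTAL_ROWS / BATCH_SIZE)
# Batches the chunked generator would actually need for one full pass.
actual_batches = count_batches(TOTAL_ROWS, CHUNK_SIZE, BATCH_SIZE)
# Rows seen if prediction stops after naive_steps batches.
seen = rows_consumed(TOTAL_ROWS, CHUNK_SIZE, BATCH_SIZE, naive_steps)

print(naive_steps, actual_batches, seen)
```

Under these assumptions the generator needs 254 batches for a full pass, because every 10000-row chunk ends with a short batch of 784 rows, while the assumed step count is 248. Stopping after 248 batches covers 248192 rows, which matches the number of predictions reported above. If that is what is happening, either computing steps per chunk or choosing a chunk size that is a multiple of the batch size would be worth trying.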