Tags: python, machine-learning, h5py, keras, tensorflow
My model uses preprocessed data to predict whether a customer is a private or a non-private customer. The preprocessing uses steps such as `feature_column.bucketized_column(…)` and `feature_column.embedding_column(…)`. After training, I try to save the model, but I get the following error:
```
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)
```
I have already tried several approaches to solve this, all without success!
Here is the relevant code of the model:

```python
(feature_columns, train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.zip, args.batchSize, bucketSizeGEO)

feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)

model = tf.keras.models.Sequential([
    feature_layer,
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])

paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)

...

model.fit(train_ds,
          validation_data=val_ds,
          epochs=args.epoch,
          callbacks=[tensorboard_callback])

model.summary()

loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

paramString = paramString + "-a{:.4f}".format(accuracy)

outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString

if args.saveModel:
    filepath = "./saved_models/" + outputName + ".h5"
    model.save(filepath, save_format='h5')
```

The function called in the preprocessing module:
```python
def getPreProcessedDatasets(filepath, zippath, batch_size, bucketSizeGEO):
    print("start preprocessing...")

    path = filepath
    data = pd.read_csv(path, dtype={
        "NAME1": np.str_,
        "NAME2": np.str_,
        "EMAIL1": np.str_,
        "ZIP": np.str_,
        "STREET": np.str_,
        "LONGITUDE": np.floating,
        "LATITUDE": np.floating,
        "RECEIVERTYPE": np.int64})

    feature_columns = []

    data = data.fillna("NaN")

    data = __preProcessName(data)
    data = __preProcessStreet(data)

    train, test = train_test_split(data, test_size=0.2, random_state=0)
    train, val = train_test_split(train, test_size=0.2, random_state=0)

    train_ds = __df_to_dataset(train, batch_size=batch_size)
    val_ds = __df_to_dataset(val, shuffle=False, batch_size=batch_size)
    test_ds = __df_to_dataset(test, shuffle=False, batch_size=batch_size)

    __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, True)

    print("preprocessing completed")

    return (feature_columns, train_ds, val_ds, test_ds)
```

The different preprocessing functions called for the features:
```python
def __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, addCrossedFeatures):

    feature_columns.append(__getFutureColumnLon(bucketSizeGEO))
    feature_columns.append(__getFutureColumnLat(bucketSizeGEO))

    (namew1_one_hot, namew2_one_hot) = __getFutureColumnsName(__getNumberOfWords(data, 'NAME1PRO'))
    feature_columns.append(namew1_one_hot)
    feature_columns.append(namew2_one_hot)

    feature_columns.append(__getFutureColumnStreet(__getNumberOfWords(data, 'STREETPRO')))

    feature_columns.append(__getFutureColumnZIP(2223, zippath))

    if addCrossedFeatures:
        feature_columns.append(__getFutureColumnCrossedNames(100))
        feature_columns.append(__getFutureColumnCrossedZIPStreet(100, 2223, zippath))
```

The functions related to the embeddings:
```python
def __getFutureColumnsName(name_num_words):
    vocabulary_list = np.arange(0, name_num_words + 1, 1).tolist()

    namew1_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W1', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    namew2_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W2', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(name_num_words)

    namew1_embedding = feature_column.embedding_column(namew1_voc, dimension=dim)
    namew2_embedding = feature_column.embedding_column(namew2_voc, dimension=dim)

    return (namew1_embedding, namew2_embedding)
```

```python
def __getFutureColumnStreet(street_num_words):
    vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()

    street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='STREETW', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(street_num_words)

    street_embedding = feature_column.embedding_column(street_voc, dimension=dim)

    return street_embedding
```

```python
def __getFutureColumnZIP(zip_num_words, zippath):
    zip_voc = feature_column.categorical_column_with_vocabulary_file(
        key='ZIP', vocabulary_file=zippath, vocabulary_size=zip_num_words,
        default_value=0)

    dim = __getNumberOfDimensions(zip_num_words)

    zip_embedding = feature_column.embedding_column(zip_voc, dimension=dim)

    return zip_embedding
```
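The helper `__getNumberOfDimensions` is not shown in the question. A common rule of thumb for choosing an embedding width is roughly the fourth root of the number of categories; the following is a hypothetical stand-in under that assumption, not the author's actual code:

```python
def get_number_of_dimensions(num_categories):
    """Heuristic embedding width: about num_categories ** 0.25, at least 1.

    Hypothetical stand-in for the __getNumberOfDimensions helper,
    which is not shown in the question.
    """
    return max(1, round(num_categories ** 0.25))

# For the ZIP vocabulary size used in the question:
print(get_number_of_dimensions(2223))
```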
The `OSError: Unable to create link (name already exists)` raised when saving the model in h5 format is caused by duplicate variable names. Inspecting them with `for i, w in enumerate(model.weights): print(i, w.name)` showed that the duplicates are the embedding_weights names.
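Rather than scanning the printed list by eye, the colliding names can be collected programmatically. A minimal pure-Python sketch; the weight names below are hypothetical examples, and with a real model you would pass `[w.name for w in model.weights]`:

```python
from collections import Counter

def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]

# Hypothetical weight names of the kind produced by embedding columns.
weight_names = [
    "dense_features/NAME1W1_embedding/embedding_weights:0",
    "dense_features/embedding_weights:0",
    "dense_features/embedding_weights:0",
    "dense/kernel:0",
]
print(find_duplicate_names(weight_names))
```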
Normally, when a feature_column is built, the distinct `key` passed into each feature column is used to build a distinct variable `name`. This worked correctly in TF 2.1, but is broken in TF 2.2 and 2.3, and is reportedly fixed in TF 2.4.
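The failure mode can be reproduced with h5py alone: writing two objects under the same link name into an HDF5 group triggers the same error that the duplicate weight names cause during `model.save(..., save_format='h5')`. A minimal sketch (the file and dataset names are arbitrary):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "duplicate_demo.h5")

with h5py.File(path, "w") as f:
    # The first write succeeds and creates the link "embedding_weights".
    f["embedding_weights"] = np.zeros(3)
    try:
        # A second write under the same name fails, mirroring the
        # "Unable to create link (name already exists)" error.
        f["embedding_weights"] = np.ones(3)
    except (OSError, ValueError, RuntimeError) as err:
        print("save failed:", err)
```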