Memory leak in Tensorflow.js: How to clean up unused tensors?

Tho*_*orf 11 javascript memory-leaks machine-learning node.js tensorflow.js

I'm writing a script that sometimes leaks tensors. This can happen in several cases, for example when I'm training a neural network and the training crashes. In that case, the training is interrupted and the tensors are not disposed of correctly. This results in a memory leak, which I'm trying to clean up by disposing of unused tensors.

Example

In the snippet below, I'm training two (very simple) models. The first run will work and will result in no leaked tensors (number of tensors before training = number of tensors after training). The second time, I'm using an invalid reshape layer to force a crash during training. Therefore, an error is thrown and the tensors from the dataset (I guess?) will not be correctly disposed. The code is an example to show how tensors might be leaked.

async function train(shouldCrash) {
  console.log(`Training, shouldCrash=${shouldCrash}`);
  const dataset = tf.data.zip({ // setup data
    xs: tf.data.array([[1],[1]]),
    ys: tf.data.array([1]),
  }).batch(1);

  const model = tf.sequential({ // setup model
    layers: [
      tf.layers.dense({units: 1, inputShape: [1]}),
      tf.layers.reshape({targetShape: [(shouldCrash ? 2 : 1)]}), // use invalid shape when crashing
    ],
  });
  model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' });
  console.log('  Tensors before:', tf.memory().numTensors);
  try {
    const history = await model.fitDataset(dataset, { epochs: 1 });
  } catch (err) {
    console.log(`    Error: ${err.message}`);
  }
  console.log('  Tensors after:', tf.memory().numTensors);
}

(async () => {
  await train(false); // normal training
  await train(true); // training with error
})();
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.1.2/dist/tf.min.js"></script>
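The before/after comparison used in the snippet can be factored into a small helper. This is only a sketch: `countLeakedTensors` is a hypothetical name, not a TensorFlow.js function, and it relies on nothing beyond `tf.memory().numTensors`, which the snippet already uses. The `tf` namespace is passed in as an argument so the helper does not depend on how the library is loaded.

```javascript
// Hypothetical helper (not part of TensorFlow.js): reports how many
// tensors an async function left behind, and the error it threw, if any.
async function countLeakedTensors(tf, asyncFn) {
  const before = tf.memory().numTensors;
  let error = null;
  try {
    await asyncFn();
  } catch (err) {
    error = err; // keep the error so the caller can inspect it
  }
  // Difference in live tensor count across the call.
  return { leaked: tf.memory().numTensors - before, error };
}

// Possible usage with the model and dataset from the snippet:
// const { leaked, error } = await countLeakedTensors(tf, () =>
//   model.fitDataset(dataset, { epochs: 1 }));
```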

Question

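There is tf.tidy, which helps me dispose of unused tensors in some cases, but it can only be used for synchronous function calls. Therefore, it cannot be used when calling await model.fitDataset(...).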

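Is there a way to dispose of any unused tensors? Alternatively, is there a way to dispose of all existing tensors on the page (without reloading it)?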

小智 20

The way to clean up any unused tensors in async code is to wrap the code that creates them between a startScope() and an endScope() call.

tf.engine().startScope()
// do your thing
tf.engine().endScope()
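As a minimal sketch of how this could be applied to the training call from the question: the helper below, `withTensorScope`, is a hypothetical name and not part of TensorFlow.js. It takes the `tf` namespace and an async function, opens a scope, and closes it in a `finally` block so the scope also ends when the awaited code throws. It assumes that no other code creates tensors while the scope is open, since the engine's scope stack is shared.

```javascript
// Hypothetical helper (not part of TensorFlow.js): runs an async function
// inside an engine scope and always closes the scope afterwards.
async function withTensorScope(tf, asyncFn) {
  tf.engine().startScope();
  try {
    return await asyncFn();
  } finally {
    // Runs on success and on error; tensors tracked in this scope
    // are disposed when the scope ends.
    tf.engine().endScope();
  }
}

// Possible usage with the model and dataset from the question:
// await withTensorScope(tf, () => model.fitDataset(dataset, { epochs: 1 }));
```

Because `endScope()` is called without a result here, a tensor returned from the wrapped function would be disposed as well, so this sketch is best suited to functions that return plain values. As with tf.tidy, variables such as model weights are not expected to be cleaned up by the scope.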