Towards a catalog of energy patterns for deep learning development

Memory leaks

Take care of memory leaks and OOM errors before starting the training process

Description
memory leak image

Context: Memory-management exceptions constitute a major portion of fault triggers in deep learning. Memory leaks and OOM (out-of-memory) errors may occur during training, while loading data, while writing or saving data, and during inference.

Problem: When a program terminates due to an OOM error while training the network and no checkpoint exists, the knowledge gained during training is lost. The computational energy spent on that training is therefore wasted.
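To make the role of the checkpoint concrete, here is a minimal sketch of a training loop that saves its state after every epoch, so a crash loses at most one epoch of work. It uses only the Python standard library; the names `save_checkpoint`, `load_checkpoint`, `train`, and the `train_one_epoch` callback are hypothetical and stand in for whatever a real framework provides.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the target.

    The rename means a crash during the write cannot corrupt an
    existing checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0, "weights": [0.0]}  # illustrative initial state


def train(path, total_epochs, train_one_epoch):
    """Resume from the last completed epoch and checkpoint after each one."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        # train_one_epoch is a hypothetical callback: weights in, weights out.
        state["weights"] = train_one_epoch(state["weights"])
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state
```

If `train_one_epoch` raises `MemoryError` partway through, calling `train` again with the same path continues from the last saved epoch instead of starting over.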

Solution: Factor in the memory availability constraints and possible OOM exceptions while designing the program to train the network and take appropriate steps to avoid them. This may reduce the chances of OOM errors and prevent the corresponding waste of energy.
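One way to factor in memory constraints before training starts is a rough pre-flight estimate. The sketch below is an assumption-laden illustration, not a measurement tool: the cost model (weights, gradients, two optimizer copies, activations, 4 bytes per value) and the function names `estimate_training_bytes` and `largest_batch_that_fits` are hypothetical, and real frameworks add overhead that this ignores.

```python
def estimate_training_bytes(num_params, activations_per_sample, batch_size,
                            bytes_per_value=4, optimizer_copies=2):
    """Rough memory estimate: weights + gradients + optimizer state + activations.

    Every factor here is an illustrative assumption.
    """
    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value
    optimizer_state = optimizer_copies * num_params * bytes_per_value
    activations = activations_per_sample * batch_size * bytes_per_value
    return weights + gradients + optimizer_state + activations


def largest_batch_that_fits(num_params, activations_per_sample,
                            budget_bytes, start_batch=256):
    """Halve the batch size until the estimate fits within the budget."""
    batch = start_batch
    while batch >= 1:
        needed = estimate_training_bytes(num_params, activations_per_sample, batch)
        if needed <= budget_bytes:
            return batch
        batch //= 2
    # Fail before training starts, rather than after energy has been spent.
    raise MemoryError("model does not fit in the budget even with batch size 1")
```

Running such a check up front turns a late OOM crash into an early, cheap failure or a smaller batch size.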

Example: An example of this pattern can be seen in this Stack Overflow post. The user runs into an out-of-memory error while training a Keras Sequential model because the program manages memory inefficiently inside a for loop, wasting the energy already spent on training. This could be avoided by defining the network outside the for loop and reusing the same instance in every iteration, as suggested in the answer.

Related Stack Overflow Posts
Acknowledgements
Image Source: itnewstoday