Towards a catalog of energy patterns for deep learning development

Pre-trained Networks

Apply transfer learning with pre-trained networks whenever feasible

Description
[Figure: illustration of transfer learning]

Context: Training deep learning models typically requires labelled data points. At times, the training data can be huge, and networks with a large number of parameters may be required to achieve the desired performance.

Problem: The training process typically involves multiple rounds of training, trying out different permutations of hyperparameters to arrive at the best model. For large networks with a huge amount of training data, this process can take a very long time due to the number of computations involved, which makes training expensive in terms of energy consumed.

Solution: The energy required to train the network can be cut down using transfer learning if pre-trained models exist for the given task. Transfer learning is the approach where a machine learning model trained on one task is reused on a different task. It can be used when the data available for training is limited or collecting it is too expensive. Sometimes, transfer learning involves fine-tuning the pre-trained model with a smaller dataset. Because training is either avoided or reduced, the corresponding computational energy is saved.
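As an illustration, the following minimal sketch (assuming PyTorch and torchvision 0.13 or later, which the pattern itself does not prescribe) fine-tunes a pre-trained ResNet-18 by freezing its backbone and training only a new classification head, so that far fewer parameters are updated than when training from scratch:

import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet (the weights are downloaded once).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained backbone so its weights are not updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the target task
# (here, a hypothetical 10-class problem).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters are optimised, which reduces the number of
# gradient computations, and hence the energy spent, per training epoch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)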

Example: Consider a scenario where the user needs a model that produces vector representations of natural language text. Training such a network from scratch would require many iterations over a large corpus of text to arrive at adequate representations. Instead, the user can use a pre-trained model like Sentence-BERT and save the energy required to train a model from scratch.
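A minimal sketch of this example, assuming the sentence-transformers package and the publicly available all-MiniLM-L6-v2 checkpoint (the pattern itself does not name a specific library or checkpoint), could look as follows:

from sentence_transformers import SentenceTransformer

# Load a pre-trained Sentence-BERT model instead of training one from scratch.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Compute vector representations for a few example sentences.
sentences = [
    "Transfer learning saves energy.",
    "Pre-trained models reduce training time.",
]
embeddings = model.encode(sentences)

print(embeddings.shape)  # (2, 384): one 384-dimensional vector per sentence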

Related Stack Overflow Posts
Acknowledgements
Image Source: Pennylane.ai