Look for built-in library functions for tensor operations before writing custom implementations
Context: A tensor is a generalization of vectors and matrices to higher dimensions. Tensors are a fundamental data structure in deep learning, used to represent inputs, outputs, and the transformations between them, so tensor operations are performed very frequently in deep learning programs.
Problem: Given the large number of operations that can be performed on tensors, users may be unaware of the library functions that already implement a given operation and resort to custom implementations. These custom implementations are often less efficient and less optimized.
Solution: Look for an existing library function, or a combination of library functions, that performs the given tensor operation. Library functions are typically designed to perform operations efficiently, with optimized use of resources, which can also make them more energy efficient.
Example: Consider a case based on this post where the user wants to multiply every element in a batch of tensors with every other element, except for itself. The user could write a custom implementation, as shown in the question of the post, but it may not be optimized in terms of memory usage and the number of computations. A more energy-efficient approach is to use a combination of built-in PyTorch functions, as shown in the answer.
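The exact code from the post is not reproduced here; as an illustration of the pattern only, the sketch below contrasts a hypothetical loop-based implementation with an equivalent one assembled from built-in PyTorch operations (broadcasting and boolean indexing). The function names and the assumed input shape `(batch, n)` are inventions for this example, not taken from the post.

```python
import torch

def pairwise_products_loop(x):
    """Hypothetical custom implementation: for each tensor in the batch,
    multiply every element with every other element except itself,
    using explicit Python loops (slow, many small operations)."""
    b, n = x.shape
    out = torch.empty(b, n, n - 1)
    for i in range(b):
        for j in range(n):
            k = 0
            for m in range(n):
                if m != j:
                    out[i, j, k] = x[i, j] * x[i, m]
                    k += 1
    return out

def pairwise_products_builtin(x):
    """Same result from built-in PyTorch ops: a batched outer product
    via broadcasting, then dropping the diagonal (the self-products)."""
    b, n = x.shape
    prod = x.unsqueeze(2) * x.unsqueeze(1)   # (b, n, n) outer products
    mask = ~torch.eye(n, dtype=torch.bool)   # False on the diagonal
    return prod[:, mask].reshape(b, n, n - 1)
```

The vectorized version performs the whole computation in a few fused, optimized kernel calls instead of `b * n * n` scalar operations, which is the kind of saving the pattern describes.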