MIT Researchers Develop System to Optimize Machine Learning Algorithms
Boosting Efficiency in AI Models
The neural-network artificial intelligence models used in applications like medical image processing and speech recognition perform operations on hugely complex data structures that require an enormous amount of computation to process. This is one reason deep-learning models consume so much energy.
To improve the efficiency of AI models, MIT researchers created an automated system that enables developers of deep learning algorithms to simultaneously take advantage of two types of data redundancy. This reduces the amount of computation, bandwidth, and memory storage needed for machine learning operations.
Optimizing Algorithms
Existing techniques for optimizing algorithms can be cumbersome and typically only allow developers to capitalize on either sparsity or symmetry — two different types of redundancy that exist in deep learning data structures.
By enabling a developer to build an algorithm from scratch that takes advantage of both redundancies at once, the MIT researchers’ approach boosted the speed of computations by nearly 30 times in some experiments.
User-Friendly Programming Language
The system uses a user-friendly programming language, making it possible to optimize machine-learning algorithms for a wide range of applications. It could also help scientists who are not experts in deep learning but want to improve the efficiency of the AI algorithms they use to process data, and it could find uses in scientific computing as well.
Cutting Out Computation
In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, which is a rectangular array of values arranged on two axes, rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, making tensors more difficult to manipulate.
Deep-learning models perform operations on tensors using repeated matrix multiplication and addition — this process is how neural networks learn complex patterns in data. The sheer volume of calculations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
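As a rough illustration of that workload (a NumPy sketch with made-up shapes, not the researchers' tooling), a tensor is just an array with more than two axes, and one layer of a network boils down to a matrix multiplication and an addition applied across it:

```python
import numpy as np

# Hypothetical 3-D tensor: 64 samples, each a 128 x 256 grid of feature values.
activations = np.random.rand(64, 128, 256)

# A weight matrix and bias applied to every sample, as in a dense network layer.
weights = np.random.rand(256, 512)
bias = np.random.rand(512)

# One "layer" of work: a matrix multiplication per sample plus an addition.
outputs = activations @ weights + bias   # shape (64, 128, 512)
print(outputs.shape)
```

Even for these made-up shapes, that single step performs roughly a billion multiply-add operations, and real networks repeat such steps across many layers.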
Sparsity and Symmetry
But because of the way data in tensors are arranged, engineers can often boost the speed of a neural network by cutting out redundant computations. For instance, if a tensor represents user review data from an e-commerce site, since not every user reviewed every product, most values in that tensor are likely zero. This type of data redundancy is called sparsity. A model can save time and computation by only storing and operating on non-zero values.
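To make the saving concrete, here is a minimal sketch (illustrative Python with hypothetical data, not SySTeC code) that stores only the non-zero entries in coordinate form and multiplies by a vector without ever touching the zeros:

```python
import numpy as np

# A mostly-zero "user x product" ratings matrix (hypothetical data).
ratings = np.array([
    [0, 0, 5, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 4],
])

# Coordinate storage: keep only the non-zero values and their positions.
rows, cols = np.nonzero(ratings)
values = ratings[rows, cols]

# Sparse matrix-vector product: loop over the stored entries only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.zeros(ratings.shape[0])
for r, c, v in zip(rows, cols, values):
    y[r] += v * x[c]

assert np.allclose(y, ratings @ x)  # matches the dense computation
```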
In addition, sometimes a tensor is symmetric, meaning the values in its upper half mirror those in its lower half. In this case, the model only needs to operate on one half, reducing the amount of computation. This type of data redundancy is called symmetry.
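A similar sketch, again only illustrative, shows how one triangle of a symmetric matrix is enough to reproduce a full matrix-vector product:

```python
import numpy as np

# A symmetric matrix: entries mirror across the diagonal (A == A.T).
A = np.array([
    [2.0, 1.0, 4.0],
    [1.0, 3.0, 5.0],
    [4.0, 5.0, 6.0],
])

U = np.triu(A)           # keep only the upper triangle (half the data)
D = np.diag(np.diag(A))  # the diagonal, which both triangles share

x = np.array([1.0, 2.0, 3.0])

# A @ x reconstructed from the upper triangle alone, since A = U + U.T - D.
y = U @ x + U.T @ x - D @ x

assert np.allclose(y, A @ x)
```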
SySTeC: The Compiler
To simplify the process, the researchers built a new compiler, called SySTeC, which is a computer program that translates complex code into a simpler language that can be processed by a machine. Their compiler can optimize computations by automatically taking advantage of both sparsity and symmetry in tensors.
How SySTeC Works
The researchers began the process of building SySTeC by identifying three key optimizations they can perform using symmetry. First, if the algorithm’s output tensor is symmetric, then it only needs to compute one half of it. Second, if the input tensor is symmetric, then the algorithm only needs to read one half of it. Finally, if intermediate results of tensor operations are symmetric, the algorithm can skip redundant computations.
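The first of these optimizations can be pictured with a hand-written sketch (my own illustration, not code SySTeC emits): the product A @ A.T is known to be symmetric, so only the entries on or above the diagonal are computed and the rest are mirrored.

```python
import numpy as np

A = np.random.rand(4, 3)
n = A.shape[0]

# The output A @ A.T is symmetric, so compute only entries with i <= j.
C = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        C[i, j] = A[i] @ A[j]   # dot product of rows i and j
        C[j, i] = C[i, j]       # mirror instead of recomputing

assert np.allclose(C, A @ A.T)
```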
To use SySTeC, a developer inputs their program and the system automatically optimizes their code for all three types of symmetry. Then the second phase of SySTeC performs additional transformations to only store non-zero data values, optimizing the program for sparsity.
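Combining the two phases conceptually, the kind of saving the compiler aims for looks roughly like this (again an illustrative sketch rather than SySTeC output): a sparse, symmetric matrix-vector product that stores and loops over only the non-zero entries of one triangle:

```python
import numpy as np

# A sparse, symmetric matrix (hypothetical example).
A = np.array([
    [2.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [4.0, 0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])

# Store only the non-zero entries on or above the diagonal.
rows, cols = np.nonzero(np.triu(A))
values = A[rows, cols]

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.zeros(A.shape[0])
for r, c, v in zip(rows, cols, values):
    y[r] += v * x[c]
    if r != c:            # account for the mirrored entry below the diagonal
        y[c] += v * x[r]

assert np.allclose(y, A @ x)
```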
Conclusion
The MIT researchers’ system, SySTeC, has the potential to significantly reduce the energy consumption and computational resources required by deep-learning models. By letting developers take advantage of sparsity and symmetry in tensors at the same time, it sped up computations by nearly 30 times in some experiments, and it could change how machine-learning algorithms are written and optimized.
FAQs
Q: What is the main goal of SySTeC?
A: The main goal of SySTeC is to optimize machine-learning algorithms for deep learning models by automatically taking advantage of sparsity and symmetry in tensors.
Q: How does SySTeC work?
A: SySTeC is a compiler that translates complex code into a simpler language that can be processed by a machine. It identifies three key optimizations using symmetry and performs additional transformations to optimize the program for sparsity.
Q: What are the potential applications of SySTeC?
A: SySTeC has the potential to be used in a wide range of applications, including medical image processing, speech recognition, and scientific computing. It could also be used to optimize code for more complicated programs.