The NVidia Volta-100 GPU released in Dec 2017 was the first microprocessor with dedicated cores purely for matrix computations, called Tensor Cores. The Ampere-100 GPU released May’20 is its successor. A comparison is provided here. Tensor Cores reduce the cycle time for matrix multiplications, operating on 4×4 matrices of 16bit floating point numbers. These GPU are aimed at Deep Learning use cases which consist of a pipeline of matrix operations.
In math, we have Scalars and Vectors. Scalar are used for magnitude and Vector encode magnitude and direction. To transform Vectors, one applies Linear Transformations in the form of Matrices. Matrices for Linear Transformations have EigenVectors and EigenValues which describe the invariants of the transformation. A Tensor in math and physics is a concept that exhibits certain types invariance during transformations. In 3 dimensions, a Stress Tensor has 9 components, which can be representated as a 3×3 matrix.
In Deep Learning applications a Tensor is basically a Matrix. The Generalized Matrix Multiplication (GEMM) operation, D=AxB+C, is at the heart of Deep Learning, and the Tensor Cores are designed to speed these up.
Tesla Dojo is an advertised attempt to build a processor/computer dedicated for Deep Learning to process vast amounts of video data.
AWS Inferentia is a chip for deep learning with four Neuron Cores .
Generally speaking the desire in deep learning community is to have simpler processing units in larger numbers.