A key problem for embedded neural networks is reducing their size and power consumption.
The hardware that runs the neural net can be a dedicated chip, an FPGA, a GPU or a CPU. Each of these consumes roughly 10x the power of the one before it. In upfront cost the order is reversed: the dedicated chip is the most expensive and the CPU the cheapest. An NVIDIA whitepaper compares the GPU with the CPU on speed and power consumption, using key neural networks such as AlexNet. (AlexNet was a breakthrough in 2012, showing that a neural network could beat other image recognition approaches by a wide margin.)
Reducing the size of a neural network also reduces its power consumption. To shrink networks, pruning the weak connections was proposed in “Learning both Weights and Connections for Efficient Neural Networks” by Song Han and colleagues at Stanford and NVIDIA. This achieved a roughly 10x reduction in network size without loss of accuracy. Follow-up work, “Deep Compression”, which combines pruning with trained quantization and Huffman coding, achieved a 35x reduction.
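The pruning idea, and the weight sharing that Deep Compression layers on top of it, can be illustrated with a minimal pure-Python sketch. It assumes a single layer stored as a flat list of weights and a simple "remove the weakest fraction" rule, which is not necessarily the papers' exact threshold. The function names `magnitude_prune` and `share_weights` and the `sparsity` parameter are hypothetical. The published methods retrain the network after pruning and fine-tune the shared values, and Deep Compression additionally Huffman-codes the result; all of that is omitted here.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|.

    Returns (pruned_weights, mask), where mask[i] is 1 if connection i survives.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of connections to remove
    # Rank connections from weakest to strongest by absolute weight
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    removed = set(order[:k])
    mask = [0 if i in removed else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


def share_weights(weights, mask, n_clusters, iters=20):
    """Replace each surviving weight by the nearest of n_clusters shared values.

    Plain 1-D k-means with linearly spaced initial centroids; the shared
    values are NOT fine-tuned by retraining in this sketch.
    """
    if n_clusters < 2:
        raise ValueError("n_clusters must be at least 2")
    alive = [w for w, m in zip(weights, mask) if m]
    lo, hi = min(alive), max(alive)
    centroids = [lo + (hi - lo) * j / (n_clusters - 1) for j in range(n_clusters)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_clusters)]
        for w in alive:
            j = min(range(n_clusters), key=lambda c: abs(w - centroids[c]))
            buckets[j].append(w)
        # Move each centroid to the mean of its bucket (keep it if the bucket is empty)
        centroids = [sum(b) / len(b) if b else centroids[j] for j, b in enumerate(buckets)]
    # Pruned connections stay at zero; survivors snap to their nearest shared value
    quantized = [
        min(centroids, key=lambda c: abs(w - c)) if m else 0.0
        for w, m in zip(weights, mask)
    ]
    return quantized, centroids


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.02, 0.6]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
quantized, codebook = share_weights(pruned, mask, n_clusters=2)
print(mask)       # [1, 0, 1, 0, 1, 0, 0, 1]
print(quantized)  # surviving weights now take only 2 distinct values
```

Half the connections disappear, and the survivors can be stored as small indices into a 2-entry codebook instead of full floats, which is where the size savings come from.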
Today I attended a talk on SqueezeNet by Forrest Iandola. His team at Berkeley designed a smaller ("squeezed") architecture with AlexNet-level accuracy, then applied the Deep Compression technique above to achieve a 461x size reduction relative to AlexNet, down to about 0.5 MB. This makes it feasible for mobile applications. The SqueezeNet paper also references V. Badrinarayanan’s work on SegNet, a different NN architecture (for semantic segmentation) that was discussed in a talk earlier this year.
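Much of SqueezeNet's "squeeze" comes from its Fire module: a layer of 1x1 "squeeze" filters cuts the number of channels before an "expand" layer that mixes 1x1 and 3x3 filters. The sketch below counts convolution weights (biases ignored) for a plain 3x3 layer versus a Fire module with the same 128 input and 128 output channels. The channel counts 16/64/64 are an illustrative assumption modeled on one of SqueezeNet's early Fire modules, and the helper names `conv_weights` and `fire_weights` are hypothetical.

```python
def conv_weights(in_ch, out_ch, k):
    """Number of weights in a k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k


def fire_weights(in_ch, squeeze, expand1x1, expand3x3):
    """Weights in a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    return (conv_weights(in_ch, squeeze, 1)         # squeeze: in_ch -> squeeze
            + conv_weights(squeeze, expand1x1, 1)   # expand 1x1 branch
            + conv_weights(squeeze, expand3x3, 3))  # expand 3x3 branch


plain = conv_weights(128, 128, 3)     # 128 * 128 * 9 = 147456
fire = fire_weights(128, 16, 64, 64)  # 2048 + 1024 + 9216 = 12288
print(plain, fire, plain / fire)      # 147456 12288 12.0
```

The expand layer still outputs 64 + 64 = 128 channels, so in this count the Fire module stands in for the plain layer with about 12x fewer weights; Deep Compression's pruning and quantization then shrink what remains.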
Intel’s acquisition of Nervana earlier this year was for a low-power, GPU-like SoC with very high memory bandwidth.