Currently, artificial intelligence (AI) computing mainly refers to the neural network algorithms represented by deep learning. Traditional CPUs and GPUs can perform AI computations, but they were not designed or optimized for the characteristics of deep learning, so their speed and performance do not fully match the needs of AI workloads. Generally speaking, AI chips refer to ASICs (application-specific integrated circuits) designed specifically around the characteristics of AI algorithms.
Current deep learning algorithms are widely used in fields such as image recognition, speech recognition, and natural language processing. Common deep learning networks include CNNs, RNNs, and Transformers, which are essentially large compositions of matrix and vector multiplications and additions. For example, the mainstream image object detection algorithm YOLO-V3 consists mainly of convolution, residual, fully connected, and similar layers, and its computation boils down to a very large number of multiply-add operations. AI chips built to run neural network algorithms therefore need efficient linear algebra capability; the workload is characterized by simple individual tasks, massive parallel computation, heavy data reads and writes, and modest control logic. This places high demands on a chip's parallel computing capability, on-chip storage, memory bandwidth, and latency.
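To make the "multiply-add" point concrete, here is a minimal sketch in Python/NumPy of a single-channel 2D convolution written as explicit loops. The array sizes and function name are hypothetical, chosen only to resemble one tiny slice of a YOLO-V3-style layer; it is an illustration of the arithmetic structure, not how any real accelerator or framework implements convolution.

```python
import numpy as np

def conv2d_naive(x, w):
    """Naive 2D convolution (no padding, stride 1): nothing but multiply-adds."""
    h, wd = x.shape
    kh, kw = w.shape
    out = np.zeros((h - kh + 1, wd - kw + 1), dtype=np.float32)
    macs = 0
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output element is a sum of element-wise products,
            # i.e. kh * kw multiply-accumulate (MAC) operations.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
            macs += kh * kw
    return out, macs

# Hypothetical sizes: a 416x416 single-channel feature map and a 3x3 kernel.
x = np.random.rand(416, 416).astype(np.float32)
w = np.random.rand(3, 3).astype(np.float32)
out, macs = conv2d_naive(x, w)
print(out.shape, macs)  # (414, 414) output, roughly 1.5 million MACs
```

Even this single-channel slice requires on the order of a million independent multiply-adds, which is why hardware with wide parallel multiply-accumulate units and high memory bandwidth matters so much for these networks.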
Currently, the GPU is one of the more mature chips used for deep learning training and inference. Companies such as Google, Microsoft, and Baidu all use GPUs for training and inference of deep learning models. With their large number of cores, GPUs provide efficient parallel computing over large volumes of data. NVIDIA has also developed the dedicated acceleration library cuDNN and the inference tool TensorRT to improve the efficiency of deep learning on GPUs. Although GPUs are very widely used in deep learning, they were designed for graphics computation rather than for deep learning, so they still have limitations in performance and power consumption. First, GPUs are oriented toward low-dimensional data structures and are relatively inefficient at processing the high-dimensional data of deep learning. Second, graphics computation requires high numerical precision, whereas deep learning inference can run effectively at lower precision. Third, GPU data resides in external memory and cores communicate through shared memory, which creates bandwidth and latency bottlenecks. ASICs can be designed and optimized in hardware in a more targeted way, so once a deep learning algorithm has stabilized, a fully customized AI chip is often used to further optimize performance, power consumption, and area.
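The precision point can be illustrated with a small sketch, again in Python/NumPy. The layer size below is hypothetical, and this is only an approximation of the idea; real low-precision inference relies on dedicated FP16/INT8 kernels such as those in TensorRT rather than NumPy casts.

```python
import numpy as np

# Hypothetical layer: 1024 inputs, 1000 outputs (e.g. a classifier head).
rng = np.random.default_rng(0)
w32 = rng.standard_normal((1000, 1024)).astype(np.float32) * 0.02
x32 = rng.standard_normal(1024).astype(np.float32)

y32 = w32 @ x32                                   # full-precision result
y16 = (w32.astype(np.float16) @ x32.astype(np.float16)).astype(np.float32)

rel_err = np.abs(y32 - y16).max() / np.abs(y32).max()
same_top1 = np.argmax(y32) == np.argmax(y16)      # does the predicted class change?
print(f"max relative error: {rel_err:.4f}, top-1 unchanged: {same_top1}")
```

The half-precision result differs only slightly from the full-precision one and typically leaves the predicted class unchanged, which is why inference hardware can trade numerical precision for throughput and power in a way graphics workloads cannot.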