AI High-Performance Computing - AI-Specific Chips
2023-08-22

Currently, artificial intelligence (AI) computing mainly refers to neural-network algorithms, represented by deep learning. Traditional CPUs and GPUs can be used to run AI algorithms, but they were not designed or optimized for the characteristics of deep learning, so they cannot fully match AI workloads in speed and efficiency. Generally speaking, AI chips are ASICs (application-specific integrated circuits) designed specifically around the characteristics of AI algorithms.


Deep learning algorithms are now widely applied in fields such as image recognition, speech recognition, and natural language processing. Common deep learning networks include CNNs, RNNs, and Transformers, all of which are essentially large compositions of matrix and vector multiplications and additions. For example, the mainstream image object-detection algorithm YOLOv3 consists mainly of convolution, residual, and fully connected computations, which in essence reduce to a large number of multiply-add operations (see the sketch below). AI chips built to run neural-network algorithms therefore need hardware with efficient linear-algebra capability; the workload is characterized by simple individual tasks, massive parallelism, heavy data reads and writes, and little control logic. This places high demands on a chip's parallel computing power, on-chip storage, bandwidth, and latency.
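To make the multiply-add claim concrete, here is a minimal NumPy sketch (illustrative code, not taken from any particular chip or framework) of a 2D convolution written as explicit multiply-accumulate loops. A real convolutional layer is just this pattern repeated across input channels and output filters, which is why hardware that parallelizes multiply-accumulate units maps so directly onto it:

```python
import numpy as np

def conv2d_naive(image, kernel):
    """2D convolution (valid padding, stride 1) as explicit
    multiply-accumulate loops: every output element is nothing
    more than a sum of elementwise products."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = 0.0
            for u in range(kH):        # one multiply and one add
                for v in range(kW):    # per kernel element (a MAC)
                    acc += image[i + u, j + v] * kernel[u, v]
            out[i, j] = acc
    return out

# Example: a 5x5 input and a 3x3 kernel give a 3x3 output,
# i.e. 9 outputs x 9 multiply-adds each = 81 MAC operations.
img = conv_input = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0              # simple averaging kernel
print(conv2d_naive(img, k))
```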


Currently, the GPU is one of the more mature chips used for deep learning training and inference. Companies such as Google, Microsoft, and Baidu all use GPUs for model training and inference. With its large number of cores, a GPU provides efficient parallel computation over large volumes of data. NVIDIA has also developed the dedicated acceleration library cuDNN and the inference tool TensorRT to improve the efficiency of deep learning on GPUs. Although GPUs are very widely used in deep learning, they were designed for graphics computation rather than for deep learning, so they have certain limitations in performance and power consumption. First, GPUs are oriented toward low-dimensional data structures and are relatively inefficient at processing the high-dimensional data of deep learning. Second, graphics computation demands high numerical precision, whereas deep learning inference can run effectively at lower precision (see the quantization sketch below). Third, GPU data resides in external memory and inter-core communication goes through shared memory, which creates bandwidth and latency bottlenecks. ASICs can be designed and optimized much more specifically in hardware, so once a deep learning algorithm has stabilized, fully customized AI chips are often used to further optimize performance, power consumption, and area (PPA).
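As a rough illustration of the lower-precision point (a hand-rolled sketch, not how cuDNN or TensorRT actually implement quantization internally), the following approximates a float32 matrix-vector product with int8 operands: the multiply-accumulates run in integer arithmetic, and only the final result is rescaled back to float:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization of a float tensor to int8.
    Illustrative only: production frameworks typically calibrate
    scales per channel and fuse the rescaling into the kernels."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

# A float32 matrix-vector product...
W = np.random.randn(4, 8).astype(np.float32)
x = np.random.randn(8).astype(np.float32)
y_fp32 = W @ x

# ...approximated with int8 operands: the MACs accumulate in int32,
# and one float multiply at the end restores the original scale.
Wq, w_scale = quantize_int8(W)
xq, x_scale = quantize_int8(x)
y_int8 = (Wq.astype(np.int32) @ xq.astype(np.int32)) * (w_scale * x_scale)

print(np.max(np.abs(y_fp32 - y_int8)))  # small quantization error
```

The payoff for hardware is that an int8 multiplier is far smaller and cheaper than a float32 one, which is exactly the kind of specialization an AI ASIC can exploit and a graphics-oriented GPU historically could not.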


