AI High-Performance Computing - Compute-in-Memory

Compute-in-memory (also called in-memory computing or integrated storage and computing) fully merges storage and computation, performing computation directly inside the memory. Under the traditional von Neumann architecture, data storage and computation are separate. Because the performance gap between memory and processors keeps widening, the speed at which a processor can fetch data from memory is far lower than its computation speed, and the energy spent moving data between the processor and main memory is far higher than the energy the processor spends on the computation itself.

The architecture designs of NPUs and TPUs show that both essentially address the problem of efficient data access, differing only slightly in how they do so. The main approaches are:

Increase communication bandwidth, achieving high-speed data transfer at lower power consumption.

Place data as close as possible to the computing units to reduce data-movement latency and power, for example through multi-level caches and on-chip storage.

Improve data reuse in both time and space, reducing the number of transfers between the compute units and main memory.
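The data-reuse idea above can be sketched with a tiled matrix multiplication: each block fetched from "main memory" is reused for a whole tile of outputs before being evicted, instead of being re-read for every single output element. This is an illustrative sketch only; the tile size standing in for on-chip buffer capacity is an assumption, not a parameter from the text.

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Multiply A (m x k) by B (k x n) block by block.

    Each A/B tile brought on-chip is reused for tile*tile partial
    products before the next tile is fetched, which is the temporal/
    spatial reuse that cuts traffic to main memory. 'tile' is a
    hypothetical stand-in for on-chip buffer capacity.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=A.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # one tile of A and one tile of B are loaded once,
                # then reused across the whole output tile
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, p0:p0 + tile]
                    @ B[p0:p0 + tile, j0:j0 + tile]
                )
    return C
```

The numerical result is identical to a plain matrix multiply; only the access pattern (and hence memory traffic on real hardware) changes.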

Compute-in-memory integrates the compute and storage units on the same chip so that the storage units themselves gain computing capability, greatly reducing the latency and power caused by data movement. It is particularly well suited to deep learning workloads that combine large-scale data movement with massive parallel computation.

In-memory computing

In-memory computing generally has two implementation approaches: digital computing and analog computing. Analog computing offers high energy efficiency but low computational precision, while digital computing offers high precision but high power consumption. Mainstream in-memory computing currently relies mainly on analog computation.

Analog in-memory computing uses physical laws to perform multiply-accumulate operations directly on a memory array. Take matrix-vector multiplication on a memristor crossbar as an example: before the operation, the matrix M is written into the array, one element per row-column crosspoint, as device conductances, and the input vector is applied as different voltage levels on the rows. By Ohm's law and Kirchhoff's current law, the currents collected at the output columns form the multiply-accumulate result. Multiple memory arrays can run in parallel to carry out several matrix multiplications at once.
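The Ohm's-law/Kirchhoff's-law computation described above can be modeled numerically: each crosspoint contributes a current I = G·V, and each column wire sums its cell currents. This is a minimal idealized sketch (the optional noise term standing in for analog device non-ideality is an assumption), not a model of any specific device.

```python
import numpy as np

def crossbar_mvm(G, v, noise_std=0.0, seed=None):
    """Idealized analog compute-in-memory matrix-vector multiply.

    G : conductance matrix, one cell per row-column crosspoint (the
        stored weights, in siemens).
    v : input voltages applied to the rows (volts).

    Ohm's law gives each cell's current I_ij = G_ij * v_i; Kirchhoff's
    current law sums the cells on each column wire, so the vector of
    column currents is the multiply-accumulate result G^T v.
    noise_std loosely models the limited precision of analog computing.
    """
    i_out = G.T @ v  # column currents = sum over rows of G_ij * v_i
    if noise_std > 0:
        rng = np.random.default_rng(seed)
        i_out = i_out + rng.normal(0.0, noise_std, size=i_out.shape)
    return i_out
```

With `noise_std=0` the result is exact; a nonzero value reproduces, in spirit, the accuracy-versus-efficiency trade-off the text attributes to analog computation.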

Digital in-memory computing adds logic circuits such as AND/OR gates, multipliers, and adders to the storage array, giving it computational as well as storage capability. Because logic must be added to each storage unit, it has no advantage in chip area, which limits how far its compute capability can scale. Current digital in-memory implementations therefore rely more heavily on advanced process nodes, and process and cost factors greatly limit their range of applications.
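A common way such in-array logic is organized is bit-serial: each storage cell holds one weight bit, an AND gate at the cell forms a one-bit partial product, and an adder tree accumulates the column, with shifts recombining the bit planes. The sketch below is a software illustration of that scheme under those assumptions, not the circuit of any particular chip.

```python
def digital_cim_dot(weights, inputs, bits=4):
    """Bit-serial digital in-memory dot product (illustrative sketch).

    Each integer weight is stored as 'bits' one-bit cells. For each bit
    plane, the cell's AND gate multiplies the stored bit by the input,
    an adder tree sums the partial products, and a shift restores the
    bit's place value before accumulation.
    """
    acc = 0
    for b in range(bits):
        # bit plane b: the values actually held in the storage cells
        plane = [(w >> b) & 1 for w in weights]
        # per-cell AND with the input, then the adder-tree sum
        partial = sum(p * x for p, x in zip(plane, inputs))
        acc += partial << b  # shift restores this plane's weight
    return acc
```

Because every cell needs its own gate and the adder tree grows with array width, the area cost the text mentions is visible even in this toy model: compute hardware scales with storage size.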

Brain-inspired computing

The spiking neural network (SNN), a computing framework modeled on the pulse-based signaling of the human brain, is expected to deliver artificial intelligence while reducing the energy consumption of computing platforms.

In the SNN model, an upstream spike Vi is modulated by its synaptic weight Wi, and the total current generated within a given time window is equivalent to a dot-product operation. As the figure shows, spike computing emulates the neural computation process through current inputs and outputs. The whole system is event-driven, and deep learning network computation is highly sparse, so communicating and computing with spikes enables large-scale parallel computation at extremely low energy cost.


Example diagram of spike computing

As the spike-computing process shows, its hardware requires an integrated system design that places neurons and synapse arrays tightly together. Hence most current spiking-neural-network chips adopt an integrated architecture built on analog in-memory computing.

Technical challenges

Although in-memory computing has many advantages, it still faces numerous challenges on the road to commercial application: problems remain to be solved in device R&D, circuit design, chip architecture, fabrication, the EDA toolchain, and software algorithms, and overall technical maturity is low.

A storage-compute-integrated design must satisfy the requirements of both storage and computation, such as storage-cell reliability, write-erase endurance, device-to-device consistency, and the response speed and power consumption of the compute units. With current semiconductor circuit design and manufacturing processes, it is difficult to meet all of these at once.

Limited by process node and chip area, current commercial in-memory computing chips offer relatively low compute capability and support only a limited set of operators, so the neural network algorithms they can run are restricted and their generality is poor.

Today's mainstream analog in-memory computing has poor computational precision, and inaccurate results deviate from the ideal. Digital in-memory computing is precise, but its computational cost is high.

Logic circuits today are implemented mainly for exact binary arithmetic, whereas analog computation is comparatively underdeveloped in both theory and circuit implementation, making analog computing chips difficult to realize.

In-memory computing chip design differs significantly from conventional chips, and existing EDA tools provide no standard cell library for chip designers to use. The lack of rapid development tools for large-scale memory arrays leads to low productization efficiency.
