2025-04-01
The sudden rise of generative artificial intelligence has not only raised the bar for compute chips, but also made it increasingly clear that traditional system designs can no longer keep pace with computing demands.
Historically, the scaling of memory and I/O has lagged far behind the growth of compute density, and the average memory and I/O bandwidth available per core has steadily declined. The AIGC era has sharply increased memory requirements while also generating massive I/O communication demands, such as the aggregation and distribution of gradient data. A new architecture is therefore needed to relieve system memory and I/O bottlenecks and to scale data processing, parallelism, and overall system compute.
CXL stands for Compute Express Link. It is an open-standard, high-speed interconnect protocol created primarily to solve the interconnection problem between computing devices and memory, with the aim of improving communication between processors and accelerators, memory expansion devices, and other components.
Technically, CXL transmits signals over the existing PCIe (Peripheral Component Interconnect Express) physical layer, while introducing new features and improvements at the protocol level that dramatically increase the efficiency and coherency of data exchange among processors, accelerators, and memory devices in a system. This enables resource sharing with lower latency, reduced software-stack complexity, and lower overall system cost, providing more robust support for high-performance computing and large-scale data processing.
Since its initial release in 2019, CXL has evolved to the CXL 3.1 standard. Its scope has likewise grown from the limited functionality of the first version to include additional fabric improvements, new Trusted Execution Environment enhancements, and memory expander improvements for scale-out CXL deployments.
Specifically, CXL has three key features:
■ Unified memory model: CXL supports three protocols, namely CXL.io, CXL.cache and CXL.memory. Among them, CXL.io is mainly used for traditional I/O operations, similar to PCIe; CXL.cache and CXL.memory provide cache consistency and memory access capabilities, allowing the CPU and accelerators or memory expansion devices to share and consistently access memory. This is especially important for accelerators (such as GPUs, FPGAs) because they can access system memory more efficiently without having to go through slow I/O channels.
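One practical consequence of the CXL.mem model is that, on current Linux kernels, memory on a CXL Type-3 expander is typically surfaced to software as a CPU-less NUMA node. The sketch below, a simplified illustration assuming a Linux host with sysfs mounted (node layout and results are machine-dependent), scans sysfs for such memory-only nodes:

```python
# Sketch: find memory-only (CPU-less) NUMA nodes on a Linux host.
# On recent kernels, CXL.mem expanders typically appear as such nodes.
# Illustrative only; which nodes exist depends entirely on the machine.
from pathlib import Path

def cpuless_memory_nodes(sysfs_root="/sys/devices/system/node"):
    """Return NUMA node IDs that report memory but no attached CPUs."""
    nodes = []
    for node_dir in sorted(Path(sysfs_root).glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        cpulist = (node_dir / "cpulist").read_text().strip()
        has_memory = (node_dir / "meminfo").exists()
        if has_memory and cpulist == "":  # memory present, no CPUs
            nodes.append(node_id)
    return nodes

if __name__ == "__main__":
    print("CPU-less memory nodes (candidate CXL expanders):",
          cpuless_memory_nodes())
```

On an ordinary server without CXL devices this prints an empty list; once an expander is online, its node ID appears here and the memory can be targeted with standard NUMA tooling (e.g. `numactl --membind`).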
■ Cache consistency: CXL allows processors and external devices (such as accelerators) to share the same memory space and maintain cache consistency. This means that data does not need to be frequently copied or synchronized when transferred between different devices, which improves performance.
■ High bandwidth and low latency: Through an optimized protocol stack, CXL provides low-latency communication while maintaining high bandwidth. This makes it well suited to applications that require fast data exchange, such as AI acceleration and data analytics. Compared with a traditional RDMA-based disaggregated memory architecture, CXL achieves latencies measured in nanoseconds, and its latency is also several orders of magnitude lower than that of NVDIMM non-volatile memory.
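The scale of that latency gap can be illustrated with a back-of-envelope comparison. All figures below are rough, commonly cited ballpark values chosen for illustration, not measurements from the source; real latencies vary widely by platform and topology:

```python
# Illustrative back-of-envelope memory-access latency comparison.
# All numbers are assumed ballpark figures, not measurements.
LATENCY_NS = {
    "local DRAM":          100,    # ~100 ns direct-attached
    "CXL-attached DRAM":   250,    # roughly one extra NUMA hop
    "RDMA one-sided read": 2_000,  # ~2 us over a fast network fabric
}

baseline = LATENCY_NS["CXL-attached DRAM"]
for name, ns in LATENCY_NS.items():
    print(f"{name:>20}: {ns:>5} ns ({ns / baseline:.1f}x CXL)")
```

Even with these rough numbers, network-based disaggregation pays roughly an order of magnitude more per access than a CXL hop, which is the gap the article refers to.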
Unlike other interconnect protocols, CXL's key differentiator is hardware-supported cache coherency. It is precisely this feature that lets the CPU and accelerators or memory expansion devices share and coherently access memory, making a true rack-level disaggregated memory architecture possible. Beyond the protocol's original vision of CPUs and GPUs accessing, and caching, each other's memory at cacheline granularity, CXL is also well positioned to address the memory capacity, cost, and utilization challenges of the LLM era.
As shown in the figure above, there are three typical use cases for CXL. Market research firm Yole is even more optimistic, projecting that total CXL market revenue will grow to more than $15 billion by 2028, with DRAM accounting for the majority of that revenue at over $12 billion. CXL controllers and CXL switches are also expected to grow rapidly.