Retis

High Performance, Clockless AI Network on Chip

Chronos Retis™

Chronos Retis™ is a scalable NoC targeting AI applications, built upon reliable, clockless Chronos-LLP technology.

It eliminates the need for a large clock distribution network, enabling unlimited scalability while minimizing power.

Unlike traditional Networks on Chip, where latency and throughput are correlated, this IP decouples the two metrics, enabling unprecedented performance for AI accelerators.

A truly asynchronous arbiter lets packets flow freely through the network without waiting for a clock edge, optimizing queues and buffering.

It supports both 2D mesh (5-port Routers) and 3D mesh (7-port Routers) as well as port stubbing for boundary conditions.
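As a rough illustration of the port counts above, the following sketch enumerates the ports of a router in a mesh: one local port plus two ports per dimension, with ports that face the mesh boundary marked as stubbed. The function name `router_ports`, the direction labels, and the coordinate scheme are assumptions made for illustration and are not part of the Chronos Retis interface.

```python
def router_ports(coord, dims):
    """Return {port_name: 'connected' | 'stubbed'} for the router at coord.

    coord and dims are tuples of equal length (2 for a 2D mesh, 3 for a 3D mesh).
    Every router has one local port plus two ports per dimension.
    """
    if len(coord) != len(dims) or len(dims) not in (2, 3):
        raise ValueError("coord and dims must both have length 2 or 3")
    # Hypothetical direction labels, one (low, high) pair per axis.
    axis_names = [("west", "east"), ("south", "north"), ("down", "up")]
    ports = {"local": "connected"}
    for axis, (pos, size) in enumerate(zip(coord, dims)):
        low, high = axis_names[axis]
        # A port is stubbed when there is no neighbouring router on that side.
        ports[low] = "connected" if pos > 0 else "stubbed"
        ports[high] = "connected" if pos < size - 1 else "stubbed"
    return ports


corner_2d = router_ports((0, 0), (4, 4))        # 5 ports, 2 of them stubbed
inner_3d = router_ports((1, 1, 1), (4, 4, 4))   # 7 ports, all connected
```

In this model a corner router of a 2D mesh keeps all five ports but has two of them stubbed, while an interior router of a 3D mesh uses all seven.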

For a significant boost in overall system performance, Chronos Retis can connect directly to clockless IP, such as the Chronos Wormhole or Chronos TimeWarp.

Benefits

Better resource utilization: There is no waiting for the clock, and synchronization happens only at the endpoints. (With clockless Computing Units, it can also operate without any synchronization.)
Better power efficiency: No clock distribution and automatic AVFS. Each router can operate in a separate voltage domain.
Improved latency performance: Computing Unit frequency does not dictate the number of pipeline stages. Data flows at the maximum speed at which the silicon can operate. Fewer collisions.
Future-proof: Not limited by OCV and MPW, allowing for unlimited scalability. Native hybrid-bonding support for 3D integration.

The adjacent figure shows the result of a system-level simulation (using gem5) with realistic traffic from the MCLS Traffic Suite, comparing a clocked NoC against the Chronos AI NoC, both using a four-stage FIFO at the input and two Virtual Channels.

Results show 25% better throughput and 50% better latency with the Chronos AI NoC.

Example

The Chronos AI NoC has been integrated into the BaseJump Manycore environment to verify functionality within a multicore RISC-V AI accelerator (a 1.4 GHz (@ 0.98 V), 496-core RISC-V accelerator with a dual-router, credit-based, low-latency network).

It ran the software and passed all the tests with no need to modify anything besides the NoC.

With the updated NoC, packets flow freely among routers without having to wait for the clock, improving overall performance.
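The reference accelerator's network is described as credit based. As a minimal sketch of that general idea (not of the BaseJump or Chronos implementation), a sender holds one credit per free slot in the receiver's input FIFO, spends a credit for each packet it sends, and regains it when the receiver drains a packet. The class name `CreditLink` and the depth of four (borrowed from the four-stage input FIFO used in the simulation setup) are assumptions for illustration.

```python
from collections import deque


class CreditLink:
    """Toy model of one sender-to-receiver link with credit-based flow control."""

    def __init__(self, fifo_depth=4):
        self.credits = fifo_depth   # free receiver slots the sender may still use
        self.fifo = deque()         # receiver-side input FIFO

    def send(self, packet):
        """Send if a credit is available; return True on success."""
        if self.credits == 0:
            return False            # receiver FIFO is full: sender holds the packet
        self.credits -= 1
        self.fifo.append(packet)
        return True

    def drain(self):
        """Receiver forwards one packet and returns a credit to the sender."""
        if not self.fifo:
            return None
        self.credits += 1
        return self.fifo.popleft()


link = CreditLink()
accepted = [link.send(n) for n in range(5)]   # the fifth send is refused
first_out = link.drain()                      # frees one slot, returns a credit
retry_ok = link.send(4)                       # the refused packet now fits
```

The sender never overruns the receiver because it can only transmit while it holds a credit; no global clock is needed to express that rule.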

Image from: BaseJump Manycore Accelerator Network (by Shaolin Xie and Michael Bedford Taylor)