Cerebras launches Condor Galaxy 1, the world’s most powerful AI supercomputer
A network of supercomputers with a combined capacity of 36 exaflops will be created.
The latest achievement comes from Cerebras, a Silicon Valley leader in AI computing, which bills it as proof that AI supercomputing is creating a new reality.
Its CEO, Andrew Feldman, announced the launch of Condor Galaxy 1, an AI supercomputer capable of 2 exaflops (2 billion billion operations per second). This is just the beginning: over the next 12 weeks the system's performance should double, and by early 2024 two more systems of the same size will join it. Cerebras's goal is a network of nine supercomputers with a combined performance of 36 exaflops.
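The roadmap figures above are consistent with each other, as a quick back-of-the-envelope check shows (the per-system numbers come from the announcement as reported here):

```python
# Sanity check of the Condor Galaxy roadmap arithmetic reported above.
cg1_initial_exaflops = 2                        # Condor Galaxy 1 at launch
cg1_full_exaflops = cg1_initial_exaflops * 2    # after performance doubles
planned_systems = 9                             # target fleet size

total_exaflops = planned_systems * cg1_full_exaflops
print(total_exaflops)  # 36, matching the stated combined target
```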
Such supercomputers are needed to train the large language models and other generative AI systems that are taking the world by storm. Cerebras is not alone here: other computer makers specializing in AI are also building huge systems, whether based on their own processors or on Nvidia's latest H100 GPUs. Still, Feldman is confident that Condor Galaxy 1 is one of the largest AI supercomputers to date.
Condor Galaxy 1 consists of 32 Cerebras CS-2 computers and can be expanded to 64. Each CS-2 is built around the Wafer-Scale Engine 2 (WSE-2), an AI chip made from a single silicon wafer containing 2.6 trillion transistors and 850,000 AI cores — dimensions and specifications far beyond those of any conventional GPU.
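To put those WSE-2 numbers in perspective, here is a rough comparison against a single conventional GPU die. The WSE-2 figures are from the article; the H100 transistor count (around 80 billion) is an approximate public figure used only for illustration:

```python
# Rough scale comparison between a WSE-2 and a single GPU die.
wse2_transistors = 2.6e12    # 2.6 trillion (from the article)
wse2_cores = 850_000         # AI cores (from the article)
h100_transistors = 80e9      # assumed ~80 billion per H100 die (illustrative)

ratio = wse2_transistors / h100_transistors
print(f"One WSE-2 holds roughly {ratio:.1f}x the transistors of an H100")
```

Even as a rough estimate, the transistor count alone is more than an order of magnitude beyond a flagship GPU.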
A distinguishing feature of Cerebras's supercomputers is how easily they scale for AI workloads. For example, a neural network with 40 billion parameters can be trained in roughly the same wall-clock time as a 1-billion-parameter network, provided 40 times the computing resources are used.
No additional code needs to be written to achieve this scaling. That has historically been a hard problem, because large neural networks are not easy to partition for efficient training. "We can scale linearly from 1 to 32 [CS-2s] with one click," Feldman says.
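The claim above amounts to ideal linear (weak) scaling: wall-clock training time stays flat as long as compute grows in proportion to model size. A minimal sketch of that arithmetic, using a purely illustrative time constant rather than any real Cerebras benchmark:

```python
def ideal_training_time(params_billions, num_systems, hours_per_b_per_system=10.0):
    """Idealized weak-scaling model: training time is proportional to
    model size and inversely proportional to the number of systems.
    The 10.0 constant (hours per billion parameters per system) is
    illustrative only, not a measured figure."""
    return params_billions * hours_per_b_per_system / num_systems

# Under perfect linear scaling, a 40B-parameter model on 40 systems
# takes the same time as a 1B-parameter model on 1 system.
t_small = ideal_training_time(1, 1)
t_large = ideal_training_time(40, 40)
print(t_small, t_large)  # 10.0 10.0
```

In practice, communication overhead erodes this ideal; the article's point is that Cerebras's architecture keeps the scaling close to linear without extra engineering by the user.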
The Condor Galaxy supercomputer series is owned by Abu Dhabi-based G42, a holding that brings together nine AI enterprises, including G42 Cloud, one of the largest cloud providers in the Middle East. Cerebras, however, will operate the supercomputers and lease out the capacity that G42 does not use for its own projects.
Demand for training large neural networks is growing rapidly, Feldman said: the number of companies training AI models with 50 billion or more parameters has risen from 2 in 2021 to more than 100 this year.
Most of them use Nvidia GPU-based compute clusters, but some have developed their own AI chips, such as Google's TPU and Amazon's Trainium. There are also startups building their own AI accelerators and computers, such as Habana (now part of Intel), Graphcore, and SambaNova.
Meta, for example, built its AI Research SuperCluster with more than 6,000 Nvidia A100 GPUs. In phase two, it plans to add another 10,000 Nvidia H100 GPUs, which will allow it to train AI models with trillions of parameters.