Tachyum submits bid to build 20-exaflop supercomputer


Tachyum said on Tuesday that it had submitted a bid to the Department of Energy to build a 20-exaflop supercomputer in 2025. The machine would be based on the company’s next-generation Prodigy processors, which feature a proprietary microarchitecture that can be used for different types of workloads.

The US DoE wants a 20-exaflop supercomputer with a power consumption of 20 MW to 60 MW to be delivered by 2025. The system is expected to be installed at Oak Ridge National Laboratory (ORNL) and will complement the lab’s Frontier system, which went live earlier this year. Tachyum has not disclosed what hardware it has offered to the DoE, saying only that it has its 128-core Prodigy processor today and a more capable Prodigy 2 processor on its roadmap. It is therefore reasonable to assume that the company will have the latter on hand by 2025 and could use it for the upcoming system.

Tachyum’s Prodigy is a universal homogeneous processor containing up to 128 proprietary 64-bit VLIW cores, each with two 1024-bit vector units and one 4096-bit matrix unit. Tachyum expected its flagship Prodigy T16128-AIX processor to deliver up to 90 teraflops of FP64 performance for HPC as well as up to 12 “AI petaflops” for AI inference and training (presumably when running INT8 or FP8 workloads). Prodigy consumes up to 950W and uses liquid cooling.

That was before Tachyum sued Cadence, its intellectual property provider, over the below-expectation performance of its Prodigy processor. We have no idea what the current performance expectations for the chip are. In theory, Tachyum could power an exascale system using more than 11,000 of its Prodigy processors, although the power consumption of such a machine would be gargantuan. Presumably, Prodigy 2 has a better chance of meeting the needs of a next-gen exascale system than the original Prodigy.

There is currently one exaflops-class supercomputer in the United States: ORNL’s 1.1-exaflop Frontier system, which is based on AMD’s 64-core EPYC processors and Instinct MI250X compute GPUs. Two other exascale systems are being built in the US: the 2-exaflop Aurora machine, powered by Intel’s 4th Gen Xeon Scalable processors and Xe-HPC (aka Ponte Vecchio) compute GPUs, and the “>2 exaflops” El Capitan, based on AMD’s Zen 4 architecture EPYC processors and Instinct MI300 GPUs.

One of the interesting things about the DoE’s supercomputing plans is that it now wants to upgrade its high-performance computing capabilities every 12-24 months rather than every 4-5 years. As a result, the DoE will likely be more eager to embrace exotic architectures like Tachyum’s Prodigy than it is today.
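The processor-count estimate can be checked with a rough back-of-the-envelope calculation. The sketch below uses only the figures Tachyum originally quoted (90 FP64 teraflops and up to 950W per chip) and assumes perfect scaling at peak throughput, with no allowance for memory, interconnect, storage, or cooling power. The function names are illustrative, and since the chip's current performance expectations are unknown, the results should be read as an optimistic lower bound rather than a real system design.

```python
# Back-of-the-envelope sizing from the per-chip figures quoted in the article.
# Assumptions (not from Tachyum): perfect scaling at peak FP64 throughput,
# and processor power only (no memory, network, storage, or cooling).

TFLOPS_PER_EXAFLOP = 1_000_000      # 1 exaflop = 10^6 teraflops
PEAK_FP64_TFLOPS_PER_CHIP = 90      # originally quoted Prodigy T16128-AIX peak
MAX_WATTS_PER_CHIP = 950            # quoted maximum power per processor


def chips_needed(target_exaflops: int) -> int:
    """Smallest number of chips whose combined peak reaches the target."""
    target_tflops = target_exaflops * TFLOPS_PER_EXAFLOP
    # Ceiling division using integers only.
    return -(-target_tflops // PEAK_FP64_TFLOPS_PER_CHIP)


def cpu_power_megawatts(chip_count: int) -> float:
    """Power drawn by the processors alone, in megawatts."""
    return chip_count * MAX_WATTS_PER_CHIP / 1_000_000


if __name__ == "__main__":
    for exaflops in (1, 20):
        chips = chips_needed(exaflops)
        power = cpu_power_megawatts(chips)
        print(f"{exaflops} EF: {chips:,} chips, {power:.1f} MW (processors only)")
```

Under these assumptions, one exaflop takes 11,112 processors drawing about 10.6 MW, while a 20-exaflop machine would take 222,223 processors drawing roughly 211 MW before any other system overhead, well above the DoE's 20 MW to 60 MW target.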

“We also want to explore the development of an approach that moves away from monolithic acquisitions towards a model that enables faster upgrade cycles of deployed systems, to enable faster innovation on hardware and software,” a DoE document reads. “A possible strategy would include increased reuse of existing infrastructure so that upgrades are modular. One goal would be to reinvent systems architecture and an efficient acquisition process to continuously inject technological advancements into a facility (e.g., every 12-24 months rather than every 4-5 years). Understanding the trade-offs of these approaches is one of the goals of this RFI, and we invite responses to include the perceived advantages and/or disadvantages of this modular upgrade approach.”

One of the advantages of Tachyum’s Prodigy over traditional CPUs and GPUs is that it is suitable for both AI and HPC workloads, so it can be used for AI work when its HPC capabilities are not in use and vice versa. The DoE may or may not adopt Tachyum’s hardware for one of its next supercomputers, but the company hopes to secure a suitable contract.

Source: This news was originally published by Tom's Hardware.

