From Xilinx silicon to GPUs to software: AMD lays out its AI strategy • The Register


Analysis After reestablishing itself in the data center over the past few years, AMD now hopes to become a major player in the AI computing space with an expanded portfolio of chips that spans everything from edge to cloud.

That’s quite an ambitious goal, given Nvidia’s dominance in the space with its GPUs and the CUDA programming model, as well as growing competition from Intel and several other companies.

But as executives explained at AMD’s Financial Analyst Day 2022 event last week, the resurgent chip designer believes it has the right silicon and software in place to pursue the broader AI space.

“Our vision here is to deliver a broad technology roadmap across training and inference that touches cloud, edge, and endpoint, and we can do that because we’re exposed to all of these markets and all of these products,” said AMD CEO Lisa Su in her opening remarks at the event.

Su admitted that it will take “a lot of work” for AMD to catch up in the AI space, but she said the market represented the company’s “highest growth opportunity”.

Developing Early Traction with CPUs in AI Inference

At last week’s event, AMD executives said they had started to see early traction in the AI computing market, with the company’s Epyc server chips used for AI inference and its Instinct data center GPUs deployed for training AI models.

For example, several cloud service providers are already using AMD’s software optimizations through its ZenDNN library to deliver a “really nice performance boost” on recommendation engines using the company’s Epyc processors, according to Dan McNamara, head of AMD’s Epyc business.

Short for Zen Deep Neural Network, ZenDNN is integrated with the popular TensorFlow and PyTorch frameworks as well as ONNX Runtime, and it is supported on second- and third-generation Epyc chips.

“I think it’s really important to say that a large percentage of inference happens in processors, and we expect that to continue,” he said.

In the near future, AMD is looking to introduce more AI capabilities into processors at the hardware level.

This includes AVX-512 VNNI instructions, which speed up neural network processing and will be introduced in the next-generation Epyc chips, codenamed Genoa, due later this year.

Since this capability is implemented in Genoa’s Zen 4 architecture, VNNI will also be present in the company’s Ryzen 7000 desktop chips, which are also expected by the end of the year.

AMD plans to expand the AI capabilities of its future processors using AI engine technology from its $49 billion acquisition of FPGA designer Xilinx, which closed earlier this year.

An image showing AMD's XDNA adaptive IP architecture: the AI engine and FPGA fabric.

These building blocks will appear in future chips in AMD’s portfolio. Click to enlarge.

The AI engine, which falls under AMD’s new XDNA banner of “adaptive architecture” building blocks, will be integrated into several new products in the company’s portfolio going forward.

After debuting in Xilinx’s Versal adaptive chips in 2018, the AI engine will be integrated into two future generations of Ryzen laptop chips: the first, codenamed Phoenix Point, will arrive in 2023, while the second, codenamed Strix Point, will arrive in 2024. The AI engine will also be used in a future generation of Epyc server chips, although AMD didn’t say when that would happen.

In 2024, AMD plans to launch the first chips using its next-generation Zen 5 architecture, which will include new optimizations for AI and machine learning workloads.

Big AI training ambitions with the “first data center APU”

When it comes to GPUs, AMD has made progress in AI training with its latest generation of Instinct GPUs, the MI200 series, and hopes to make even more in the near future with new silicon and software enhancements.

For example, in the latest version of its ROCm GPU computing software, AMD added optimizations for training and inference workloads running on frameworks such as PyTorch and TensorFlow.

The company has also extended ROCm support to its consumer-focused Radeon GPUs that use the RDNA architecture, according to David Wang, head of AMD’s GPU business.

“And finally, we develop SDKs with pre-optimized models to facilitate the development and deployment of AI applications,” he said.

To drive the adoption of its GPUs for AI purposes, Wang said AMD has developed “deep partnerships with some of the key industry leaders,” including Microsoft and Facebook parent company Meta.

“We optimized ROCm for PyTorch to deliver incredible, very, very competitive performance for their in-house AI workloads as well as the jointly developed open source benchmarks,” he said.

In the future, AMD hopes to become even more competitive in AI training with the Instinct MI300, which it calls the “world’s first data center APU” because the chip combines a Zen 4-based Epyc processor with a GPU using the company’s new CDNA 3 architecture.

AMD claims the Instinct MI300 will deliver a greater than 8x increase in AI training performance over the Instinct MI250X chip currently on the market.

“The MI300 is a truly amazing piece, and we believe it points the direction of the future of acceleration,” said Forrest Norrod, Head of AMD’s Datacenter Solutions Business Group.

Using Xilinx to Extend to the Edge and Improve Software

While AMD plans to use Xilinx’s technology in future processors, the chip designer has made it clear that the acquisition will also help the company cover a wider range of AI opportunities and strengthen its software offerings. The latter is essential if AMD wants to better compete with Nvidia and others.

That case was made by Victor Peng, the former CEO of Xilinx who now heads AMD’s Adaptive and Embedded Computing group, which leads the development of all FPGA-based products in Xilinx’s portfolio.

Prior to the Xilinx acquisition, AMD’s coverage of the AI compute space was primarily in cloud data centers with its Epyc and Instinct chips, in enterprises with its Epyc and Ryzen Pro chips, and in homes with its Ryzen and Radeon chips.

But with Xilinx’s portfolio now under the AMD banner, the chip designer has much broader coverage of the AI market. Xilinx’s Zynq adaptive chips are used in a variety of industries, including healthcare and life sciences, transportation, smart retail, smart cities, and smart factories. Its Versal adaptive chips, meanwhile, are used by telecommunications providers, while its Alveo accelerators and Kintex FPGAs are deployed in cloud data centers.

An image showing AMD's industry coverage with its CPUs, GPUs and adaptive chips.

With the acquisition of Xilinx, AMD’s products span multiple sectors of the AI computing space. Click to enlarge.

“We’re actually in quite a lot of areas that do AI, mostly inference, but, again, the intensive training is happening in the cloud,” Peng said.

AMD sees Xilinx products as “highly complementary” with its CPU and GPU portfolio. As such, the company is targeting its combined offerings at a wide range of AI application needs:

  • Ryzen and Epyc processors, including future Ryzen processors with the AI engine, will cover small to medium models for training and inference
  • Epyc processors with the AI engine, Radeon GPUs and Versal chips will cover medium to large models for training and inference
  • Instinct GPUs and adaptive chips from Xilinx will cover very large models for training and inference

“Once we start integrating AI into more of our products and moving to the next generation, we cover a lot more model space,” Peng said.

An image showing AMD's AI application coverage with its processors, GPUs and adaptive chips.

How AMD sees processors, GPUs and adaptive chips spanning different parts of the AI spectrum. Click to enlarge.

But if AMD wants wider industry adoption of its chips for AI purposes, the company will need to ensure that developers can easily program applications for this menagerie of silicon.

That’s why the chip designer plans to consolidate previously disparate software stacks for CPUs, GPUs, and adaptive chips into a single interface, which it calls AMD Unified AI Stack. The first release will bring together AMD’s ROCm software for GPU programming, its CPU software, and Xilinx’s Vitis AI software to provide unified development and deployment tools for inference workloads.

Peng said the Unified AI Stack will be an ongoing development effort for AMD, meaning the company plans to consolidate even more software components in the future so that, for example, developers will only need to use one machine learning graph compiler for any type of chip.
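AMD hasn't published the Unified AI Stack's interfaces, but the idea of one front end lowering the same model to different back ends can be sketched in a few lines of Python. Everything below is invented for illustration (the class and method names are hypothetical); only the back-end software names (ZenDNN, ROCm/MIGraphX, Vitis AI) come from AMD's existing stacks:

```python
class InferenceSession:
    """Hypothetical unified front end: one API, several hardware back ends."""

    # Map each target device to the existing AMD software stack for it.
    BACKENDS = {"cpu": "ZenDNN", "gpu": "ROCm/MIGraphX", "fpga": "Vitis AI"}

    def __init__(self, model, target="cpu"):
        if target not in self.BACKENDS:
            raise ValueError(f"unknown target: {target}")
        self.model = model
        self.backend = self.BACKENDS[target]

    def run(self, x):
        # A real unified stack would compile the graph once and dispatch it
        # to the chosen back end; here we just run the model and tag the
        # result with the back end that "executed" it.
        return {"output": self.model(x), "backend": self.backend}


# Same model, different targets, identical developer-facing code path.
sess = InferenceSession(lambda x: x * 2, target="gpu")
print(sess.run(21))  # {'output': 42, 'backend': 'ROCm/MIGraphX'}
```

The point of the sketch is the shape of the API, not its contents: developers write against one session object, and the choice of Epyc, Instinct, or adaptive silicon becomes a configuration detail rather than a different toolchain.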

“Now people can, in the same development environment, target any of these architectures. And in the next generation, we’re going to unify the middleware even more,” he said.

Although AMD has laid out a very ambitious AI computing strategy, it will undoubtedly take a lot of heavy lifting, and plenty of goodwill from developers, to make that strategy work. ®

