The role of software in system design sees a revival at Hot Chips


The software market will be almost three times larger than the hardware market in 2026, and that fact was not lost on Intel CEO Pat Gelsinger at the Hot Chips conference this month.

Software will drive the development of hardware, especially chips, as complex systems drive the insatiable demand for computing power, Gelsinger said during his speech at the conference.

“Software has become much more important. We have to treat it like the sacred interfaces that we have and then integrate hardware and that’s where silicon has to integrate,” Gelsinger said.

The importance of software in silicon development saw a revival at the Hot Chips conference; the idea itself is not new, appearing in an IEEE article as early as 1997.

Software is driving the industry forward with new styles of computing such as AI, and chipmakers are now taking a software-centric approach to hardware development to support new applications.

The idea of software driving hardware development isn’t new, but it was resurrected in the age of workload accelerators, said Kevin Krewell, analyst at Tirias Research.

“We’ve had FPGAs since the 1980s and they’re software-defined hardware. The most modern interpretation is that hardware is an amorphous collection of hardware blocks that are orchestrated by a compiler to perform certain workloads efficiently, without a lot of superfluous control logic,” Krewell said.

Chip designers are embracing hardware-software co-optimization to break down the walls between software tasks and the hardware they run on, in an effort to gain efficiency.

“It’s popular again today due to Moore’s Law slowdown, improvements in transistor speed and efficiency, and improved software compiler technologies,” Krewell said.

[Photo: Pat Gelsinger, via the Hot Chips live stream]

Intel is trying to meet the insatiable computing demand for software by designing new types of chips that can scale computing in the future.

“People develop software and silicon has to come underneath,” Gelsinger said.

He added that chip designers “must also consider the makeup of the critical software components that come with it, and that combination, that co-optimization of software [and] the hardware, essentially becomes the pathway to being able to set up such complex systems.”

Gelsinger said the software indirectly defines Intel’s foundry strategy and fab capabilities to produce new types of chips that integrate multiple accelerators into a single package.

For example, Intel packed 47 computing tiles – also called chiplets – into its GPU codenamed Ponte Vecchio, which is intended for high-performance computing applications. Intel also backs the UCIe (Universal Chiplet Interconnect Express) protocol for die-to-die communication between chiplets.

“We are going to have to do joint optimizations in the hardware and software area. Also on multiple chiplets – how they play together,” Gelsinger said.

A new class of EDA tools is needed to build chips for large-scale systems, Gelsinger said.

Intel also highlighted its “software-defined, silicon-enhanced” strategy and tied it closely to its long-term foundry ambitions. The goal is to plug middleware into the cloud that is enhanced by silicon. Intel offers subscription-based features that unlock middleware and the silicon that accelerates it.

Software can make data center infrastructure flexible and intelligent through a new generation of smartNICs and DPUs, which are compute-intensive chips with networking and storage components.

Network hardware architecture is at an inflection point, with software-defined networking and storage functions shaping hardware design, said Jaideep Dastidar, principal researcher at AMD, who presented at Hot Chips.

AMD talked about the 400G adaptive smartNIC, which includes software-defined cores and fixed-function logic such as ASICs to process and transfer data.

Software elements help these chips support a diverse set of workloads, including on-chip computing offloaded from processors. The software also gives these chips the flexibility to adapt to new standards and applications.

“We decided to take the traditional paradigm of hardware-software co-design and extend it to programmable hardware-software co-design,” Dastidar said.

The chip pairs an ASIC with programmable logic adapters, on which one can overlay customizations such as custom header extensions, or add and remove accelerator functions. The programmable logic adapters – which, like FPGAs, can redefine ASIC functions – can also perform fully custom data plane offloads.

The 400G adaptive smartNIC also has programmable logic agents that interact with the embedded processing subsystem. Software can program the logic adapter interfaces to create cohesive I/O agents for the on-board processor subsystems, which can be modified to run the network control plane. Data plane applications can execute entirely in the ASIC, in the programmable logic, or in both.
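The split described above – a fixed-function ASIC path plus a re-imageable programmable-logic overlay – can be illustrated with a toy model. This is a hypothetical sketch, not AMD's API; all names (`asic_fast_path`, `ProgrammableOverlay`, the `vlan` stage) are invented for illustration.

```python
# Toy model (assumed, not AMD's design) of a smartNIC data plane that runs
# packets through a fixed-function ASIC stage, then through a programmable
# overlay where custom stages can be added or removed at will.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    header: dict
    payload: bytes


def asic_fast_path(pkt: Packet) -> Packet:
    # Fixed-function logic: fast but rigid common-case processing.
    pkt.header["checksummed"] = True
    return pkt


class ProgrammableOverlay:
    """Stands in for a programmable logic adapter whose stages (custom
    header extensions, accelerator functions) can be swapped in software."""

    def __init__(self) -> None:
        self.stages: list[Callable[[Packet], Packet]] = []

    def add_stage(self, fn: Callable[[Packet], Packet]) -> None:
        self.stages.append(fn)  # add (or later remove) an accelerator function

    def run(self, pkt: Packet) -> Packet:
        for stage in self.stages:
            pkt = stage(pkt)
        return pkt


def data_plane(pkt: Packet, overlay: ProgrammableOverlay) -> Packet:
    # ASIC handles the common case first; the overlay then applies
    # site-specific customizations layered on top.
    return overlay.run(asic_fast_path(pkt))


def tag_vlan(pkt: Packet) -> Packet:
    # Example custom header extension layered onto the overlay.
    pkt.header["vlan"] = 42
    return pkt


overlay = ProgrammableOverlay()
overlay.add_stage(tag_vlan)
out = data_plane(Packet(header={}, payload=b"x"), overlay)
```

The point of the sketch is the division of labor: the ASIC stage never changes, while the overlay's stage list is pure software and can be reconfigured per deployment.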

AI chip company Groq has designed a chip that hands over its management to software: the compiler controls hardware functions, code execution, data movement, and other tasks.

The Tensor Streaming Processor architecture includes software control units embedded at strategic points to send instructions to hardware.

Groq uprooted conventional chip design, re-examined the hardware-software interface, and engineered a chip whose operations are managed by software commands. The compiler can reason about correctness and schedule instructions on the hardware.
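A toy example can make "the compiler controls the hardware" concrete. The sketch below is an assumption-laden simplification, not Groq's actual compiler: it shows static scheduling, where every operation is pinned to an exact cycle and functional unit at compile time, so nothing is arbitrated at runtime.

```python
# Toy static scheduler (illustrative only, not Groq's compiler): each
# (unit, op) pair gets a fixed issue cycle at compile time, so software
# knows exactly what every unit is doing on every cycle.

def schedule(program):
    """Assign each (unit, op) a fixed issue cycle. A unit issues at most
    one op per cycle, so later ops on a busy unit are pushed back."""
    next_free = {}  # unit -> first cycle on which it can issue
    plan = []
    for unit, op in program:
        cycle = next_free.get(unit, 0)
        plan.append((cycle, unit, op))
        next_free[unit] = cycle + 1
    return plan


program = [("vector", "mul"), ("vector", "add"), ("memory", "load")]
plan = schedule(program)
# plan: [(0, 'vector', 'mul'), (1, 'vector', 'add'), (0, 'memory', 'load')]
```

Because the plan is fully determined before execution, the compiler can reason about correctness and timing offline, which is the property the architecture relies on.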

“We explicitly pass control to the software, specifically the compiler, so that it can reason about correctness and program instructions on hardware,” said Dennis Abts, chief architect at Groq.

Groq used AI techniques – in which decisions are made based on patterns, probabilities, and associations identified in data – to make decisions about hardware functionality. That differs from conventional computing, where decisions are made purely through logic, which can lead to waste.

“It wasn’t about abstracting the hardware details. It was about explicitly controlling the underlying hardware, and the compiler has a precise view of what the hardware is doing at any given cycle,” Abts said.

Systems are becoming increasingly complex, with tens of thousands of CPUs, GPUs, smartNICs, and FPGAs connected in heterogeneous computing environments. Each of these chips performs differently in response time, latency, and variance, which can slow down large-scale applications.

“Anything that requires a coordinated effort across the whole machine will ultimately be limited by the worst-case latency on the network. What we’ve done is try to avoid some of that waste, fraud, and abuse that happens at the system level,” Abts said.

Abts gave the example of a traditional RDMA request, where issuing a read to the destination triggers a memory transaction there, and the response then travels back over the network, where it can be consumed later.

“In a much simplified version, the compiler knows the address being read. And data is simply pushed across the network when it’s needed so it can be consumed at the source. This allows for a much more efficient network transaction with fewer messages on the network and less overhead,” Abts said.
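The traffic difference between the two approaches can be counted in a trivial model. This is a deliberately simplified sketch under stated assumptions (one request and one response message per traditional read, one pushed message when the compiler already knows the address and timing); it is not a real RDMA stack.

```python
# Simplified message-count model (assumption, not a real RDMA stack):
# a traditional RDMA read costs a request plus a response per value,
# while a compiler-scheduled push sends each value exactly once.

def rdma_read_messages(n_values: int) -> int:
    # Read request out to the destination + response back = 2 messages.
    return 2 * n_values


def compiler_push_messages(n_values: int) -> int:
    # The compiler knows the address and the cycle the data is needed,
    # so the owner pushes it unsolicited: 1 message, no request.
    return n_values
```

For 1,000 values, the push model halves the message count (1,000 vs. 2,000), which is the "fewer messages on the network and less overhead" claim in the quote above, in miniature.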

The concept of spatial awareness – reducing the distance data has to travel – came up in many presentations. The proximity of chips to memory or storage was a common thread in the design of AI chips.

Groq made fine-grained changes to its base chip design by decoupling the main computational units in the processor, such as integer and vector execution cores, and clustering them separately. That proximity speeds up the processing of integers and vectors, which underpin basic computing and AI tasks.

The reorganization has a lot to do with how data travels between AI processors, Abts said.

