[SPONSORED CONTENT] In the automotive industry, the first movers have a huge advantage. Success in the industry is built on vehicle R&D, at the heart of which is multi-physics design, performance and crash safety software. Generating remarkably realistic simulations, these apps speed vehicles to market, allowing engineers to sample more design options and perform more virtual tests in less time.
But these remarkable simulations are computationally and data intensive, relying on high-performance computing and advanced AI technology resources. In this industry that rewards first movers, a major challenge is avoiding being a slow mover when it comes to technology. On-premises hardware lifecycles are typically three to five years, and that’s a problem in automotive, where technology ages in a matter of years. According to Microsoft, one strategy for keeping the latest and greatest technology available to design engineers is to move workloads to Azure’s high-performance cloud, which is continually refreshed with new hardware and HPC-class software.
We recently caught up with AMD’s Rick Knoechel, Senior Solutions Architect, Public Cloud, and Sean Kerr, Senior Director, Cloud Solutions Marketing. In their roles, they work closely with officials at Azure, which in recent years has been the first public cloud to adopt AMD silicon, including the latest AMD EPYC processors and AMD Instinct GPUs.
According to Knoechel, automotive engineers work under a source of chronic pressure called the “constraint environment.” Too often, there are insufficient compute resources available to run engineering workloads. And it’s no wonder, given that automotive design software is evolving to encompass multiple disciplines in a single application package, such as multi-physics simulations combining mechanics, aerodynamics, and thermal cooling.
The constraint environment has a significant impact on the end-user community. “These are the men and women whose job it is to validate the ‘crashworthiness’ of a new vehicle,” Knoechel said. “Let’s say you have a new electric pickup truck loaded with 2,000 pounds of batteries. You’ve added 2,000 pounds of mass to the truck and now you want to make sure it meets FMVSS (Federal Motor Vehicle Safety Standards) regulations. This kind of thing means the demand for compute is at an all-time high in the industry.”
Demand for HPC isn’t just high, Knoechel said, it’s erratic.
“Historically, the demand has been very ‘spiky,’” Knoechel said. “It’s not predictable, and that’s where the challenge lies. How many resources are you making available? If you’re over capacity, you’re underutilized and you’ve wasted time, energy and money. And if you’re under capacity, you can’t meet end-user demand. Currently, virtually all on-premises HPC environments around the world are under capacity, given the increased demand for HPC.”
He shared a customer story involving an automotive component vendor with a 1,200-core on-premises infrastructure whose productivity gradually declined as test bottlenecks left design engineers waiting in queues for compute time.
“They had established a best practice that before they could do a final validation test… they would build a physical prototype… for FMVSS certification. But the rule was that you weren’t allowed to build the prototype until you had reliable results from the simulation. So in essence, their entire product design and development pipeline had to go through 200 simulation engineers globally, and that created a bottleneck on the order of $6-7 million per year in staff time spent waiting in the queue.”
Fast forward: the company’s engineers now do their work on Azure and, according to Knoechel, they’re running four times as many crash simulations on four times as many software licenses, with annual cloud spend of only around $1 million. And with Azure’s high availability and scalability (its “elasticity”), compute resources scale with daily swings in demand for HPC. The result, Knoechel said: “productivity was unleashed.”
Knoechel pointed to other issues for auto companies running on-premises resources. One is the need to keep data center managers on staff with expertise in operating HPC-grade infrastructure – people who are not easy to hire and retain. While this expertise is commonly found among the major automotive OEMs, its scarcity is a growing problem among automotive component manufacturers.
“More and more design work is being outsourced to vendors,” Knoechel said. “They need to be more innovative, they’re under pressure to deliver a better, lighter product, and time to market is a big challenge.” The result is that more automotive vendors – some with up to 5,000 cores – have migrated their infrastructures to the cloud.
Another problem for on-premises organizations: technology supply chain disruptions are delaying delivery of advanced servers with the latest components – chips, power supplies, memory and storage drives. But because Azure is one of the server OEMs’ largest customers, it’s at the front of the line for new server shipments.
“Azure is very fast in adopting our latest technologies,” Kerr said. “For example, they were the first cloud to adopt our 3rd Gen ‘Milan-X’ EPYC processors with AMD 3D V-Cache; they have picked up our products quickly on release. On-premises hardware procurement cycles can be long – a refresh every three to five years – requiring big bets by planning teams. But with Azure, the refresh is constant, and customers don’t have to make that big gamble.”
Additionally, Azure’s HPC team has built an offering that mirrors on-premises environments, reflecting the intricacies of users’ high-performance computing needs. Beyond incorporating advanced new technologies and the storage and deployment tools used in HPC, Azure also runs a lightweight hypervisor for near-bare-metal performance and was the first major cloud provider to offer the InfiniBand interconnect, all supported by Azure’s HPC technical staff.
Last March, Microsoft announced the upgrade of its Azure HBv3 virtual machines to 3rd Gen EPYC chips. Microsoft said its internal tests show the upgraded HBv3 series delivers up to 80% better performance for CFD, 60% better performance for EDA RTL (register-transfer level) simulation, 50% better performance for explicit finite element analysis and 19% better performance for weather simulation. Much of this improvement is attributed to the 3D V-Cache of the new EPYC processors. In practical terms, a nearly 80% performance improvement on a 140-million-cell Ansys Fluent CFD model means you can run almost two jobs in the time it used to take to run one.
Kerr and Knoechel added that an automotive OEM customer sees similar performance improvements running full-vehicle Ansys LS-DYNA crash simulations with 20-30 million elements. Those runs used to take 30 hours; now they take 10.
“If you’re in charge of safety for a major global automaker,” Knoechel said, “you’d better pay attention to this.”