Rich Washburn

The GPU Wars: Can Anyone Topple NVIDIA’s CUDA Empire?



If you've been paying attention to the AI world, you know it's practically running on NVIDIA’s GPUs, for training and inference alike. So much so that NVIDIA briefly became the world’s most valuable company. But behind NVIDIA’s flashy hardware is a secret weapon: CUDA, a proprietary software ecosystem that has built itself into a fortress. The thing is, fortresses can fall. As competition heats up, is it time for NVIDIA to start worrying about losing its crown? Let's dive in.


NVIDIA’s dominance isn't just because they make killer GPUs; it’s because they've built a nearly impenetrable software moat around CUDA (Compute Unified Device Architecture). To understand why CUDA is such a big deal, you need a bit of GPU history. Once upon a time, GPUs were solely about graphics. Shader programming made them a bit more flexible, but the true game changer came with General-Purpose GPU (GPGPU) programming.


NVIDIA was first out of the gate with CUDA—a proprietary system that gave developers deep access to the GPU’s memory and computational hierarchy, allowing for incredible optimizations. In contrast, OpenCL, an open standard supported by various manufacturers, tried to be a jack-of-all-trades, working on both CPUs and GPUs. Unfortunately, it never reached CUDA’s level of performance, mainly because it wasn’t as tightly integrated with GPU hardware.
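
To make that concrete, here’s a minimal sketch of what GPGPU-style programming looks like. It uses Numba’s Python bindings for CUDA rather than NVIDIA’s native C/C++ API, purely for brevity; the kernel-plus-launch structure is the same idea.

```python
# Minimal GPGPU sketch: a vector-add kernel written with Numba's CUDA bindings.
# (Illustrative only; the official CUDA API is C/C++, but the kernel-and-launch
# model shown here is the same.)
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # this thread's global index
    if i < out.size:          # guard threads past the end of the array
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # launch across the GPU
```

An OpenCL version of the same kernel would also need explicit platform, device, context, and queue setup, which is part of why it never felt as ergonomic as CUDA.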


CUDA’s success was cemented by PTX (Parallel Thread Execution), NVIDIA’s low-level instruction set that sits between compiled CUDA code and the GPU’s actual machine code. PTX keeps evolving with each new GPU generation, but since CUDA is so popular, everyone just rolls with it, recompiling their code for the latest version. NVIDIA has become the Apple of the GPU world, offering a slick, proprietary ecosystem that developers can’t resist. And when an entire community builds on your framework, it’s almost impossible for competitors to break in.
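
If you want to see PTX with your own eyes, Numba can compile a Python kernel straight to it. This is a small sketch assuming a reasonably recent Numba release (which provides cuda.compile_ptx) and an installed CUDA toolkit:

```python
# Compile a trivial kernel down to PTX text. The exact PTX you get depends on
# the installed CUDA toolkit and the target architecture, which is the moving
# target described above.
from numba import cuda, float32

def axpy(out, a, x, y):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a * x[i] + y[i]

ptx, _ = cuda.compile_ptx(axpy, (float32[:], float32, float32[:], float32[:]))
print(ptx[:400])   # prints the ".visible .entry ..." PTX assembly
```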


But cracks are starting to show. Competitors like AMD and Intel have been working tirelessly to breach CUDA’s fortress.


NVIDIA has enjoyed a near-monopoly on the deep learning GPU market, and not without some shady tactics. But competitors aren’t sitting idly by, twiddling their silicon thumbs. Companies like AMD, Intel, and even some startups are chipping away at CUDA’s dominance. So, how are they planning to bring the fight to NVIDIA? Well, there are four main strategies:


1. Hardware Compatibility: This is the easiest (and oldest) trick in the book. AMD has done it with Intel CPUs for decades, building chips that run the same x86 instruction set. But doing that with NVIDIA’s GPUs? Not so easy. PTX is constantly evolving, making it almost impossible to build stable hardware that runs CUDA code out of the box.

   

2. Library Compatibility: Here, competitors rewrite CUDA’s libraries and APIs so that existing code runs on their hardware (a toy sketch of this source-translation idea appears after this list). It sounds simple, but CUDA’s API surface is huge and constantly growing, making this an uphill battle.

   

3. Binary Translation: Translate CUDA code on the fly to run on another GPU architecture. It’s technically possible; Apple did it with its Rosetta 2 translation layer when switching from Intel to its own ARM-based chips. But it’s extremely difficult, especially given CUDA’s complexity.

   

4. Build a New Compiler: The hardest and riskiest option is building a brand-new compiler stack that mimics CUDA. This path is littered with legal landmines; just ask Google, which Oracle sued over Android’s reimplementation of the Java APIs, a fight that ran for a decade before the Supreme Court finally ruled in Google’s favor.
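
For a feel of strategy #2, here’s a toy sketch of the source-to-source approach that tools like AMD’s hipify take: mechanically swapping CUDA runtime calls for a competitor’s equivalents. The real tools are far more sophisticated, and this mapping covers only a sliver of CUDA’s enormous API surface, which is exactly the problem.

```python
# Toy illustration of "library compatibility": rewrite CUDA runtime calls into
# their HIP equivalents, roughly what AMD's hipify tools do at much larger scale.
# The mapping here is a tiny, illustrative subset.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Naively rewrite CUDA runtime calls into HIP calls."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify("cudaMalloc(&buf, n); cudaMemcpy(buf, host, n, cudaMemcpyHostToDevice);"))
# -> hipMalloc(&buf, n); hipMemcpy(buf, host, n, hipMemcpyHostToDevice);
```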


Even with these strategies, no one has successfully dethroned CUDA. But that’s not stopping some companies from trying.


So, should NVIDIA be sweating bullets? The short answer: not just yet. CUDA is still the king, and the momentum is heavily in NVIDIA’s favor. But the long-term game is another story.


Enter OpenAI’s Triton. This Python-based domain-specific language is designed to be a simpler, higher-level alternative to CUDA. What’s more, it’s built on the LLVM compiler framework, which lets Triton bypass NVIDIA’s CUDA toolchain entirely and compile straight to PTX. Translation: Triton could make it easier for developers to switch from CUDA to another platform.


Triton isn’t the only challenger. PyTorch, one of the most popular machine learning frameworks, used to be tightly bound to CUDA. But with PyTorch 2.0, that’s starting to change. The new version adds a compiler stack (torch.compile with TorchInductor) that generates GPU kernels through Triton instead of leaning on hand-written CUDA code, making it easier to support non-NVIDIA GPUs. This is huge because, historically, the machine learning community has been locked into NVIDIA hardware simply because PyTorch only played nicely with CUDA. But if PyTorch starts supporting AMD or Intel GPUs seamlessly, NVIDIA could lose its stranglehold on the market.
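
In PyTorch 2.0 terms, that shift shows up as torch.compile: one call that hands your model to a compiler stack which generates GPU code (via Triton on supported GPUs) rather than dispatching to hand-written CUDA kernels. A minimal sketch:

```python
# PyTorch 2.0's compile path: the user code stays the same, while backend-specific
# kernel generation is handled by torch.compile / TorchInductor under the hood.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

compiled_model = torch.compile(model)   # no CUDA-specific code in sight

x = torch.randn(32, 128)
out = compiled_model(x)                 # first call triggers compilation
print(out.shape)                        # torch.Size([32, 10])
```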


Another intriguing development comes from Spectral Compute, a small UK company that has built a clean-room reimplementation of a CUDA compiler. It’s taken them seven years (yeah, it’s that complicated), but their compiler, called SCALE, currently targets AMD GPUs, with more to follow. If this catches on, CUDA could find itself with real competition.


Even AMD is stepping up its game. Its ROCm platform is pitched as a CUDA alternative, though it’s had its share of issues. AMD recently acquired Nod.ai, a company with an optimizing compiler targeting AMD GPUs. With ROCm, AMD is positioning itself to take a more serious stab at CUDA’s dominance.
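
One concrete sign of how that plays out: PyTorch’s ROCm builds expose AMD GPUs through the same torch.cuda namespace, so existing “CUDA” code can often run unchanged. A quick check, assuming a ROCm build of PyTorch is installed:

```python
# On a ROCm build of PyTorch, "cuda" devices are actually AMD GPUs driven by HIP.
import torch

if torch.cuda.is_available():
    backend = "ROCm/HIP" if torch.version.hip is not None else "CUDA"
    print(f"Backend: {backend}, device: {torch.cuda.get_device_name(0)}")
    x = torch.randn(1024, 1024, device="cuda")   # lands on the AMD GPU under ROCm
    print((x @ x).sum().item())
```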


NVIDIA still has the best GPUs, and CUDA is still the go-to for AI and machine learning. But there’s a growing chorus of competitors looking to upset the status quo. Companies are working on alternative stacks, and new frameworks like Triton and PyTorch 2.0 are opening doors to other hardware manufacturers. If NVIDIA doesn’t keep innovating (and it will), it could find itself undercut by cheaper and more flexible alternatives.



