No time to read the update? Watch/listen for free on Spotify and YouTube:
Edited by Brian Birnbaum.
1.0 A Step Forward in the Nvidia-AMD AI Race
The architecture of Nvidia’s Blackwell further evidences AMD’s structural advantage on the hardware side.
Having reviewed Nvidia’s and AMD’s respective new product lineups in depth, I still believe AMD is in the process of disrupting Nvidia’s dominance in the AI space. Further, my long-term AMD thesis does not depend on successfully disrupting Nvidia, since AMD has drawn a highly differentiated roadmap.
But the asymmetry in case of success is considerable, and that’s why it’s worth exploring.
I’ve explained in the past that, as we move towards smaller process nodes, the complexity of producing monolithic chips (Nvidia’s focus) is increasing exponentially. On the other hand, chiplet architecture (AMD’s focus) is considerably less complex, assuming one has the necessary expertise.
Less complexity means higher yields and therefore lower costs, which makes it likely that over a number of iterations, AMD’s GPUs will eventually achieve a competitive price to performance ratio. So long as Nvidia remains set on the monolithic path, AMD is bound to catch up on the hardware side.
It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected. The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically.
– Gordon E. Moore in his “The Future of Integrated Electronics” paper.
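To make the yield argument concrete, here is a minimal sketch using the textbook Poisson yield model. The defect density is an assumption I’ve picked purely for illustration; real foundry figures are confidential:

```python
import math

def die_yield(defect_density: float, area_cm2: float) -> float:
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-defect_density * area_cm2)

D = 0.1  # assumed defects per cm^2 (illustrative, not a foundry figure)

# One ~8 cm^2 reticle-sized monolithic die vs. one ~1 cm^2 chiplet
print(f"Monolithic die yield: {die_yield(D, 8.0):.1%}")  # ~44.9%
print(f"Single chiplet yield: {die_yield(D, 1.0):.1%}")  # ~90.5%
```

At the same defect density, the small die comes out defect-free roughly twice as often, and the gap widens as defect density rises, which is what tends to happen early in a new process node.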
Following the Nvidia GTC conference in March, the word on the street was that Nvidia had finally pivoted to chiplets, thus mitigating the risk of disruption. However, an in-depth review of Blackwell’s architecture reveals that the chip essentially consists of two large chips connected to each other.
Previous Nvidia architectures were fundamentally similar, but for the first time Blackwell’s two chips act as one at the software and network level. As such, Blackwell is technically made of two chiplets and thus represents a tentative first step towards chiplet architecture.
In this sense, Blackwell provisionally confirms my original observation: that Nvidia would have to pivot towards chiplets at some point.
However, each of these two chiplets is as large as it can be, right at the reticle limit. Nvidia is therefore already brushing against the physical constraint that makes producing monolithic chips exponentially harder as we move towards smaller process nodes.
To add more computational power from here, Nvidia faces either skyrocketing complexity within each of the two existing dies or the prospect of adding subsequent monolithic chips to its Blackwell architecture.
Therefore, unless Nvidia fully pivots to AMD’s turf, one of two things can happen over the long term:
Nvidia stays ahead simply by connecting more monolithic dies (each at the reticle limit) as if they were chiplets, with AMD remaining a marginal player.
Or that approach scales poorly relative to AMD’s ‘pure’ chiplet architecture, and AMD gains considerable market share, even at the highest end.
A note on the reticle limit: lithography tools can only expose a die up to a fixed maximum size (roughly 26mm x 33mm, about 858mm²), so all of a chip’s functionality must fit within that area. To cram more compute in, you have to build smaller circuits, and the smaller circuits get, the more complex the design and manufacturing process becomes.
A review of AMD’s CDNA3 architecture (which powers the MI300 family) reveals that, as anticipated, it is highly scalable, and each component (chiplet) sits far from the reticle limit. AMD therefore does not have to push the limits of physics to keep making higher performing GPUs from here. Rather, AMD just needs to carry on adding more chiplets, a skill it has been honing for a decade now.
At a lower level, connecting monolithic chips as if they were chiplets is bound to deliver much lower yields than AMD’s approach. In the former case, a single defect forces you to throw away a hugely costly marvel of modern engineering; in the latter, throwing away a tiny chiplet costs comparatively little.
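A back-of-the-envelope sketch of what this means for cost per good die, again with purely assumed numbers (wafer price, defect density, and die sizes are all hypothetical):

```python
import math

def die_yield(defect_density: float, area_cm2: float) -> float:
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-defect_density * area_cm2)

D = 0.1              # assumed defects per cm^2
WAFER_COST = 17_000  # assumed price of one leading-edge wafer, USD
WAFER_AREA = 706.9   # area of a 300 mm wafer in cm^2 (pi * 15^2)

for name, area in [("reticle-sized die", 8.0), ("chiplet", 1.0)]:
    dies_per_wafer = WAFER_AREA // area        # ignoring edge losses
    good_dies = dies_per_wafer * die_yield(D, area)
    print(f"{name}: ~${WAFER_COST / good_dies:,.0f} per good die")
```

Under these assumptions, a good reticle-sized die costs roughly $430 of silicon, while the eight good chiplets needed to match its area cost roughly $210 combined, before packaging and interconnect overhead (the price chiplets pay for that yield advantage).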
If Nvidia does end up with meaningfully lower yields than AMD, this does not automatically mean that it won’t produce the highest performing products. Yield is a metric that concerns the manufacturing process and says nothing about the performance of the chip in question.
What is clear to me, however, is that AMD has a structural advantage in bringing AI compute engines to market with a differentiated price-to-performance ratio.
Personalization, one of the core tailwinds behind AMD’s aforementioned roadmap, adds a component to the hardware side of the disruption thesis. By pursuing a ‘pure’ chiplet architecture, AMD can combine compute engines at will.
The MI300A is an example of this: it’s simply the MI300X minus a few GPU tiles, with some additional CPU tiles in their place. This allows AMD to unlock new applications for the MI300 platform at marginal cost, opening up less contested distribution channels for its core AI technology.
In this particular case, the MI300A is an APU (Accelerated Processing Unit): a chip designed to handle both general-purpose and graphics processing on a single package. The same trait makes APUs well suited to systems in which space and power consumption are a concern, including laptops, entry-level desktops, and small form factor devices.
In the Q4 2023 earnings call, AMD CEO Lisa Su placed particular emphasis on AI PCs. The marginal difference between the MI300A and the MI300X sheds some light on how AMD plans to go about these novel PCs.
Nvidia management also emphasized AI PCs during their Q4 2023 call. But the same logic regarding yields and personalization applies to this domain. AMD has the structural advantage to repurpose its core AI technology across the spectrum of computing devices.
2.0 Nvidia’s Moat Isn’t Getting Weaker
Even though AMD has a relatively clear path ahead on the hardware side, Nvidia’s software and networking operations continue to get stronger. However, the shift towards open-sourcing ROCm bodes well for AMD.
The compute engines need a specific set of software and networking components to be useful in practice. Even if AMD delivers compute engines with a better price-to-performance ratio at some point, without a vertically integrated infrastructure like Nvidia’s, the total cost of ownership may remain much higher.
Nvidia has a remarkable software advantage with CUDA, the framework that allows developers to interact seamlessly with Nvidia GPUs. However, AMD has been quietly funding open source operations to get its software up to scratch, and it is now officially open sourcing parts of its ROCm software.
AMD’s decision to open source parts of ROCm followed the controversy generated on X after tinygrad’s George Hotz complained about unstable driver (software) support. The release includes a feature called “fuzzyHSA” that allows AMD developers to get real-time feedback from users and open source developers.
Although most of this activity is scarcely visible to outside observers, over the long term the open source approach, combined with AMD’s budding software capabilities (via the Pensando acquisition, primarily), has a real chance of competing with CUDA, provided the hardware progresses as I expect.
Fundamentally speaking, the software dilemma is now AMD and the open source community versus Nvidia. This considerably increases the odds of AMD’s structural differentiation on the hardware side bearing fruit down the line, since the world naturally wants to avoid being locked into a single vendor.
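One reason the open source route is plausible: modern frameworks increasingly abstract the vendor away. As a minimal sketch of hardware-agnostic code, the PyTorch snippet below runs unmodified on either vendor’s GPUs, because ROCm builds of PyTorch expose the same torch.cuda namespace that CUDA builds do:

```python
import torch

# On CUDA builds of PyTorch this targets Nvidia GPUs; on ROCm builds,
# the torch.cuda namespace transparently maps to AMD GPUs instead.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    print(f"Running on: {torch.cuda.get_device_name(0)}")

x = torch.randn(4096, 4096, device=device)
y = x @ x  # dispatched to cuBLAS on Nvidia or rocBLAS on AMD
print(y.shape)  # torch.Size([4096, 4096])
```

The more of the stack that lives at this level, the less CUDA-specific code the average developer ever writes, which is exactly the dynamic AMD and the open source community are betting on.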
In the Q4 2023 earnings call, Nvidia CEO Jensen Huang mentioned how Nvidia has been developing software for specific verticals for over a decade now: finance, healthcare, biology and more. This complicates things further for AMD. It takes a long time to get all the little details right to serve customers in such critical verticals.
In turn, Nvidia’s networking business remains head and shoulders above AMD’s, with a networking revenue run rate exceeding $13B in Q4 2023, up from $10B the previous quarter: spectacular quarter-over-quarter growth.
As I’ve explained in the past, when it comes to AI, moving data around is as important as the actual compute. A compute engine without the adequate infrastructure to move data around is relatively useless in practice. A booming networking business therefore gives Nvidia a considerable advantage, much like its software endeavors.
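To put a rough number on it, the sketch below estimates how long a multi-GPU training step spends just synchronizing gradients over the network; every figure is an assumption of mine rather than a measured benchmark:

```python
PARAMS = 70e9            # assumed model size: 70B parameters
GRAD_BYTES = PARAMS * 2  # fp16 gradients, 2 bytes per parameter

LINK_GBPS = 400          # assumed per-GPU network link, Gb/s
link_bytes_per_s = LINK_GBPS / 8 * 1e9

# A ring all-reduce moves roughly 2x the gradient volume per step
sync_seconds = 2 * GRAD_BYTES / link_bytes_per_s
print(f"Gradient sync per training step: ~{sync_seconds:.1f} s")  # ~5.6 s
```

If the math for a step finishes in about a second, the GPUs then idle for several more waiting on the network under these assumptions. Faster interconnects convert directly into higher utilization, which is why Nvidia’s networking business compounds the value of its GPUs.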
3.0 Financials
The balance sheet is very strong, with the auto and pro visualization segments set to meaningfully contribute to Nvidia’s top and bottom line over the coming decade.
Income Statement
Meanwhile, Nvidia’s revenue and operating margins continue pushing through all-time highs. OpEx as a percentage of revenue has declined meaningfully since late 2021, with the company getting much leaner.
Gross margins also continue pushing through all-time highs, albeit driven this quarter by “favorable component costs.” Management expects this tailwind to continue in Q1 2024 before subsiding.
In turn, the pro visualization and auto segments are highly strategic. Although data center is driving much of the stellar financial performance at present, both are positioned to grow meaningfully as digital twins become the norm and the software-defined car comes into play.
Incidentally, Nvidia's Pro Visualization segment focuses on providing high-performance graphics hardware and software used primarily for professional design, visual effects, and real-time simulation tasks in industries like architecture, engineering, and entertainment.
The Auto segment of Nvidia targets the automotive industry, offering solutions that power autonomous driving systems, in-car infotainment, and AI-based functionalities to enhance vehicle capabilities and user experience.
The visualization of digital twins makes them more accessible and thus effective across an organization. In turn, while the world quarrels over whether electric vehicles are the future or not, what is relatively certain is that they will be smartphones on wheels.
Nvidia is well positioned to capitalize on these two trends.
Further, in the Q4 2023 earnings call Jensen talked about Nvidia AI Enterprise, which will allegedly manage clients’ entire computing stack for $4,500 per GPU. This strengthens the moat by fully abstracting away complexity for clients, thus having them lean more on Nvidia’s vertically integrated infrastructure.
It will only get harder to lure customers away from Nvidia. As volume increases, Nvidia should be able to lower its cost to serve, making its vertical integration ever harder to emulate: straight from the playbooks of Amazon and Tesla.
Cash Flow Statement
A picture is worth a thousand words: so long as Nvidia continues to produce the world’s leading GPUs, backed by its undisputed software and networking ecosystems, cash flow will likely continue increasing.
Balance Sheet
At the end of Q4 2023, Nvidia had $25.9B in cash and equivalents, against just $8.45B in long-term debt and $1.25B in short-term debt.
4.0 Conclusion
Analyzing the potential disruption of Nvidia’s dominance yields insights that would otherwise be harder to come by. It sharpens my understanding of both Nvidia’s and AMD’s product roadmaps, and as such I consider it a productive exercise in its own right.
Having said that, if my understanding of the monolithic versus chiplet dilemma is correct, I believe that AMD is positioned to cause much more trouble than it’s given credit for. However, this very much depends on whether AMD’s open source software bears fruit, which I will be tracking closely.
Meanwhile, as the world’s demand for computing power continues to grow exponentially, I believe that both companies are set to do well over the long term. It is very likely that Nvidia finds a way to continue scaling its monolithic efforts from here and that AMD finds a recipe that works too.
I believe this space won’t evolve into a winner-takes-all scenario, because the market wants redundancy. Relying on a sole provider for compute engines is a severe existential risk.
Until next time!
⚡ If you enjoyed the post, please feel free to share with friends, drop a like and leave me a comment.
You can also reach me at:
Twitter: @alc2022
LinkedIn: antoniolinaresc