The goal of Nvidia, as stated, is to produce graphics cards that provide the best performance, such that it is not even economical to run a competitor's cards. Let's delve into this. First we need to understand marginal cost. If you build a machine to make widgets, and let's say you buy the machine up front, the cost of the machine is a sunk cost. Whether you make one widget or a million, it doesn't matter. Your sunk costs are your sunk costs.
If it takes $10 worth of labor and inputs to make 1 widget, my marginal cost per widget is $10. Anything I earn over $10 per widget covers at least my marginal cost. I may not be profitable, but I'm making money on every widget and cash is coming in. If I make less than $10 on a widget, I'm losing money on every widget and bleeding cash. Based on the market, I expect to sell 100,000 widgets in a typical year. A price of $15 would cover both my marginal costs and the amortization of my sunk costs. That's the best of all possible worlds.
What did I mean by amortization? I expect my widget machine to last, say, 10 years. I don't assign the whole cost of the machine to the first year, because it will produce revenue for 10 years. Doing so would distort my performance, making my first year look like a legendary disaster and the next 9 years look like I'm a genius. Instead, I take 1/10 of the machine's cost every year as amortization. If the machine was $5,000,000, that's $500,000 a year. If I sell 100,000 widgets, I can assign $5 to each widget as its portion of the sunk cost. If I sell 100,000 widgets at $16, I cover both my sunk and marginal costs and make $1 of profit per widget. If I sell at $14, I need to sell more than 100,000 widgets to cover the amortization.
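To make the arithmetic concrete, here is a minimal sketch in Python using the same numbers: a $5,000,000 machine amortized over 10 years, a $10 marginal cost, and 100,000 widgets a year. Nothing here is new information; it just runs the per-widget math at the three prices discussed.

```python
MACHINE_COST = 5_000_000   # sunk cost of the widget machine ($)
YEARS = 10                 # amortization period
MARGINAL_COST = 10         # labor and inputs per widget ($)
ANNUAL_VOLUME = 100_000    # expected widgets sold per year

amortization_per_year = MACHINE_COST / YEARS                      # $500,000
amortization_per_widget = amortization_per_year / ANNUAL_VOLUME   # $5

for price in (14, 15, 16):
    profit_per_widget = price - MARGINAL_COST - amortization_per_widget
    # Volume at which contribution (price minus marginal cost) covers the year's amortization
    breakeven_volume = amortization_per_year / (price - MARGINAL_COST)
    print(f"price ${price}: profit/widget ${profit_per_widget:+.2f}, "
          f"break-even volume {breakeven_volume:,.0f} widgets")
```

At $16 this prints $1 of profit per widget; at $15 it breaks even at exactly 100,000 widgets; at $14 the break-even volume rises to 125,000 widgets.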
But let's say the widget business is not so good. Someone offers me $12 for 50,000 widgets. If I have no other orders, I'm going to lose money, given the amortization of my machine. Specifically, I won't be able to cover $400,000 of the amortized cost. However, because my marginal cost is only $10, I get $100,000 of contribution to my sunk costs that I would not have received had I not taken the deal. I could even make money if I got enough contracts at $12. If the offer is for 200,000 widgets at $9, I should not take the deal. Even though it's more revenue, I'm losing $1 per widget against my marginal cost. I am losing more money on that deal than if I just shut down the machine. At that price, there is no sales volume that results in anything but losing more money.
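The accept/reject logic can be written as a tiny rule: take the deal only if the price covers marginal cost, and count everything above marginal cost as contribution toward the sunk cost. A sketch, again with the two offers from the example:

```python
MARGINAL_COST = 10             # $ per widget
ANNUAL_AMORTIZATION = 500_000  # $ per year of sunk cost to recover

def evaluate_deal(price, volume):
    contribution = (price - MARGINAL_COST) * volume  # cash toward sunk costs
    uncovered = ANNUAL_AMORTIZATION - contribution   # amortization still unpaid
    take_it = price > MARGINAL_COST                  # never sell below marginal cost
    return contribution, uncovered, take_it

for price, volume in [(12, 50_000), (9, 200_000)]:
    contribution, uncovered, take_it = evaluate_deal(price, volume)
    print(f"{volume:,} widgets at ${price}: contribution ${contribution:+,}, "
          f"uncovered amortization ${uncovered:,}, take the deal: {take_it}")
```

The $12 deal still leaves $400,000 of amortization uncovered, but it contributes $100,000 that would otherwise not exist; the $9 deal is rejected because it bleeds cash on every widget.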
If my competitor buys a newer, more efficient widget machine that can make widgets at $8 a widget, they can undercut me at a $9 price point. They're still making a contribution to their fixed costs, since their marginal cost is $1 below the selling price. But I can't match their price without bleeding money. At that point I have no choice but to re-tool and try to get my cost down to $8 or below to be competitive. Otherwise, they take market share and drive me out of business.
Now map this onto AI. The cost of building the data center and filling it with GPUs is the sunk cost. Next you have marginal cost, which is largely electricity, used both to power the devices and to cool them. Then you have people to manage the data center, and software costs related to tuning and managing the models. The goal of Nvidia is to make the marginal cost of operating a competitor's chips higher than that of its own chips. Nvidia wants to produce new chips at a 1-year cadence, and within 2 years make the marginal cost of operating the newer chips decisively more attractive than that of the old chips. They want the economics to drive customers to replace on a 2-3 year cycle, to minimize their marginal costs.
One myth is that you only need the latest and greatest chips for training, and that once they're two years old they can move to inference. Training is when the model is fed data to teach it to produce good responses. Inference is when you and I ask the model for stuff. It's estimated that 90% of the costs are for inference. That means the least efficient chips would be carrying 90% of the cost. Moreover, as new models come online, they are larger and push the limits of the older chips, meaning those chips have to work even harder for one unit of output. Some models may require more memory or more bandwidth than the older chips provide. Using a 5-year-old chip that's struggling with a newer model is about the highest marginal cost you can have. Granted, no one can have all new chips all the time, so you have a blend of generations, some chips new and some older. You can track a blended marginal cost; let's say that's $10 per 1,000 queries. [Don't worry about the exact numbers – I'm making the math easy.]
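A blended marginal cost is just a weighted average of the per-generation costs, weighted by the share of queries each generation serves. Here is a sketch; the fleet mix and the per-generation figures are invented for illustration and simply tuned to land on the $10 per 1,000 queries used above.

```python
fleet = [
    # (chip generation, $ per 1,000 queries, share of query volume) -- invented numbers
    ("new",          6.0, 0.40),
    ("2 years old", 10.0, 0.40),
    ("5 years old", 18.0, 0.20),
]

blended = sum(cost * share for _name, cost, share in fleet)
print(f"blended marginal cost: ${blended:.2f} per 1,000 queries")  # -> $10.00 with this mix
```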
What Nvidia is trying to do is make chips such that the AMD customer might buy a cheaper chip, but their marginal costs end up higher than with Nvidia's offering. For example, the AMD chip comes in at $12 per 1,000 queries, while Nvidia wants to be at $10. But next year AMD will come out with a better chip, maybe matching Nvidia's $10 per 1,000 queries. So Nvidia has to come out with a chip that produces 1,000 queries for $8. And the same again the year after. The two-year-old Nvidia chip still costs you $10 per 1,000 queries, but the new one costs you $6. To stay ahead of your competition, you need to retire the two-year-old chips as fast as possible and buy as many of the new chips as you can, to keep your blended cost lower than your competitor's blended cost.
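The replacement pressure follows directly from that weighted average. A sketch using the $10 and $6 figures from the example (the fleet shares and the competitor's cost are assumptions): the more query volume you shift onto the new generation, the further your blended cost drops below a competitor stuck at $10 per 1,000 queries.

```python
OLD_COST = 10.0    # $ per 1,000 queries on the 2-year-old generation (from the example)
NEW_COST = 6.0     # $ per 1,000 queries on the newest generation (from the example)
COMPETITOR = 10.0  # competitor's blended cost, assumed stuck at $10 per 1,000 queries

for new_share in (0.0, 0.25, 0.50, 0.75, 1.0):
    blended = NEW_COST * new_share + OLD_COST * (1 - new_share)
    verdict = "ahead of" if blended < COMPETITOR else "level with"
    print(f"{new_share:.0%} of queries on new chips -> "
          f"blended ${blended:.2f} per 1,000 queries ({verdict} the competitor)")
```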
But even the models themselves have a short life. Every few months a newer or improved model comes out. Most users want to use the latest model, even if it's larger. The old model, which might run well on the old hardware, begins to see its market shrink. It may still have applications tied to it, but not new applications and not new users. The realistic lifecycle of a model might only be a couple of years. So the chips are constantly being pushed toward larger and larger models. And to get better performance, more and more "think time" is built into how the model is executed, which makes new models on old chips not just a little worse but much worse, because the model has to be run for more iterations to provide better output. It's like having a truck that burns more gas than the new model, except you also have to make more trips with it.
Right now we don't have a price war. We have a war of financing. Investors want to see growth. More than undercutting each other on price, the goal is to grow your user base with free services. If that's the case, using your highest-marginal-cost chips to offer free services seems like putting a noose around your corporate neck. All your competitor has to do is pay more in sunk costs that have to be figured out at some point in the future (buy more new GPUs) and kill you on marginal costs today. This won't work forever, but it works in the short run.
The goal is to be the last one standing in a winner-take-all market. If your competitor is only bleeding $8 per 1,000 queries on free customers to your $10 per 1,000 queries, they can offer better models or higher limits. They don't have to be profitable. No one is profitable right now. No one is even covering their marginal costs. But if you can operate more efficiently, you have more time until the corporate grim reaper comes for you. Your runway lasts longer. What about the GPUs your investors financed? Don't they want a return on their investment? Yes, but that return comes from you making it to the finish line while your competitors do not, not from getting to positive cash flow today.
So let's think about this again for a 4-6 year lifetime for a GPU. I would buy 4 years, as that's probably where the chip is clearly on its last legs; at 3 years it may still be effective to run in a blended-cost environment, and the 4th year is likely when it really starts becoming an economic millstone around your neck. I think 6 years is absolute fiction, unless you have government or large corporate customers with a fixed (and profitable) contract who don't want you to change the model their applications use. For the parts of the market that live on the latest models, like developer subscriptions, chat agents, or new development, a 6-year-old chip is likely killing you. It raises your blended costs and you burn through your investment even faster. So no, I don't think a 6-year lifespan makes any sense whatsoever, except for a narrow type of government or large corporate customer. I think a 2-year lifespan makes a lot of sense for a very aggressive AI provider that wants to keep its marginal costs as low as possible.
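To see why the replacement cycle matters so much, here is a back-of-the-envelope sketch. The cost-by-age schedule is invented (only the $6 and $10 points echo the earlier example), and it assumes equal-sized cohorts of each age in the fleet, so treat it as a shape, not a forecast.

```python
# $ per 1,000 queries by chip age -- an invented schedule; only the $6 and $10
# points echo the example above.
COST_BY_AGE = {0: 6, 1: 8, 2: 10, 3: 13, 4: 16, 5: 20}

def blended_cost(cycle_years):
    # Assume equal-sized cohorts: 1/cycle_years of the query volume at each age.
    return sum(COST_BY_AGE[age] for age in range(cycle_years)) / cycle_years

for cycle in (2, 4, 6):
    print(f"{cycle}-year replacement cycle -> blended cost "
          f"${blended_cost(cycle):.2f} per 1,000 queries")
```

Under these assumptions a 2-year cycle lands around $7 per 1,000 queries, a 4-year cycle around $9.25, and a 6-year cycle over $12, which is the sense in which an aging fleet quietly raises your blended cost.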









