Earlier this week, The Information reported that Replit, an AI coding assistant that lets you build apps with natural language, has been running at gross margins between –14% and +36% in recent months. In the traditional SaaS era, anything under 60% would be considered weak, and the best companies sat comfortably in the 80% range. For AI application-layer companies, the high end of gross margins is in the 60s and the low end is negative.
Today, as you move up the stack from the infrastructure layer, gross margins become thinner.
Infrastructure: chips / infrastructure providers (e.g., NVIDIA) // today operate at high gross margins, around 75%
Model: foundation model providers // today operate at ca. 50% gross margins
Application: AI "wrapper" companies of varying depth // today mostly operate at low or negative gross margins
Yet, today’s valuations suggest that markets are pricing in eventual profits at all three layers of the AI stack.
The bull case for AI margins
So are the founders and investors structuring rounds at 50x+ revenue multiples irrational? Not necessarily. Instead, they are confident that margins will expand over time. The reason current margins are thin is straightforward: application-layer AI companies are paying for both cloud infrastructure and per-token inference, and many are also pricing below sustainable levels to land-grab customers. The long-term bet is that margins will improve through three levers (a back-of-the-envelope sketch follows the list):
A) Inference costs falling: token prices for state-of-the-art (SOTA) models have been dropping faster than the price of compute or memory ever did. Companies are betting that costs will continue to come down.
B) Token-free features: Companies are working to add value without burning tokens, e.g. by building non-AI SaaS features like security, pushing margins closer to SaaS norms.
C) Shifting to cheaper models: especially for certain tasks where SOTA is unnecessary.
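To make lever A) concrete, here is a minimal sketch of how gross margin responds as per-token inference costs fall, holding price and usage constant. Every figure is an illustrative assumption, not a reported number from Replit or anyone else:

```python
# Back-of-the-envelope sketch of lever A): gross margin as per-token
# inference costs fall. All figures are illustrative assumptions.

def gross_margin(price: float, tokens: float, cost_per_mtok: float,
                 other_cogs: float) -> float:
    """Gross margin as a fraction of revenue, per user per month."""
    inference_cost = tokens / 1_000_000 * cost_per_mtok
    return (price - inference_cost - other_cogs) / price

PRICE = 20.0         # $/user/month subscription (assumed)
TOKENS = 1_000_000   # tokens consumed per user per month (assumed)
OTHER_COGS = 4.0     # hosting, support, etc. per user (assumed)

for cost in [20.0, 15.0, 10.0, 5.0, 1.0]:  # $ per million tokens
    m = gross_margin(PRICE, TOKENS, cost, OTHER_COGS)
    print(f"${cost:>5.2f}/Mtok -> gross margin {m:+.0%}")
```

At a flat subscription price, the same product swings from negative margins to SaaS-like territory purely on the inference cost curve - which is exactly the bet.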
The devil is in the detail
On paper, it's a reasonable thesis for how costs come down over time. But we're in the business of going beyond the surface, so let's examine each lever a little more deeply.
The first thesis, A) inference costs falling, is strongly supported by current trend lines. Token prices have dropped faster than other technology cost curves, and the bull case assumes this will continue. In that scenario, foundation model providers become increasingly interchangeable, much like cloud infrastructure today. SOTA models would be a commodity, with providers such as OpenAI, Anthropic, and Google competing head-to-head on price and performance.
The bear case asks more questions. Will model providers really remain interchangeable, or will they diverge into overlapping but distinct market segments? Could breakthroughs enable one player to pull ahead and compound its lead - especially if recursive improvement loops in AI research accelerate progress for the frontrunner? And if that happens, would SOTA access remain a commodity, or could the leader capture outsized returns by raising prices once competitors fall behind?
Regarding B) token-free features, the move to provide intelligence-free SaaS features is an interesting one: companies are trying to reduce their reliance on the very product that brings them users in the first place. On the surface it makes sense: if not every user action requires a call to an LLM, you preserve gross margin. Many companies are adding adjacent modules like collaboration dashboards, workflow orchestration, or security features that live outside the per-token billing trap. For example, v0, the AI UI builder by Vercel, recently launched a visual builder that lets users iterate on and polish their products without burning a single token.
But there’s a paradox here. These token-free features are often not the reason customers adopt the product. The magnet for users is the AI capability. By shifting the value proposition toward non-AI functionality, companies are, in a sense, trying to monetise the least differentiated part of their offering. These features could arguably be built by any competent SaaS team and, crucially, by the foundation model companies themselves. In the mid-term, this can buy breathing room. It lets application-layer players capture revenue that is not hostage to inference costs, while keeping users inside their environment. But if you believe that AI will continue to advance and permeate every workload, this moat may be short-lived. The boundaries between “AI workloads” and “non-AI workloads” will erode. Even features that feel inherently static, like compliance checks, role-based access control, or security audits, can be automated by AI agents once they are capable enough. You are then back to either A) or C).
Regarding C) shifting to cheaper models, the nuance lies in which tasks a product serves, because not all AI-powered work is created equal.
A useful distinction is between intelligence-hungry and intelligence-full workloads.
Intelligence-hungry: These are tasks where performance can always improve with more intelligence, and where users will pay for that improvement. Think creative domains: marketing copy, design, or video production. You can always write a sharper headline or produce a more engaging video, and that improvement translates directly into market performance. As SOTA models improve, the market's expectations rise with them. If you're selling into an intelligence-hungry category, customers will keep wanting access to the best model, which typically commands a higher price point. We are then back to A).
Intelligence-full: These are tasks with a fixed complexity threshold. Once the AI meets that threshold, there is little to gain from further improvement. Payroll automation is a good example. Once the system pays employees accurately and on time, there is no competitive advantage in making it 10% “smarter”. In these cases, you can switch to a cheaper, non-SOTA model without compromising results. Over time, this caps willingness to pay and drives pressure to commoditise.
Many AI founders are betting their products skew intelligence-full. The logic is straightforward: once you’ve crossed the capability threshold for a fixed-complexity task, you can swap to cheaper models and improve gross margins.
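A minimal sketch of what that bet looks like in practice - a router that sends fixed-threshold tasks to a cheap model and reserves the frontier model for open-ended work. Model names, prices, and task labels here are hypothetical placeholders, not real vendor pricing:

```python
# Sketch of lever C): route intelligence-full tasks to a cheap model and
# reserve the expensive SOTA model for intelligence-hungry ones.
# Model names and prices are placeholders, not real vendor pricing.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_mtok: float  # $ per million tokens

SOTA = Model("frontier-large", 15.00)    # hypothetical frontier model
CHEAP = Model("distilled-small", 0.50)   # hypothetical distilled model

# Tasks whose quality ceiling is fixed once a capability threshold is met.
INTELLIGENCE_FULL = {"payroll_extraction", "invoice_matching", "form_filling"}

def pick_model(task_type: str) -> Model:
    """Send fixed-threshold tasks to the cheap model; everything else to SOTA."""
    return CHEAP if task_type in INTELLIGENCE_FULL else SOTA

for task in ["payroll_extraction", "marketing_copy"]:
    model = pick_model(task)
    print(f"{task}: {model.name} at ${model.cost_per_mtok}/Mtok")
```

The design choice that matters is where the threshold sits: misclassify an intelligence-hungry task as intelligence-full and quality, not just cost, takes the hit.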
The largest application players have an additional edge: the training data generated from their user base. This data can be used to fine-tune smaller, faster, cheaper models optimised for their domain. That dynamic makes the fight for market share at the application layer so intense: the prize is not just today’s revenue, but the ability to build a proprietary model that is both cheaper to run and harder to replicate. If you can lock in customers and capture their data exhaust now, you are better positioned than the rival who builds a similar product two years from now, even if they start with better off-the-shelf models. The moat is in the data.
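As a sketch of how that data exhaust becomes a training asset: log user-accepted outputs as (prompt, completion) pairs in JSONL, a common chat fine-tuning layout. The event schema below is an assumption for illustration, not any particular vendor's format:

```python
# Sketch of the data-exhaust flywheel: user-accepted outputs become
# fine-tuning examples for a smaller domain model. Schema is hypothetical.

import json

def log_training_example(prompt: str, completion: str, accepted: bool,
                         path: str = "finetune_data.jsonl") -> None:
    """Append an accepted interaction to a fine-tuning dataset."""
    if not accepted:
        return  # only user-approved outputs become training signal
    record = {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ]}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: a user accepts a generated payroll summary, so it is logged.
log_training_example("Summarise this payroll run...",
                     "All 42 employees paid on schedule...", accepted=True)
```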
However, this only works under two conditions: customers face high switching costs that lock them into your product, and competitive intensity remains somewhat stable over time. With weak lock-in effects and increasing competitive pressure, the intelligence-full paradigm of "I can build or choose cheaper models over time" no longer works. You might now ask: but why would competitors offer cheaper alternatives in the first place?
Neijuan (involution): racing to the bottom
This is where the Chinese concept of neijuan - involution - becomes useful. It describes a dynamic in which companies don’t price to maximise margin, but to crush competitors by offering the minimum viable margin needed to survive.
In the West, if two competitors each sell a product for $10 that costs $5 to produce, both make a $5 margin and take 50% market share each. If one company innovates and drops its production cost from $5 to $1, the typical move is to undercut the competitor by, say, 20% - lowering the price from $10 to $8. This yields a $7 margin, which is both a substantial improvement over the old $5 margin and a decisive price advantage, allowing the innovator to grow market share. In China, the same cost breakthrough might instead lead to dropping the price all the way to $2. That captures far more market share but leaves just a $1 margin, prioritising competitive dominance over profit maximisation. "The harder you work, the less you gain" is how the South China Morning Post put it recently.
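The arithmetic of the two responses, as a toy calculation using the numbers from the example above (market-share effects left out):

```python
# Toy calculation of the two pricing responses to a cost breakthrough.
# Numbers come from the example in the text; nothing here is market data.

OLD_COST, NEW_COST, OLD_PRICE = 5.0, 1.0, 10.0

western_price = OLD_PRICE * 0.8   # undercut the rival by 20% -> $8
neijuan_price = NEW_COST + 1.0    # minimum viable margin -> $2

for label, price in [("Western", western_price), ("Neijuan", neijuan_price)]:
    margin = price - NEW_COST
    print(f"{label}: price ${price:.0f}, unit margin ${margin:.0f} "
          f"({margin / price:.0%} of price)")
```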
If AI software pricing follows a neijuan path, the industry could end up like the Chinese stock market, which has moved sideways for two decades despite GDP growth far outpacing the West.
Why AI may be prone to involution
Some AI founders and investors believe they're following the Uber playbook: subsidise prices with VC dollars, dominate the market, then raise prices later, bank on falling costs, or both. We just covered the cost side of this argument. On the price side, raising prices works when you have structural moats: network effects, entrenched logistics - anything that causes high switching costs.
AI may be different. The models are as bad as they will ever be today. They will only get better, potentially cheaper, and easier to use. The barrier to entry for competitors falls over time rather than rising. In many industries, technological innovation reduces competition because it requires scarce skills or capital and can sometimes be protected through patents. In AI, especially through advancements in code generation, improvements increase competition by making powerful software generation capabilities available to more people without specialised expertise. This is crowding-in of competition, not crowding-out.
The myth (so far) of pricing to replace labour
The counter to much of this discussion is that AI is not simply displacing legacy software; it is automating a much larger market: human labour. The logic is that because of this, AI companies can charge much higher prices. Even when framed as labour automation rather than software replacement, however, I rarely see AI companies pricing their products accordingly. If your tool replaces five $70k employees, in theory you could charge $350k/year minus the value you want the customer to retain. In practice, competitive pressure drives prices far below the value of the labour replaced. As noted at the start of this article, many companies are not even pricing at their cost to serve. Instead, their price points are often one to two orders of magnitude below what full labour-replacement pricing would imply.
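To put rough numbers on that gap - a toy illustration in which every figure is an assumption:

```python
# Toy illustration of the gap between labour-replacement pricing and
# observed AI pricing. All figures are assumptions for illustration.

EMPLOYEES_REPLACED = 5
SALARY = 70_000                               # $/year per employee (assumed)
VALUE_CREATED = EMPLOYEES_REPLACED * SALARY   # $350k/year ceiling

CUSTOMER_SURPLUS = 0.30   # share of value the customer keeps (assumed)
theoretical_price = VALUE_CREATED * (1 - CUSTOMER_SURPLUS)

observed_price = 5_000    # a typical AI tool contract (assumed)

print(f"Value of labour replaced: ${VALUE_CREATED:,}/year")
print(f"Theoretical price:        ${theoretical_price:,.0f}/year")
print(f"Observed price:           ${observed_price:,}/year "
      f"({VALUE_CREATED / observed_price:.0f}x below the value replaced)")
```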
Part of the explanation is that few, if any, AI products today enable complete automation of labour - the technology simply isn’t there yet. As agentic AI systems become more capable, we may see prices move upward, but whether that happens will depend on how competitive intensity and switching costs develop along the way.
The looming erosion of lock-in
A final counter is that lock-in may no longer be all it once was. Enterprise software used to have strong lock-in - switching to other solutions was difficult, often seen as impossible. Ask any large SAP customer about switching to a different solution and they will stare at you like you just spoke an alien language. Microsoft's lock-in is so strong that it has gotten away with building awful products for decades while still steadily growing its enterprise business every year. However, we are seeing the first small signs that AI could reverse this decades-old trend. With model-agnostic architectures and agents dynamically routing workloads, switching costs fall. Some enterprises are already using foundation models to swap out legacy vendors.
Without significant switching costs, raising prices after winning market share becomes far harder. Uber could pull it off because once it had the network of drivers and riders and had crushed Lyft, it controlled almost the entire market. In AI, by contrast, real network effects are rare. Without them, swapping vendors could become easier, not harder, over time.
A new normal for gross margins?
If competitive intensity rises faster than inference costs fall, and if better technology keeps lowering the barrier to entry, the gross-margin recovery many investors expect may never happen. The AI application layer could be locked in perpetual price wars, more crowded than the SaaS era and with lower switching costs. Tomorrow's application-layer software businesses in the West may look more like today's consumer businesses in China - heavily competitive, operating at razor-thin margins.
That doesn’t mean there won’t be winners. It means that if you are underwriting investments on the assumption that margins will “naturally” increase to 80% over time, you may be playing the wrong game.
The strategic question is not whether AI will transform industries (it will) but whether any application-layer AI company can build a moat strong enough to resist neijuan. Without it, better technology won’t mean better margins. It will mean more competitors.
Thank you to Angelica Oung for first making me aware of the concept of neijuan.


