Sunday, March 22, 2026

A God Model - The False Promise of Infinite Intelligence

The Core Flaw

Languages exist because we mean things. They are beautiful, ground-up architectures of our shared meanings. Before they ever became law or literature, philosophy or code, they began as intentions. Those intentions were then made audible—into warnings, pleas, promises, commands, confessions. We developed speech to carry a mental state from one mind to another—to say, this is what I see, this is what I know, this is what I fear, this is what I feel.

Over centuries, we poured our intentions into stories, letters, speeches, arguments, scriptures, poems, blogs, tweets and comments. That archive is not random verbal debris. It is saturated with order, habit, structure, and meaning because it was produced by human minds trying to express something.

Engineers eventually discovered that this immense record of our expression could be analysed for patterns. Words cluster, phrases recur, and larger structures can be mapped statistically. From that insight, they built neural networks—massive computer programs designed to map and calculate these patterns. The most famous of these architectures is the Transformer, the “T” in GPT. It is a statistical model built specifically to calculate which token is most likely to come next in a sequence.

At first, the mechanism appeared in a modest and almost invisible form: the smartphone keyboard. We typed, “I am going,” and the machine suggested “home” or “to,” predicting the familiar phrases “I am going home” or “I am going to work.” The tool was useful, but its limits were obvious. The software offered probability; we supplied the intention. We could ignore the suggestion entirely and write, “I am going insane.” The meaning still belonged to the mind using the tool.

What changed in this latest wave of innovation was not the basic principle but the scale, speed, and automation. Companies trained these models—Large Language Models (LLMs)—on every digitized trace of the human mind they could grab. Once the models could calculate the mathematical probability of every phrase—in any context—they removed us from the act of choosing the next word.

Now, the model automatically picks the next word with the highest probability and adds it to the sequence. It then feeds this new, longer string back into itself to calculate the next word, and then the next. It repeats this operation, generating sentence after sentence without ever needing human intervention.
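To make the mechanics concrete, here is a minimal, self-contained sketch of that loop in Python. A toy bigram counter stands in for the Transformer; the point is the feedback loop, not the model. The corpus, the function, and the default length are invented purely for illustration.

from collections import Counter, defaultdict

# A toy corpus standing in for the vast archive of human expression.
corpus = "i am going home . i am going to work . i am going to sleep .".split()

# Count which word follows which word: a crude stand-in for the statistics a Transformer learns.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(prompt, max_new_words=3):
    words = list(prompt)
    for _ in range(max_new_words):
        candidates = following.get(words[-1])
        if not candidates:                           # nothing ever followed this word in "training"
            break
        next_word, _ = candidates.most_common(1)[0]  # pick the most probable continuation
        words.append(next_word)                      # feed the longer sequence back in
    return " ".join(words)

print(generate(["i", "am", "going"]))                # e.g. "i am going to work ."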

This is generative AI in a nutshell. The engineering required to sustain this loop is, without question, staggeringly complex. To calculate the probability of a single word, the machine must haul billions—sometimes trillions—of parameters into memory, executing complex mathematical operations in fractions of a second. But we must not confuse computational density with cognitive depth. Beneath the blinding complexity of the math, the core mission is entirely mechanical. In one sense, the achievement is undeniably astonishing. The outputs can be fluent, persuasive, and at times eerily reflective. But the underlying mechanism remains what it was: prediction operating at scale.

And that distinction matters. Prediction is not purpose. Fluency is not understanding. Syntax is not consciousness. The machine does not stand anywhere in our world. It possesses no underlying model of physical reality. It does not experience the irreversible flow of time or the stubborn weight of gravity. It has no body at risk, no private memory, no desire, no shame, no grief, no moral stake in truth or falsehood. It does not know what words cost us. It has never had to break a promise, confess guilt, or beg for forgiveness. It does not mean. It assembles.

Its power comes entirely from an inheritance it did not create: the accumulated weight of human history. It is the testament of mortals who have loved, feared, worshipped, lied, suffered, ruled, resisted, and tried to make themselves understood. From that inheritance, it can produce an imitation so convincing that we begin to mistake the echo for the voice.

That is the defining fact we must keep in view. Generative AI is not a new form of intention entering the world. It is an automated system for recombining the traces of our intentions left behind. However dazzling the performance, the source of meaning remains where it has always been: between the silence of intent and word.

The Mathematical Certainty of Hallucinations

This mechanical reality reveals why the industry’s greatest technical hurdle cannot be solved. What the industry sells us as intelligence is an exquisite machinery of approximation—a velvet-wrapped guess. Because the model is calculating the statistical likelihood of the next word, it possesses no mechanism to verify the truth of the sentence it is building. It does not know facts; it only knows correlations.

Sometimes, this guessing game lands on the truth. Ask it the capital of France and it produces “Paris,” obedient as a well-trained servant of repetition. But ask it to walk into a thicket—a forgotten legal doctrine, an obscure C++ dependency, a situation too new or too complex to have hardened into statistical pattern—and the model will still do what it always does. It manufactures. Not knowledge. Not reason. Merely plausibility. To the machine, generating a fact and generating a fiction are the same mathematical operation: prediction.

The priesthood of Silicon Valley brands these fabrications “hallucinations,” pitching them to corporate boards as temporary bugs, a regrettable stumble on the way to paradise. They promise to iron out the defect with more training data and better algorithms in the next quarterly update. Salvation is always one upgrade away.

But this is a mathematical impossibility. A hallucination is not a bug in a probabilistic engine; it is the foundation stone. A system built on probabilities will sometimes guess wrong. Its relationship to reality is incidental, not intrinsic.

To mask this reality, the industry is now pitching “AI Agents”—systems where one model is deployed to proofread the output of another. But this is a mathematical trick. You cannot stack layers of probabilities to create certainty. Forcing a model to check its own work requires a second, third, or fourth full pass to check, correct and verify. The vendor is increasing the computational cost for each prompt by three to four times to deliver a less flawed result.

Even with this expensive, multi-layered proofreading, the error rates cannot be reduced to zero. And this brings the industry face-to-face with the Law of Large Numbers. In probability theory, this law dictates a simple, brutal truth: if an outcome has a non-zero chance of happening, repeating the action an astronomical number of times guarantees its arrival.

The insurance industry has built empires on this exact premise. To a single person, a catastrophic accident is a rare, unpredictable tragedy—a once-in-a-lifetime anomaly. But to an actuary looking at a population of a hundred million, that same accident is not an anomaly; it is a mathematical certainty. The insurance company does not guess whether crashes will happen. They know exactly how many will happen, and they price their premiums against that undeniable pattern.

Silicon Valley is treating hallucinations like the individual driver, hoping to avoid the anomaly with more training data and proofreading. But they are deploying these models at the scale of the actuary. If an LLM has even a 0.1% chance of silently fabricating a critical strategy for pension fund investment, testing it on a few scenarios might look flawless. But when an enterprise lets that system loose to manage billions in pension funds across millions of automated trades, probability will cash its cheque.
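The arithmetic is easy to check. Using the 0.1% figure above and treating each automated decision as an independent draw (an assumption made only for illustration), the chance of at least one silent fabrication is 1 - (1 - p)^n:

p = 0.001                                   # 0.1% chance of a silent fabrication per call
for n in (10, 1_000, 1_000_000):            # a demo, a pilot, an enterprise rollout
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>9,} calls -> P(at least one fabrication) = {at_least_one:.3f}")

# Output:
#        10 calls -> P(at least one fabrication) = 0.010   (looks flawless in a demo)
#     1,000 calls -> P(at least one fabrication) = 0.632
# 1,000,000 calls -> P(at least one fabrication) = 1.000   (a statistical certainty at scale)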

Even when the industry attempts to bolt these models to factual databases, the underlying physics remain unchanged. The machine does not ‘read’ the database; it merely uses the retrieved text to calculate a new statistical output. The final assembly is still probabilistic. You cannot compute your way out of this condition without changing the underlying physics of the machine. You can train it to become a virtuoso of sounding right. But unless you build a deterministic logic engine that can reason from the truth of the real world—you cannot abolish the hallucination.

The Death of the “Double Thank You”

A healthy market rests on a simple, invisible foundation: the “Double Thank You.” You buy a cup of coffee. You thank the barista for the drink, and the barista thanks you for the money. It is a ritual of mutual gain. Neither has been forced. Both walk away feeling they have gained something of value.

For over a century, the modern workplace borrowed this same contract. The worker gave labour; the company gave wages. When a new tool arrived, it was introduced as an ally. The spreadsheet did not arrive as a rival. The compiler did not sit at the engineer’s desk like a silent replacement waiting to claim their chair. The technology was a lever. It made the worker more productive, which increased the company’s output, which justified the wages.

Generative AI shatters this contract. Its first principle is not exchange, but appropriation.

These AI companies built their multi-billion-dollar valuations on scraping the collective intellectual output of humanity—novels, essays, codebases, grief, jokes, instructions, and memories—without consent, without payment, without even the decency of acknowledgment. They looted a planetary library, pulped it into statistical paste, and sold it back to the world as a miracle. There is no “thank you” to the original creators whose intent was harvested to train the engine.

But the extraction does not stop at the edge of the internet. When this technology enters the enterprise, the theft becomes intimate.

The Corporate Angle

The destruction of this social contract begins at the executive level, driven not by strategy, but by panic.

In the past, enterprise technology arrived wearing the plain clothes of necessity. A company bought a relational database because it needed to retrieve millions of records instantly. It rented cloud infrastructure to survive unpredictable spikes in web traffic. The logic was embarrassingly straightforward. The need came first. The tool followed. The return on investment could be measured.

The rush to adopt generative AI is different. Companies are not spending on LLM contracts because they have found a precise problem that this kind of system can solve. They are spending because they fear being left behind—the Fear Of Missing Out (FOMO).

The sales pitch delivered to corporate boards is not a promise of steady efficiency; it is an existential threat. Integrate now, or die. Deploy now, or be destroyed. AI is presented less as a useful tool than as a race no one can refuse to run.

What follows is inevitable. Executives authorize sweeping capital expenditures to assure nervous shareholders that the company has boarded the departing train to the future. They are stapling this technology onto customer service, legal workflows, and software pipelines. In many cases, this happens before anyone has said what problem the system is meant to solve, or whether the system is fit for the job at all. The logic is theatrical: We have to do something, this is something, so let’s do this.

What they are buying, at staggering cost, is not certainty. It is the performance of certainty. They are paying a premium for a mirage, shimmering in the heat of collective corporate terror.

The First-Mover Illusion

The evangelists of the AI gold rush point to the balance sheets of early adopters as proof of the technology’s revolutionary value. Data from 2025 and early 2026 reveals specific professional services—particularly law firms and digital marketing agencies—reporting massive spikes in profitability. A task that once took a junior lawyer sixteen hours, such as drafting an initial response to a complaint, an LLM can now execute in under four minutes.

The firm delivers the document and collects the fee. This looks like a revolution. It is not. It is a temporary advantage.

The firm is not experiencing a paradigm shift in the value of its work. It is exploiting an information asymmetry. The firm buys speed from an algorithm and resells it as expertise to a client who is still paying for the many hours of human effort. The miracle is not intelligence. The miracle is that the customer has not yet noticed.

In economics, this is called arbitrage. It is the simple harvesting of a temporary gap between what something costs to produce and what people can still be persuaded to pay for it.

Sooner or later, arbitrage windows always close.

The Red Queen Effect & Arbitrage Collapse

The closure of this window is inevitable. Corporate clients are not sentimental creatures. They do not pay for romance, pedigree, or the antique theatre of professional mystique. The moment a general counsel or a marketing director realizes that a comprehensive draft takes ten minutes—four minutes of machine and six minutes of human review—rather than two days of traditional labour, they will refuse to pay the legacy premium. They will ask why they are still being charged yesterday’s prices. The rates will drop. The unprecedented profit margin will vanish.

This triggers a dynamic called the Red Queen Effect. In Lewis Carroll’s Through the Looking-Glass, the Red Queen tells Alice that in her kingdom, “it takes all the running you can do, to keep in the same place.”

As generative AI spreads, it stops being an advantage and becomes basic infrastructure. It ceases to be a unique profit multiplier and degrades into a compulsory entry ticket. If every law firm, software company, and design agency uses the same machine, then none of them stands out.

The corporation is left in a trap of its own making. Here lies the bitter irony of the AI prophecy: the human workforce is indeed replaced, but the promised windfall never arrives. As margins collapse, the firm must chase volume to compensate. If drafting a complaint response now bills for ten minutes of combined effort instead of sixteen hours of traditional labour, the firm must find nearly a hundred new clients just to replace the lost revenue of a single case. But it immediately hits a mathematical wall: Total Addressable Market (TAM). A firm can increase throughput, but it cannot manufacture demand. A doctor does not gain new patients because her charting is automated. A law firm cannot conjure new corporate disputes out of thin air. When the client base refuses to quintuple, the firm is forced to cannibalize itself. It must purge its junior staff and pour out a flood of synthetic boilerplate just to stay alive.
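The volume trap is simple arithmetic, using the figures above and treating billable minutes as a direct proxy for revenue (a simplification, but the ratio is what matters):

old_billable_minutes = 16 * 60      # 960 minutes of traditional labour per matter
new_billable_minutes = 10           # 4 minutes of machine plus 6 minutes of human review
print(old_billable_minutes / new_billable_minutes)   # 96.0 matters needed to replace the revenue of one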

Eventually, the weakest firms will die. The survivors will scavenge their orphaned clients, consolidating the market into an oligopoly—a handful of exhausted giants. But this victory is hollow. Once the rates have dropped, they do not recover. The client, having tasted the cheap speed of the machine, will never again pay for the illusion of human labour. The surviving giants are left to rule over a permanently devalued market, processing mountains of cheap work just to maintain their sprawling, low-margin empires.

The AI has not made the firm richer in any structural sense. It has rewritten the terms of survival. The entire industry must now sprint at machine speed to preserve what it once earned by walking. All this innovation, all this worship of disruption, only to arrive exactly where it began—breathless, desperate, and still in the same place.

When the reality of this oligopoly sets in, the defenders of the boom point to the human in the chair. If everyone has the same AI, they argue, the differentiator will be the brilliance of the human directing it. They tell us not to worry, because we have survived these transitions before.

The “MS Office” Fallacy

Tech optimists admit the Red Queen Effect, but they treat it as nothing new. They say every foundational tool has ended up the same way. Every major technological leap—the spreadsheet, the internet, the word processor—eventually flattened into mandatory “basic equipment” without destroying the economy. In this view, generative AI is simply the next MS Office: a baseline utility that every business must adopt to participate in the modern market.

But this comparison misrepresents the nature of the machine.

When Steve Jobs introduced the personal computer, he famously described it as a “bicycle for the mind”. A bicycle is a mechanical multiplier of human effort, but it requires the rider to balance, steer, and pedal. Microsoft Excel operates on the same principle. It is a deterministic tool. If an analyst builds a complex financial model, the software executes the arithmetic instantly and flawlessly, but the analyst must construct the logic. The human retains the complete mental map of the architecture. The tool amplifies the worker’s competence without replacing their reasoning.

Generative AI is not a bicycle. It’s a taxi.

You do not pedal a taxi. You name a destination and the machine navigates the route. Because it is the AI that generates the intermediate steps, the distance between intention and execution is no longer crossed by human thought. The human is no longer the architect; they are the reviewer.

This creates a severe divergence in economic outcomes. When deterministic tools like MS Office became universal, the baseline of human capability across the economy increased. But when an autonomous model becomes universal, the baseline of competence degrades. The worker is incentivized to offload their reasoning to a machine. So, AI is not the next spreadsheet.

Cognitive Atrophy

Constructing a complex system forces the human brain to build a deep, structural mental map—whether a large codebase or a nuanced litigation strategy. To build something is to enter into an intimate struggle with it. Every difficult choice, every dead end, every ugly compromise, and every hidden patch is recorded in the maker’s mind. Knowledge is not a list of conclusions; it is a lived geography. The architect understands not just what the system does, but why every specific decision is made. They know which pillars are ornamental and which ones are holding up the roof. A person who builds the thing can move through its corridors in the dark.

This intimacy with complexity is the true capital of a knowledge worker; it is what allows them to instantly diagnose a failure or pivot a strategy.

Gen AI bypasses this intimacy. It shifts the worker’s role from architect to reviewer. The machine instantly generates the architecture, offering the finished palace without the years of hauling the rock. The human is demoted from architect to bystander, rubber-stamping a design they did not create. Their job is no longer to know, but to glance. To confirm the syntax looks correct, and to approve.

But reviewing is a passive cognitive act. Recognition is not understanding. To nod along with a machine-generated answer is not the same as having fought your way through it. The reviewer sees the surface, not the frame underneath. The mind does not build durable knowledge by skimming surfaces; it builds it through friction, error, repetition, and failure—the slow humiliations by which real comprehension is earned.

And the worker will not only be seduced into skipping this struggle; they will be cornered into it by the corporation’s new math. When a tool can generate a document in four minutes, management will demand fifty documents a day. The worker physically no longer has the time to engage in the slow friction of learning. The business model forces them to skim.

Over months and years, this reliance will induce cognitive atrophy—a slow withering of the mental muscle required to do the work. The worker will retain a surface-level familiarity with the output, but they will lose their grasp of the underlying foundation. They can still approve work, but they can no longer own it.

In its blind rush to streamline operations, the modern corporation will continue to pay premium salaries for senior titles and impressive credentials. The enterprise still needs a human ‘expert’ to rubber-stamp the machine’s output to satisfy compliance, clients, and liability. But it will be paying for an illusion, because those human experts have been methodically deskilled by the very tools meant to “augment” them. Nobody will notice the loss until the day something ruptures. What corporations are surrendering is not just process, but memory. Not just labour, but the dense internal architecture of thought itself.

The true cost of this atrophy will be revealed when the system fails. When a human makes a mistake, another human can often find it at once, because the purpose of the work is understood within the same team. But when a model produces a flawless falsehood, that deductive process collapses. The reviewer will be left to work backward from the result and guess how it could have been produced. Companies will use systems and make decisions that no human being will be able to fully explain or defend.

What will remain in the office towers and glass campuses will be a strange kind of expertise: highly paid, fluent, efficient, and helpless. A class of custodians trapped inside systems they can no longer read, praying to the machines they were hired to command.

The Liability Vacuum

A synthetic lie costs more than engineering time. When this engine is deployed in high-stakes environments—drafting corporate mergers, generating medical compliance reports, or managing automated trading algorithms—a silent error does not just crash a server. It triggers a lawsuit.

At this point, where the damage becomes undeniable, the enterprise crashes into a liability vacuum.

In traditional enterprise software, accountability is clear. If a vendor’s database corrupts a client’s financial records due to a structural flaw, the vendor faces severe legal and financial repercussions. Service Level Agreements (SLAs) guarantee deterministic performance, operating on the foundational understanding that power without accountability is unacceptable.

Generative AI vendors operate under a fundamentally different legal shield. Because they know that false outputs are a mathematical certainty, their contracts explicitly deny responsibility for what the system produces. They lease the engine, but they accept absolutely zero liability for the code, text, or ruin it generates.

Accountability is thus offloaded downward onto the human reviewer. The enterprise relies on an employee who is flying blind through unfamiliar, auto-generated architecture. This is not oversight. This is ritualized scapegoating. When a fabricated legal clause detonates inside a corporate agreement, the AI vendor remains insulated, untouched, and paid. The blame falls on the human who clicked “approve”. The solitary worker is left standing in the blast radius of a system specifically designed to outrun their comprehension.

The enterprise pays an exorbitant subscription fee for speed, actively degrades the structural expertise of its own workforce, and absorbs one hundred percent of the catastrophic legal risk. The vendor takes the money and keeps the immunity. The “Double Thank You” has been replaced by a one-way street of extraction.

The Employee Angle

When management deploys generative AI across an enterprise, they tell their staff it is a helper. They call it a collaborative assistant—a tool meant to remove dull work and make people more productive. But workers read the same headlines as the executives in glass chambers. They understand the math, and they understand the timeline. They do not see a helpful partner; they see an existential threat deployed to replace them in two to five years.

That changes how the technology is actually used on the floor.

In tech companies, employees understand that basic AI tools can enhance their research and output. They know these tools can make them sharper and faster in what has suddenly become a cut-throat survival competition. But they understand the limitations of the advanced tools—the autonomous agents, automated issue-resolvers, and the so-called “vibe coding” miracles sold to the board.

AI models are stateless machines: if you want the AI to continue a conversation, you must resend the previous output and re-establish the context with every request.

These engines work on tokens. Each time you send anything to the engine, it breaks that into tokens. A basic C++ “Hello World” program is around 25 tokens, whereas this very paragraph is more than 100 tokens. AI companies collect their money based on tokens. In a nutshell, token usage is directly proportional to code size. With every new feature added to the codebase, the size of the codebase grows, and you must spend more tokens to set the context if you rely solely on vibe coding.
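A rough sketch of why that meter climbs, assuming a stateless conversation where the full history must be resent on every turn. The starting context and reply sizes are invented; only the compounding pattern matters.

context_tokens = 2_000        # assumed starting context: prompt plus the relevant code
tokens_per_reply = 500        # assumed size of each generated answer
billed_input = 0

for turn in range(10):                    # ten turns of "continue from where we left off"
    billed_input += context_tokens        # the entire history is billed again as input
    context_tokens += tokens_per_reply    # the reply is appended, so the next turn is bigger

print(billed_input)                       # 42500 input tokens billed for ten turns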

AI companies know that if they charge this way, nobody is going to buy their tools. So, they have started offering something known as “prompt caching.” They will store your context for around five minutes on the most expensive real estate on earth—the GPU memory. But the moment you step away for a coffee, the cache expires, the memory is reclaimed, and you must pay tokens to rebuild the entire context from scratch.

To solve a billing constraint, AI vendors are now pitching the ultimate solution: the autonomous coding agent. A human engineer cannot read, think, and type fast enough to keep a five-minute cache alive. The agent can. It does not stop to reflect. It reads, compiles, writes, and loops at machine speed. It maximises cache usage before the five-minute window shuts. But it strips the developer of real control. The machine moves too fast for careful review. The human becomes a bystander, approving large amounts of generated code without time to follow the logic.

That creates a serious risk.

In software, efficiency matters as much as correctness. Every program has a cost in memory and time known as Space and Time Complexity. An agent might not judge the quality of code in that way. It aims to satisfy the prompt. If a blunt, brute-force method works, it might choose that method. The result will look fine at first. The code compiles. The tests pass, especially when the agent has written the tests itself. But the system now contains slow, bloated code that wastes computation and scales badly. And because the developer never had time to read it properly, the problem can enter the product unseen.
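A concrete, if contrived, example of "works but wastes". Both functions below pass the same tests, so an agent grading its own output cannot tell them apart; one simply does vastly more work.

def has_duplicate_bruteforce(items):
    # Compares every element against every other element: O(n^2) time.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items):
    # Remembers what it has already seen: O(n) time, O(n) extra space.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

# On a list of one million items, the first version performs roughly
# 500 billion comparisons; the second makes a single pass.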

But the damage can bleed directly into the balance sheet. An agent stuck in a trial-and-error loop—trying to fix its own code or a compiler error—burns tokens with every single iteration. Every product has a budget; you cannot afford to keep burning real money just to keep reminding the model what you are talking about. FinOps teams will have to fly in and install hard circuit breakers to stop the financial bleeding. The moment the machine is cut off, human engineers must step in to use the tokens economically. They must understand the architecture and nuances to set the context economically. Engineers can only do that if they use the AI the human way—for research and enhanced understanding—rather than falling for the trap the vendors are selling. The generated code can always have a missing null check, an unreleased mutex, or a sheer hallucination. You have to manually correct these errors to minimize the token budget.

New tools like ‘Graph RAG’ circumvent this limitation by automatically mapping the codebase’s dependencies. But this is still too optimistic. A graph can tell the machine that File A is related to File B, but to actually write the code, the model still needs the raw text of both files in the context. Even with perfect retrieval, the agent might still get trapped in a trial-and-error loop. For example, it may write a fix that hits a compiler error, forcing it to read additional files and try again. The token meter bleeds. A graph cannot digitize the runtime state that a human can see in a debugger.

The question is no longer whether the technology is capable of coding; the question is whether it is viable. We have the technology to recycle plastic perfectly, but we largely don’t, simply because the fresh plastic is cheaper than the recycled one. Governments have to impose carbon taxes on fresh plastic and make it more expensive, just to make the economics work. Technology alone cannot perform miracles; the economics must align. In software, the product cannot go over budget, and burning tokens to brute-force context may destroy the margin. As the old engineering saying goes: “Cheap, Good, and Fast—Pick two.” If AI is going to make coding faster while keeping it enterprise-grade, it cannot be cheap. The production cost has to go up.

Humans, on the other hand, offer one thing that is absolutely cheap: the context. We can store memories from childhood to the present day in our tiny heads. We carry everything that exists outside the realm of ‘.md’ files. We continuously update our context in every meeting, every water cooler conversation, and every lunch in the cafeteria. We know which serial port is faulty and requires an extra push to connect.

But rational economics rarely stop a boardroom from executing layoffs. Executives are already shrinking headcounts based on the promise of the machine, long before the actual token bills arrive. Because of this looming threat, a deep panic is creeping in. The veteran engineer is suddenly trapped in a brutal pincer movement. From above, management is mandating AI efficiency. From below, terrified junior engineers—who have no legacy context or architectural knowledge—are cheerfully dumping everything they touch into the machine just to survive the week. They are outpacing the seniors, who are starting to look like boomers.

Consequently, the internal collaboration that serves as the backbone of any successful product is quietly being undermined. In any office, someone willingly trains a junior colleague or shares a brilliant shortcut with the team because knowledge passed hand-to-hand benefits the whole team and increases one’s social capital. Today, collaboration has a new, invisible boundary. Workers will still explain an architecture to a colleague, but they will deliberately omit the tuned AI prompt they used to debug it. They share the knowledge, minus the AI shortcut. In a cut-throat competitive environment where nobody knows which wicket is next to fall, the age-old rule manifests: if everyone has something, nobody has it. To keep the edge, you must have the extra.

There is nothing mysterious about the worker’s behaviour. It is a textbook manifestation of the principal-agent problem. In behavioural economics, when the goals of the principal (the employer) fundamentally conflict with those of the agent (the employee), the agent will rationally act to protect their own interests. Management may treat ‘the workforce’ as a unified, compliant monolith that will execute the latest directive. But public choice theory says there is no monolith; only individuals calculating their own survival. There are no secret societies organizing resistance in meeting rooms. There are a million isolated, individual workers sitting at a million individual desks, arriving at the exact same conclusion: when a machine is deployed to commoditize your profession, you fiercely protect the monopoly on the context that keeps you indispensable.

There are true technological optimists on the floor as well. They genuinely believe in the promise of the machine. They freely offer their expertise to improve the enterprise without fear or malice. But their numbers are vanishingly small. Leadership sees the willing participation of a few believers and assumes they have a consensus.

The Tired Wizard of Oz

The modern tech office has discovered a new kind of stagecraft—a corporate “Wizard of Oz.”

When executives ask if the shiny enterprise tools are useful, a rehearsed play begins. Workers actively downplay the value of critical work, stressing the hallucinations, the clumsy interfaces, and the hours wasted verifying the output. But to appease leadership’s desperate hunger for AI integration, they offer diversions. They enthusiastically propose automating peripheral drudgery—parsing server logs, drafting internal memos, doing code review, sorting support tickets. They will pitch endless AI initiatives, provided those ideas have nothing to do with the core product the company actually builds and sells. They appear as innovative team players embracing the future, while systematically redirecting the machine away from their own core roles. This is a strategic choice. In the strange theology of modern management, efficiency can become evidence against you. Admit that a machine does any heavy lifting for you, and your hard-won expertise is flattened into something embarrassingly simple. You will be reduced to a person who merely types prompts into a box. You refuse to become the hidden operator frantically pulling levers behind the curtain while the machine takes the credit.

In the software service sector, this pressure mutates into an outright dystopian meat-grinder. Management has already priced in the AI’s supposed efficiency before it even works. If a vendor claims the tool makes developers 30% faster, management immediately inflates the sprint velocity and story points by 30%. An “AI-First” mandate takes over. It is human nature to respect expertise. Once the curtain is pulled back, that respect vanishes. If you deliver a brilliantly optimized solution, leadership assumes you just typed a good prompt, flattening your years of hard-won expertise and stripping away the dignity of a knowledge worker. And because the machine constantly hallucinates, employees are not actually writing code any faster—they are spending twice as much time untangling the synthetic garbage just to meet inflated quotas.

The absurdity peaks in a truly Kafkaesque pipeline: the AI agent deployed as a ‘code reviewer’ will flag all the issues in the Java code it generated as a programmer. It will complain about variable names not following naming conventions, reject inconsistent function signatures, and flag the tautological unit tests it wrote simply to pass its own broken logic. The human engineer is reduced to mediating an automated argument between two scripts. They are paying the price of the promised ROI with their own burnout.

This creates a divide in how the tool is used. Some engineers in tech companies are using AI for research—to map out a concept before writing the deterministic code. Some engineers, particularly in the high-churn service sector, are using it to fix problems they don’t understand. Due to the sheer velocity required of them, they are blindly applying AI patches to architecture they cannot read, even using AI to write the prompts for the AI. It is a rapid acceleration of cognitive atrophy. They are becoming entirely dependent on the machine. It is only a matter of time before the industry begins to reap the consequences of this synthetic scaffolding, as a wave of bugs and failures begins to detonate in production environments.

In non-technical industries—marketing agencies, legal firms, and digital publishing—this dynamic is taking an even quieter form: hope. They can see cognitive atrophy eroding their sharp edge in real time. But they understand that their executives lack the technical literacy to evaluate the tools they just mandated. Rather than fighting the order, the employees are simply stepping back and letting the leadership burn the budget.

Nowhere is this more visible than in media. When digital publishers force editorial teams to use AI to churn out content, editors sometimes leave in the machine’s glaring tells—the robotic summaries or the absurd disclaimer: “As an AI language model, I cannot...” Any casual reader can spot these errors at a single glance. It defies belief that such glaring absurdities could accidentally slip past the trained eyes of professional editors. It is the sheer, numbing exhaustion of the cognitive atrophy already taking root. So, they watch the enterprise spend real money and wait. They hope the hype cycle will subside when the financial reality kicks in and the executives are forced to admit they spent a fortune on a corporate toy.

The Community Angle

The final breach of the social contract extends beyond the corporate boardroom and the employee cubicle, spilling directly into the physical environment of the surrounding community.

The tech industry relies on a linguistic trick, universally referring to its infrastructure as “the cloud.” This ethereal terminology obscures a brutal, industrial reality. The cloud is not weightless; it is a concrete fortress packed with tens of thousands of hyper-dense silicon chips.

Running an AI model requires astronomical computational power, and that computation translates directly into catastrophic heat. To prevent the server racks from melting themselves, these facilities cannot rely on traditional air conditioning. The machines must be plumbed with direct-to-chip liquid cooling systems and massive industrial chilling towers. This architecture demands hundreds of megawatts of continuous electricity and drinks millions of litres of fresh water to operate.

Erecting a server farm is not like opening another office; it is like inserting an industrial consumer into the local utility ecosystem. The AI engine is thrust into a direct, zero-sum competition with the community for the most foundational physical necessities. The machine drinks, and the community goes thirsty.

The Utility Spike

The economics of municipal utilities are unforgiving.

The local grid was built for the ordinary choreography of human life—for evening lights, ceiling fans, pumps, refrigerators, and the slow, predictable growth of a city. It was engineered to accommodate households, shops, schools, and small industry. It was never designed to absorb the sudden, relentless load of a hyperscale AI facility that arrives not as a neighbour, but as a new species.

The AI engine does not consume electricity in any recognizably human sense; it gulps by the gigawatt. It swallows water by the million-litre mouthful. When such a load appears—demanding the resources of an entire mid-sized city—it hits the local system like a shock. The grid strains. Transmission lines must be strengthened, substations expanded, transformers upgraded, and expensive backup generation kept online so the silicon does not overheat.

The question is simple: who pays? The answer is simpler: the public.

How the public is made to pay depends on the local economic system. The methods differ, but the result is the same: tech companies avoid bearing the full cost of their own thermodynamic footprint.

In deregulated, market-driven grids—common across much of the Western world—industrial electricity is usually cheaper than residential electricity. That makes physical sense: it is cheaper to deliver bulk power to one massive facility than to maintain the poles and wires needed to serve ten thousand homes. Because of their enormous scale, large companies secure long-term power arrangements at wholesale-like rates, while the general public remains exposed to shorter retail contracts that reset every year or two. But an AI data centre does not behave like a traditional industrial facility. It arrives as an anomaly large enough to disrupt the mechanics of the market.

A typical steel plant consumes around 200 megawatts of power. A single campus of AI data centres can demand 500 to 1000 megawatts (1 gigawatt), and these facilities tend to cluster because latency and fibre connectivity matter. That concentration can push regional demand up by several gigawatts at once. As a result, when retail contracts renew, households face higher prices, while the largest tech firms may continue enjoying the cheaper long-term power arrangements they secured earlier.

But in regions served by state-managed utilities—a reality for billions in developing economies like India—the logic is different. Here, the state often uses tiered tariffs, charging industry and commerce more in order to keep household electricity affordable. But a data centre company does not arrive as just another factory. It arrives wrapped in the language of national ambition and “digital transformation.” Then come the concessions: tax breaks, subsidized land, electricity-duty exemptions, and other state-backed incentives that can lower the corporation’s effective power costs. So even in a system designed to make industry subsidize the citizen, the arrival of hyperscale data centres can quietly reverse the flow of protection.

More importantly, no industrial tariff can negotiate with physics. In places where the state already struggles to provide clean drinking water, the crisis is immediate and material. A data centre’s cooling system can require millions of litres of water each day. Water cannot be printed, and drought cannot be solved with software. Every litre drawn into an industrial cooling loop is a litre removed from a fragile public supply.

And water does not return unchanged. After absorbing enormous heat, it leaves the system carrying the thermal burden of the machine and, in many cases, traces of the chemical treatments used in industrial cooling. Where environmental oversight is weak, such wastewater may be discharged inadequately treated into stressed rivers, lakes, or other local water bodies, degrading water quality even further. The machine takes in what is clean and gives back what is warmer, dirtier, and harder to use. The community loses at both ends: first in extraction, then again in pollution. A life-sustaining resource is quietly sacrificed to keep the silicon cool.

Meanwhile, the cost of upgrading and strengthening the power grid can be far greater than the revenue the corporation brings in. The state-owned utility absorbs the burden and sinks deeper into debt. To prevent the breakdown, the state steps in—not to limit the corporate load, but to keep the utility alive with public money. The deficit is then socialized, either through direct bailouts or through quietly rising tariffs and taxes.

The structure differs, but the outcome is the same.

Under deregulation, the public subsidizes the private contracts.
Under state management, the public subsidizes the private exemptions.

In one system the transfer is performed by the market. In the other, it is performed by the state.

Either way, it is ultimately the public that pays.

The schoolteacher pays.
The mechanic pays.
The pensioner in a small apartment pays.

Their cost of living inches upward: a few extra rupees here, a few more there. Small enough not to provoke a revolt, but large enough, in aggregate, to underwrite the infrastructure required to keep the servers humming.

Diffused costs. Concentrated benefits.

The Ultimate Insult

This brings the collapse of the social contract to its ultimate, hostile conclusion. Consider the economic lifecycle of this technology not as an innovation, but as a perfectly modern formula for extraction.

First, a large company takes the public’s writing, code, and online activity and uses it for free. People supply the material that trains the system, but they are not asked for permission and they are not paid.

The company then packages the harvested intelligence and sells it to the citizen’s employer, explicitly marketing the system as a mechanism to automate their job and eliminate their wage. It is sold to managers as optimization, and to shareholders as the long-awaited dream of labour without workers. The same person whose life was mined for training data now sits across the table from a polished interface designed to make them redundant. First, their minds are stolen. Then the imitation is used to threaten their livelihood.

And even this is not enough. The machine must be housed, cooled, and fed. It drinks electricity, gulps water and leans on public infrastructure with the graceless appetite of an empire. To keep the engine running, the local utility grid is pushed to its limits. The citizen, already robbed at the level of information and threatened at the level of employment, is made to absorb the physical entropy of the machine. They are invoiced for the destruction of their own environment through higher monthly utility bills. They are forced to subsidize the metabolism of the system that is destroying them.

This is not a mutual exchange of value. It is the targeted destruction of reciprocity itself. The formula is absolute: take without asking, sell without sharing, displace without remorse, and invoice the dispossessed for the cost of their own dispossession.

This is not an exchange between equals. It is a one-way transfer of value. At each stage, the citizen gives and the vendor takes. What remains is a closed system of extraction, with the gains flowing in one direction only.

The Death of “Zero Marginal Cost”

For the past forty years, the wealth of Silicon Valley has been governed by a single, gravity-defying economic principle: zero marginal cost.

In the physical world, every new product drags a heavy anchor. A car company spends millions on research, design, and factory tooling before a single vehicle ever exists. But the spending doesn’t stop there. It must pay for steel, labour, and assembly time for all the cars. It must pay for those raw materials for the first, again for the second, and again for the third. Output rises, but a strict material cost is permanently attached to every single unit.

Writing a word processor or a database system also costs a great deal at the start in R&D. It takes engineers, time, and money. But when the work is done, the next copy costs almost nothing to produce and distribute. The millionth copy costs little more than the first. After the company has recovered its development costs, most new revenue becomes profit.

This zero-marginal-cost reality is what made previous technological revolutions economically sustainable for the broader market. When tools like Microsoft Office transitioned from luxury productivity multipliers to mandatory “table stakes” for every corporation, the transition did not bankrupt the business world. They endured because the software, for all its power, was fundamentally cheap. The tools were static applications executed locally on the user’s own hardware, requiring no ongoing thermodynamic effort from the vendor. The burden of electricity and hardware maintenance was entirely outsourced to the customer.

Even when the industry shifted from selling local disks to leasing cloud-based access—the era of Software as a Service (SaaS)—the underlying math held firm. Companies realized they could charge monthly subscriptions, treating digital access exactly like a physical utility. Consider Netflix. Filming a series, licensing movies from studios, and building the initial server architecture requires massive upfront capital. But once the infrastructure is set, the cost to deliver that stream to the millionth subscriber is not huge. The vendor collects recurring monthly revenue, while the marginal cost of delivering that new customer’s stream remains effectively zero.

The software was a utility, priced like a utility, and it generated staggering margins simply because it cost nothing to duplicate. That was the secret inheritance of Silicon Valley’s golden age: an industry built on products that, once made, no longer had to be made again.

The AI Compute Tax

Generative AI fundamentally violates this zero-marginal-cost paradigm. What it has done is expose the old fantasy that software is weightless, frictionless, and free from the burdens of the material world. A Large Language Model is not a static piece of code resting quietly on a local hard drive. It is a heavy industrial machine that must work each time you use it.

When a user types a query into an LLM and presses enter, the response is not simply retrieved from a database. The model must actively compute the mathematical probabilities for every single word it generates, in real time. This operational process—known in the industry as “inference”—requires an astonishing amount of computational power. It is not the retrieval of a thought; it is manufacturing on demand.

Every single prompt forces rows of high-performance GPUs in a distant data centre to physically spin up, drawing massive electricity and generating intense heat. The old belief in software was that it could scale at almost no extra cost. But LLMs do not work that way. Every paragraph they produce, every line of code they suggest, and every email they summarize has a real physical cost.

Silicon Valley has invented a “high marginal cost” software product. Unlike a word processor, where a million active users cost the vendor nothing, an LLM vendor must pay for the electricity, the cooling, and the hardware degradation with every single query. The more a customer uses the product, the more it physically costs the vendor to keep the lights on.
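A deliberately crude back-of-the-envelope estimate of that per-query cost. Every number below is an assumption chosen only to show the shape of the problem, not a measured figure from any vendor.

gpu_power_kw      = 0.7     # assumed draw of one accelerator under load, in kilowatts
gpus_per_query    = 8       # assumed size of the inference pod serving one request
seconds_per_query = 5       # assumed generation time for a long answer
price_per_kwh     = 0.10    # assumed electricity price in dollars

energy_kwh = gpu_power_kw * gpus_per_query * (seconds_per_query / 3600)
print(f"~{energy_kwh:.4f} kWh, ~${energy_kwh * price_per_kwh:.5f} in electricity per query")
# Tiny on its own -- but multiply by millions of queries a day, then add cooling,
# hardware depreciation, and idle capacity, and the marginal cost is no longer zero.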

The Unreachable Break-Even

The financial reality of this compute tax is currently hidden from the market by a massive, unsustainable subsidy. What is being sold to the market as an affordable innovation is a beautifully decorated lie.

When an enterprise purchases a generative AI license today, they typically pay a flat subscription fee of roughly $20 to $30 per user, per month. To a corporate procurement officer, this feels like a standard, predictable software-as-a-service (SaaS) contract. It resembles the old software model, where one more user or one more action costs almost nothing. But because the vendor is selling a live thermodynamic process rather than a static digital copy, this flat bill breaks down immediately under actual use.

The second illusion is the API token meter. It is sold to engineering teams at rates that look tiny: a few dollars for a million tokens. To someone new to it, a million tokens sounds enormous. It is not. The machine remembers nothing, so the exact same massive blocks of codebase context must be sent again with each query. To mask this rapid burn, vendors offer “prompt caching,” temporarily holding the context in memory at a steep discount. But this cache expires in five minutes or so. To exploit the discount before the window shuts, vendors push autonomous coding agents. An agent does not pause to reason; it loops, generates code, and recompiles at machine speed so that the prompt cache doesn’t hit its five-minute expiration.

But the API does not have a single meter; it has two. There is an input meter for what the machine reads, and an output meter for what the machine generates. Output tokens are expensive because they require fresh GPU computation. And prompt caching works only on input tokens, never on output. So prompt caching can slow the input meter, but it cannot slow the output meter. If there is a mistake in the generated code, the model must re-read its own output tokens, along with the compiler errors, to update the context. The enterprise must pay that premium output rate for every intermediate mistake, every failed compilation, and every hallucinated write.
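A sketch of how the two meters interact on a single debugging round-trip. The rates and token counts below are hypothetical, not any vendor's rate card; the point is which meter each step lands on.

PRICE_INPUT        = 3.00    # assumed $ per million input tokens (uncached)
PRICE_INPUT_CACHED = 0.30    # assumed $ per million input tokens (cache hit)
PRICE_OUTPUT       = 15.00   # assumed $ per million output tokens

def cost(tokens, rate_per_million):
    return tokens / 1_000_000 * rate_per_million

codebase_context = 80_000    # code sent as context (cacheable input)
generated_fix    = 4_000     # the model's first attempt (output meter)
compiler_errors  = 2_000     # error log sent back for diagnosis (fresh input)

first_attempt = cost(codebase_context, PRICE_INPUT) + cost(generated_fix, PRICE_OUTPUT)
retry = (cost(codebase_context, PRICE_INPUT_CACHED)              # cache hit on the old context
         + cost(generated_fix + compiler_errors, PRICE_INPUT)    # its own output re-read as input
         + cost(generated_fix, PRICE_OUTPUT))                    # the corrected code, billed again at the premium rate

print(f"first attempt ~${first_attempt:.3f}, each retry ~${retry:.3f}")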

The sheer velocity of the agent’s automated looping can spin the expensive output meter out of control. In practice, an agent actively debugging a complex issue will easily burn the daily quota before lunch, leaving its customers paying a premium for the machine’s high-speed incompetence.

Under actual enterprise use, both pricing models—flat rate and token meter—mathematically break. If an employee leans on the model continuously throughout the workday—generating code, summarizing massive document troves, or drafting endless emails—the raw cost of the GPU inference, electricity, and cooling exceeds the $30 subscription. The customer believes they are using a product. The vendor, in effect, is underwriting a loss.

To mask this structural flaw and aggressively capture market share, tech giants are spending billions of dollars in capital to cover the difference. In effect, they are selling an expensive utility as if it were cheap software.

But thermodynamics is less sentimental than venture capital. Vendors cannot continue to swallow the compute tax. Sooner or later, the hidden bill arrives.

The cheerful subscription model—that little fiction of affordability—will have to give way to prices that admit what this technology actually is: expensive, resource-hungry, and structurally incapable of being offered at mass scale for pocket change. If vendors want to make it sustainable and make some profit, the monthly price should rise sharply—from $30 to something painfully higher.

The Price Standoff

Early users of AI may have had an edge. That edge will not last. As vendors raise prices to cover their real costs, the gap will close. Once every law firm, agency, and software company has access to the same tool, it stops being a special advantage. It becomes a basic requirement.

At that point, if AI vendors argue for higher prices, it might not work. A chief financial officer may approve a large expense for a tool that gives the company a clear lead over its rivals. But they will not pay a high monthly fee for something every competitor has, especially one that would significantly raise their own cost base.

This is known as commoditization—the “Dollar Shave Club” effect. Just as consumers eventually realized a basic razor was “good enough” and stopped paying a premium for five-blade vibrating handles, businesses will realize that a “good enough” AI is all they need to compete on price.

That leaves the vendor in a hard position. At a low price, it may not make enough money to survive. At a high price, companies may refuse to buy. The subscription model then stops working.

The Pricing & Liability Deadlock

In 2022, Cory Doctorow coined the term “enshittification” to describe a predictable tech business cycle: a company heavily subsidizes a magical service to attract and lock in users, then deliberately degrades the quality of that service to extract maximum profit. If a search engine puts its best results at the top on the first attempt, ad revenue suffers. If it degrades the quality a little, people have to scroll, and ad impressions increase. If users decide their query was not good enough, tweak it, and search again, there are even more ad impressions. The search vendor only has to make sure the degradation is not so blatant that users defect to another search engine. Shopping websites play the same game, inserting sponsored products after every couple of items in the product list.

Generative AI is currently exiting the subsidized phase. Because flat-fee subscriptions cannot cover the massive thermodynamic cost of heavy enterprise use, vendors are pivoting to usage-based API pricing (the token meter). Once AI is priced by the token, the enshittification cycle begins, driven by the physical constraints of GPU memory.

In economics, opportunity cost is what you give up when you choose one thing instead of another. If you spend Rs. 500/- on pizza, the opportunity cost is the movie ticket you could have bought with that Rs. 500/-. For an AI vendor, GPU memory (VRAM) is the most expensive real estate on earth. When an enterprise uploads a massive codebase, the model stores it in the GPU’s memory (the KV Cache) to process the request. Offering prompt caching and holding this massive context consumes premium VRAM that the vendor could be using to serve dozens of other paying requests. The opportunity cost is high.
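To see why that real estate is so scarce, here is the standard back-of-the-envelope estimate for the size of a KV cache. The model dimensions are assumed for illustration, and the formula ignores memory-saving tricks such as grouped-query attention.

layers          = 80         # assumed transformer depth
hidden_size     = 8_192      # assumed model width
context_tokens  = 128_000    # a large codebase held in the prompt cache
bytes_per_value = 2          # 16-bit precision

# KV cache size is roughly 2 (keys + values) * layers * hidden size * context length * bytes per value.
kv_cache_bytes = 2 * layers * hidden_size * context_tokens * bytes_per_value
print(f"~{kv_cache_bytes / 1e9:.0f} GB of VRAM pinned for one idle customer's context")   # ~336 GB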

To free up this VRAM and protect their margins, vendors are incentivized to force the model to finish quickly. The models are optimized for speed over depth, resulting in the “lazy AI” phenomenon—outputting a few lines of code and a placeholder like “// insert previous logic here.”

When a human developer receives a lazy response with a placeholder, they recognize the shortcut. They can manually insert that previous logic there or type a follow-up prompt instructing the AI to “stop using placeholders and generate the full file.”

Autonomous coding agents do not possess this intuition. Agents rely on automated file-editing tools to execute their work. If the API returns a script containing ‘// ... insert previous logic here ...’, the agent blindly injects that exact text string into the actual source code, overwriting the real logic.

This is not a temporary software bug that vendors can easily patch; it is a structural limitation of the technology. Even if a vendor wanted the machine to output a perfect, complete codebase, they hit a hard physical and economic wall: the output token ceiling. For AI models, generating tokens is more expensive than reading them. So models ship with hard-coded limits on how much they can generate in a single breath. They are mathematically forced to truncate.

To survive this limit, creators of autonomous agents invented workarounds. Instead of asking the model to rewrite a whole file, the agent forces the machine to use a strict “Search and Replace” format—outputting only the exact lines to find, and the new lines to inject.

But this introduces a new, equally fragile point of failure. To replace the code, the LLM must perfectly repeat the “Search” block character-for-character. Because LLMs are probabilistic rather than exact databases, they sometimes hallucinate a slight variation—an extra space, a changed indentation, or a tweaked variable name. The agent’s strict editing script scans the local file, finds zero exact matches, and the automated workflow crashes.
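A minimal Python sketch of why this format is fragile. The file contents and the model’s emitted blocks below are hypothetical, but the failure mode is the real one: the search text must match the file byte-for-byte, or the automated edit has nowhere to land.

    # Minimal sketch of a strict search-and-replace edit step.
    # The file contents and the model's emitted blocks are hypothetical.
    file_contents = "def total(items):\n    return sum(i.price for i in items)\n"

    # What the model *believes* is in the file -- note the hallucinated extra
    # space after 'return', invisible to a human but fatal to an exact match.
    search_block  = "def total(items):\n    return  sum(i.price for i in items)\n"
    replace_block = "def total(items):\n    return sum(i.price * i.qty for i in items)\n"

    if search_block in file_contents:
        file_contents = file_contents.replace(search_block, replace_block)
    else:
        # Zero exact matches: the agent's workflow crashes right here.
        raise RuntimeError("SEARCH block not found -- edit aborted")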

Whether the model gets lazy with a placeholder, or hallucinates a broken Search/Replace block, it triggers a highly profitable loop for the vendor.

1.      The Break: The agent saves the broken code to the disk and attempts to compile.

2.      The Crash: The compiler hits the missing logic and throws the error.

3.      The Diagnosis Toll: The agent is programmed to fix errors automatically. It grabs the compiler’s error log and bundles it together with the code the engine generated in the previous turn. Output tokens are never part of the prompt cache, so the engine charges for all of it again as fresh input tokens.

4.      The Repair Toll: The AI generates the new, corrected code, and the vendor charges again for the output tokens.

Under the token meter, the enterprise pays for every step of this loop. You pay the output rate for the initial lazy mistake. You pay the input rate to upload the error log and context so the machine can diagnose its own failure. And you pay a third time for the final repair. If the error is logical in nature, then the unit test will fail, and the meter will loop even more.
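A rough sketch of how the meter compounds across one such loop. The per-token prices and token counts below are placeholder assumptions, not any vendor’s actual rates; the point is only that the diagnosis and repair passes are billed on top of the original lazy answer.

    # Illustrative token-meter arithmetic for one lazy-placeholder failure loop.
    # Prices and token counts are placeholder assumptions, not real vendor rates.
    input_price  = 3.00 / 1_000_000   # $ per input token (assumed)
    output_price = 15.00 / 1_000_000  # $ per output token (assumed)

    lazy_output   = 2_000    # the truncated first answer, with placeholders
    error_context = 60_000   # codebase plus compiler log re-sent for diagnosis
    repair_output = 6_000    # the "full" rewrite on the second pass

    cost = (lazy_output * output_price        # pay for the mistake
            + error_context * input_price     # pay to explain the mistake back
            + repair_output * output_price)   # pay for the repair

    print(f"${cost:.2f} for one failure loop")
    # About $0.30 here; multiply by thousands of agent runs per day and the
    # "bug" becomes a recurring revenue line.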

The vendor wins twice: they saved expensive compute by letting the model be lazy, and they generated triple the revenue because the agent was forced into an iterative loop. Error ceases to be a defect; it becomes recurring revenue. Accuracy actively works against the vendor’s profit.

If this looping extraction is a slow bleed for text-based software engineering, it is a financial slaughterhouse for multimedia agents.

Generating text is computationally light compared to generating high-definition video. Calculating temporal consistency, lighting, and physics across millions of pixels requires enormous VRAM. In fact, the compute cost is so severe that a perfect, unthrottled render (free of the lazy-AI phenomenon) often costs the vendor more than the price of the query. To make the prompt economically viable, the vendor is mathematically forced to speed up the inference. The result is inevitable: the physics breaks down and an actor’s hand melts into a coffee cup.

Professional Hollywood studios and major ad agencies survive this reality because they use human VFX artists. When the model cuts a corner, the human loads the flawed clip into traditional software to manually edit the video.

AI vendors are selling enterprises the opposite dream: fully autonomous marketing agents that generate and finalize campaigns on the fly.

When a coding agent encounters a mistake, it can at least attempt to use a “Search/Replace” text block to patch a single line. A video agent has no such luxury. Because the video is generated as one continuous whole inside the model, you can’t easily fix just one damaged frame without causing noticeable glitches. If a throttled model hallucinates a physics anomaly at second fourteen, the agent cannot surgically edit it. It must scrap the file and force the model to re-render the entire sequence.

Worse, video possesses no objective compiler. An autonomous marketing AI agent must rely on a secondary Vision AI agent to act as its referee. When the Vision AI spots the melting hand, the only available tool is the brute-force re-render. By attempting to cut the human artist out of the loop to save on salaries, the enterprise walks blindly into the token meter trap. They are left paying video-generation compute costs for two hallucinating machines arguing with each other over subjective aesthetics, caught in an infinitely expensive rendering loop.

Finally, this billing structure weaponizes liability. Traditional programming languages like C++ or C# are mathematical in nature: if the syntax is wrong, the compiler rejects it. Natural language possesses no such strict parser, and AI prompts are crafted in natural language.

If a company is billed thousands of dollars for an agent stuck in a looping cycle of failed queries, the financial dispute becomes unresolvable. The vendor will blame a “prompt failure,” claiming the agent’s instructions were poorly formatted. The enterprise will blame an “LLM failure,” pointing to the model’s lazy inference. Because natural language is subjective, there is no objective compiler to settle the dispute.

Pay-By-Outcome

If the pay-per-query model descends into a hostile, legal problem, the vendor might attempt the inverse: Pay-By-Outcome. Instead of charging for the machine’s thermodynamic effort, the vendor attempts to charge only when the task is successfully completed—a “closed ticket” model.

In a deterministic software environment, outcomes are binary and objective. A server is either restored or it remains down. A database query either returns the records or it fails. But the outputs of generative AI—marketing copy, legal drafts, strategic summaries—are inherently subjective.

This subjectivity creates a vulnerability for the AI vendor. If the enterprise client is only billed when they formally accept the final output, they are instantly incentivized to infinitely reopen the task. Because they are insulated from the compute tax, they will treat every revision as free. They will demand minor stylistic adjustments, nuanced tone shifts, or additional edge-case coverage simply because it costs them nothing to ask. “Make the tone more professional.” “Adjust this clause to reflect a new hypothetical risk.”

The client endlessly moves the goalposts on a subjective task. But the vendor cannot move the physics of the machine.  Every single requested revision forces the vendor to spin up the GPUs, execute substantial  calculations, and burn energy. The vendor bleeds compute money on every single iteration, effectively subsidizing the client’s indecision until the profit margin on that single “closed ticket” is completely destroyed.

The Iteration Loophole

To plug this financial leak, vendors will attempt a compromise: capping the revisions. They will offer a strict quota—perhaps three free iterations per task—before the meter starts running again. This does not solve the underlying economic flaw; it merely shifts the burden into the Iteration Loophole.

Enterprise procurement teams are ruthlessly efficient at optimizing vendor contracts. If a client knows they are limited to a handful of free revisions, they will fundamentally change how they interact with the machine. Instead of requesting simple, iterative adjustments, the user will cram an overwhelming density of complex criteria into a single, massive prompt. They will demand the model simultaneously adjust the tone, cross-reference new frameworks, and rewrite the logic to account for a dozen edge cases—all within that one “free” turn.

Because the computational cost of a Large Language Model scales quadratically with the size and complexity of the prompt’s context window, this density is fatal. The vendor’s hardware must execute heavier probabilistic calculations to synthesize the bloated input. The vendor is still forced to burn astronomical amounts of physical energy, bleeding their compute capital dry to fulfil a contract they cannot renegotiate.
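A small sketch of why that density is fatal. The token counts are arbitrary; the only claim is the shape of the curve: self-attention compares every token with every other token, so a prompt eight times longer makes that part of the work roughly sixty-four times heavier.

    # Illustrative scaling of the self-attention term with context length.
    # Token counts are arbitrary; only the quadratic shape matters here.
    def attention_pairs(context_tokens: int) -> int:
        # Every token attends to every other token: n * n comparisons.
        return context_tokens * context_tokens

    small_prompt = attention_pairs(8_000)    # a simple, iterative request
    dense_prompt = attention_pairs(64_000)   # one overloaded "free" revision

    print(dense_prompt / small_prompt)  # 64.0: eight times the text, sixty-four times the work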

The Inevitable End State

The end is easy to see. The dream of the magic flat fee cannot last. The machine is too costly. No vendor can offer endless work for one fixed price and live. In the end, the meter must run. The user must pay for use. But once the meter runs, the aim changes. The machine no longer serves truth. It serves revenue. A right answer ends the sale. A wrong answer keeps it going. Accuracy becomes a loss. Little pitfalls become income. So generative AI in the enterprise will not decay because we cannot make it better. It will decay because the system pays for decay. In that world, enshittification is not a flaw. It is the rule.

The Ticking Clock

This pricing deadlock has not gone unnoticed by the broader financial ecosystem. For the first two years of the generative AI boom, Wall Street operated almost entirely on faith. Tech giants were handed unconstrained capital to build out the underlying infrastructure, under the assumption that an unprecedented technological leap would inevitably forge its own lucrative business model.

That period of uncritical grace ended in early 2026. Deep market anxiety has replaced theoretical optimism, driven by the brutal asymmetry between capital expenditure and realized revenue.

The clearest signal of this rift came from Microsoft. In its second-quarter earnings, the company reported a massive revenue beat of over $81 billion. Historically, this would have sent the stock soaring. Instead, the market focused entirely on a single, terrifying metric: Microsoft’s capital expenditure had surged 66% to a record $37.5 billion in a single quarter, almost entirely driven by investments in AI data centres and GPU infrastructure. The market realized the company was paying an astronomical “AI tax” to maintain its position, with no clear timeline for a return on investment. Microsoft’s stock violently crashed 10% in a single session, wiping out $357 billion in market value despite a highly profitable quarter. Investors are no longer willing to accept a “spend now, monetize later” narrative without seeing the math.

The math at the absolute bleeding edge of the industry is even more alarming. OpenAI presents a financial paradox that defies traditional business logic. By early 2026, the company achieved an astonishing $20 billion in annualized revenue. Yet the cost of running the engine is so astronomically high that the more revenue OpenAI generates, the more money it loses.

After posting a massive $13.5 billion net loss in the first half of 2025 alone, internal projections indicate OpenAI will burn another $14 billion in 2026 just to keep the servers running and the models training. To survive this staggering cash incinerator, the company was forced to secure a historic $110 billion private funding round—the largest in human history—from Amazon, Nvidia, and SoftBank.

This is not a sustainable business. When a company generating $20 billion in revenue still requires a $110 billion bailout just to keep the lights on, the market is forced to confront a terrifying reality: the core product might be fundamentally, mathematically unprofitable. And when a vendor is draining billions in infrastructure debt, the inevitable result is the deliberate degradation of the model and the ruthless extraction of the customer through the token meter.

The Debt Illusion

To mask this terrifying math, publicly listed tech companies are engaging in a dangerous trick: The Debt Illusion.

Historically, when a corporation embarks on a generational infrastructure build-out, it funds the expansion through its free cash flow. If the expenditure is massive, then the responsible board will pause or cut shareholder payouts—like stock buybacks and quarterly dividends—to cover the cost. But the current AI frenzy does not allow for such prudence.

Faced with astronomical bills for data centres and cooling infrastructure, executives find themselves trapped. They desperately need the cash, but they are terrified to cut their quarterly dividends. To Wall Street, a slashed dividend is a blood-in-the-water signal—an admission that the core, legacy business is stalling. In the current hyper-anxious market, cutting the dividend to pay for AI would trigger an immediate, violent stock sell-off.

To solve this, corporate boards have chosen a radical middle path. To keep shareholders pacified while buying tens of billions of dollars worth of GPUs, they are turning to the bond market. They are maintaining—and in some cases, initiating—lucrative dividend payouts to project an aura of invincible financial health. At the same time, they are issuing massive amounts of corporate debt to actually pay for the AI infrastructure.

The balance sheet creates a mirage. On paper, the company looks endlessly profitable, returning cash to investors. In reality, they are borrowing billions of dollars to buy highly specialized, rapidly depreciating hardware, purely out of fear that the stock market will punish them if they stop the music. When tech companies built out the cloud and mobile ecosystems in the 2010s, interest rates were effectively zero. Debt was free. Today, borrowing tens of billions of dollars on the bond market carries massive, compounding interest costs. They are mortgaging their future balance sheets to fund a thermodynamic engine that so far does not possess a sustainable business model.

Circular Revenue

To further artificially sustain this demand, the industry has turned to pure financial alchemy: Circular Revenue.

The undisputed kingmaker of the AI boom, Nvidia, currently sits on unprecedented piles of cash. But rather than relying solely on organic enterprise demand to sell its hyper-expensive GPUs, the company has aggressively deployed its capital directly into an extensive portfolio of AI startups and boutique cloud providers.

The mechanics of these deals are unexpectedly circular. Nvidia injects money into a young AI firm. That firm then immediately turns around and uses that money to buy thousands of Nvidia GPUs. On Nvidia’s quarterly earnings report, this transaction is recorded as high-margin revenue. This instantly signals to Wall Street that market demand for silicon is infinite, further fuelling Nvidia’s multi-trillion-dollar valuation.

In reality, the hardware vendor is effectively financing its own customers. It is a closed loop of vendor financing, designed to artificially prop up the order book and prevent the demand bubble from popping.

This arrangement assumes that these AI startups will eventually find actual, paying end-users to justify the hardware. But corporate boards do not want to keep paying for expensive AI compute, and everyday consumers are refusing to pay premium subscriptions for probabilistic tools. If these startups fail to generate real software profits, they will collapse under the weight of their own operating costs. When they default, the circular revenue machine breaks.

The market will suddenly discover that a massive percentage of the “historic demand” for AI hardware was simply Silicon Valley passing its own money back and forth in a circle.

But the fallout will not end with dried-up order books. When these startups inevitably liquidate, their physical assets will not evaporate. Thousands of lightly used, flagship GPUs will flood the secondary market at fire-sale prices. The hardware giants will not just lose their future buyers; they will suddenly find themselves in a brutal price war against their own ghosts.

The “Moore’s Law” Fallacy & The Thermodynamic Wall

The entire financial house of cards—the subsidized subscriptions, the debt illusion, and the circular revenue—is balanced on a single, desperate assumption: that the hardware will eventually rescue the software.

Silicon Valley operates on the residual faith of Moore’s Law. Coined in 1965, this is the foundational observation that the number of transistors packed onto a microchip doubles roughly every two years. For half a century, the tech industry rode this uninterrupted, magical trajectory. Because engineers could continually shrink these microscopic switches, computing power doubled while production costs halved. Chips became exponentially faster, cheaper, and more energy-efficient. Today, tech optimists assume this historical curve will naturally extend to generative AI, believing that continuous R&D will inevitably drive the cost of running a Large Language Model down to near-zero.

This optimism ignores fundamental physics. Moore’s Law is not a law of nature; it was an economic observation that is now colliding with a thermodynamic wall.

We are no longer shrinking bulky silicon components; chip manufacturing has reached the atomic scale. When transistors are reduced to the width of a few atoms, they fall victim to “quantum tunnelling”—a phenomenon where physical barriers become so thin that electrons simply bleed right through solid matter, generating uncontrollable heat. We have extracted the final efficiencies from the silicon substrate. The era of free, exponential performance gains simply by shrinking the hardware is over.

The catastrophic bottleneck for AI is no longer just processing speed; it is the physical act of moving the data. This is known in computer science as the “Memory Wall.”

A Large Language Model is essentially an enormous matrix of weights and parameters. To generate a single word of text or a single line of C++, the GPU cannot just perform a calculation; it must physically fetch terabytes of data from the memory chips and push it into the GPU processing cores.
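A back-of-the-envelope sketch of that Memory Wall. The model size and bandwidth figures are illustrative assumptions rather than any specific product’s specification; the point is that token generation is bounded by how fast the weights can be streamed out of memory, not by how fast the cores can multiply.

    # Rough memory-bandwidth bound on token generation.
    # Model size and bandwidth are illustrative assumptions, not real product specs.
    params = 70e9              # parameters in a large model (assumed)
    bytes_per_param = 2        # fp16/bf16 weights
    memory_bandwidth = 3.3e12  # bytes per second of GPU memory bandwidth (assumed)

    weights_bytes = params * bytes_per_param               # ~140 GB of weights
    tokens_per_second = memory_bandwidth / weights_bytes   # each new token re-reads the weights

    print(f"{tokens_per_second:.1f} tokens/second upper bound from bandwidth alone")
    # Roughly 24 tokens/second for a single unbatched request: the arithmetic units
    # could go far faster, but the data physically cannot arrive any sooner.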

The PCIe Thermodynamic Cliff

To understand how rapidly this physical limit is approaching, one must look at the central nervous system of the modern server: the Peripheral Component Interconnect Express, or PCIe.

If the GPU is the calculating brain of AI, the PCIe bus is the physical highway that connects that brain to the memory banks and the network. To process data in the GPU cores, terabytes of it must first be moved from memory to the GPU. If the PCIe highway is too slow, the expensive GPUs sit idle, waiting for data to arrive before they can start processing. Because GPUs are the costliest item in the rack, every idle second carries a steep opportunity cost. To prevent this bottleneck, the hardware industry aims to double the speed of this PCIe highway every few years.

Currently, most PCIe devices are either PCIe Generation 4 or Generation 5. With the transition from PCIe 5.0 to 6.0, engineers executed a brilliant, one-time structural trick to achieve this doubling. Stripping away the technical complexity: the signalling previously used two voltage levels to represent 0 and 1. In PCIe 6.0, they increased the levels to four, representing 00, 01, 10, and 11. In effect, one electrical pulse can carry two bits at a time, which effectively doubles the overall speed. This technique is called PAM4.
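A toy sketch of the difference. The voltage values are arbitrary placeholders; the only point is that four distinguishable levels carry two bits per electrical pulse where two levels carried one, and that the gap between neighbouring levels shrinks accordingly.

    # Toy comparison of two-level (NRZ) and four-level (PAM4) signalling.
    # Voltage values are arbitrary placeholders.
    import math

    nrz_levels  = {0.0: "0", 1.0: "1"}                            # 1 bit per pulse
    pam4_levels = {0.0: "00", 0.33: "01", 0.67: "10", 1.0: "11"}  # 2 bits per pulse

    bits_per_pulse_nrz  = math.log2(len(nrz_levels))   # 1.0
    bits_per_pulse_pam4 = math.log2(len(pam4_levels))  # 2.0

    print(bits_per_pulse_pam4 / bits_per_pulse_nrz)  # 2.0: same pulse rate, double the data
    # The price: the gap between adjacent levels shrinks from 1.0 V to about 0.33 V,
    # so far less electrical noise is needed to turn a 00 into a 01.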

The second law of thermodynamics states that entropy always increases. There is a natural state of the universe, and to deviate from it requires energy. Boiling water will naturally cool down to room temperature. To keep it warmer or cooler than the room, you must constantly spend energy, either by burning a stove or by running a refrigerator. The further you push a system from its natural resting state, the more energy you must inject into the equation.

When we move from two voltage levels to four, the gaps between the levels shrink. A tiny fluctuation in voltage or a bit of electrical noise can easily turn a 00 into a 01 at the other end. If the system relied on standard algorithms to catch and correct these errors, it would have to read the entire transmission, check it for errors, and then request a retransmission; the time taken would instantly erase the doubled speed the engineers were trying to achieve. So, the hardware industry deployed a technique called Forward Error Correction (FEC), which relies on highly specialized, miniature processors built directly into the silicon. They detect and correct the errors on the fly, during the transmission itself. But this technique draws a massive amount of electrical current, and the computation produces additional heat that must be removed.

Furthermore, a moving electron produces a magnetic field. A motherboard does not have just one copper trace; it has hundreds of them packed less than a millimetre apart. At the high speeds of PCIe 6.0, these traces act like microscopic radio antennas. If you pump a massive, high-voltage signal into one trace, the electromagnetic field it generates can leak into the adjacent traces. This is called “crosstalk.” To avoid this, engineers have no choice but to operate at lower voltages. However, lower signal strength creates a new problem where the signal can become too weak and fail to reach its destination. To fix this, engineers are forced to deploy power-hungry “Retimer” chips at regular intervals simply to catch the dying signal, clean it, and re-amplify it. Doing all this extra work requires immense energy and produces massive heat.

To cool this down, our typical air-conditioned server rooms and exhaust fans attached to devices are no longer sufficient. Air simply cannot carry away this level of concentrated thermal density. The facility is forced to plumb direct-to-chip liquid cooling systems—circulating industrial fluids millimetres away from the processors so that they don’t melt.

The cliff is getting steeper. With the arrival of the PCIe 7.0 standard in 2025, the industry must double the speed again. They did not attempt a PAM8 system, because the crosstalk and noise would become far too high. Instead, they decided to brute-force physics, driving the raw frequency of the motherboard to an extreme: simply put, running the same PAM4 signalling at a much higher speed. Pushing an electrical signal at this frequency (around 32 GHz) across copper is an engineering nightmare. The cost of retimers and liquid cooling will skyrocket because the extreme heat is simply unavoidable.

There is also talk of ditching copper and moving towards optical fibre—the same technology that powers high-speed internet. Instead of pushing electrons through metal, the system converts data into pulses of light, firing them down microscopic glass cables where they bounce off the internal walls like a hall of mirrors. Optical excels over long distances, but bringing it down to board-level distances has its own challenges. Furthermore, GPUs need electrical signals, so PCIe 7.0 over glass must convert optical to electrical signals and back for every exchange. These E-O converters change the heat profile of the motherboard: with copper, the entire board ran hot; now the heat is concentrated directly at the converters. Running PCIe 7.0 on glass can make the thermal architecture better than PCIe 7.0 on copper, but the heat output is still higher than PCIe 6.0. And these physical benefits must be paid for with money. Copper infrastructure is cheaper than glass infrastructure; if the infrastructure goes to glass, the AI vendor has to charge a higher fee.

Cornered by these physical and thermal limitations, many believers argue that focusing on the current hardware limits is short-sighted; they assume that whenever a physical wall is reached, a brilliant new technological paradigm will simply bypass it. But this optimism frequently conflates logical breakthroughs with physical realities, or confuses science fiction with the immediate financial present.

When looking for immediate salvation, optimists point to new communication standards, most notably Compute Express Link (CXL), marketed as a breakthrough. But CXL is not magical. It rides directly on top of the exact same PCIe physical layer, so it offers no escape either.

To overcome the limits of motherboards, there is talk of 3D chip stacking. The theory is that if we cannot spread chips out because of the copper, we can stack them vertically, like building tiny skyscrapers on the motherboard. But the thermal density remains brutal. Stacking processing and memory vertically traps the heat deep inside the block, making the chips even harder to cool without extreme, expensive liquid immersion. The infrastructure cost shoots up.

Finally, Quantum Computing. While this is a fascinating field of foundational research, it is decades away from running commercial AI workloads, if it ever does at all. It offers absolutely no salvation for the generative AI data centres being constructed today. The financial debt clock is ticking now. Wall Street, corporate boards, and the global supply chain cannot pay this year’s massive infrastructure bills with the theoretical breakthroughs of the distant future. The industry is trapped in the present, forced to fight the uncompromising physics of the hardware it actually has.

The GPU Cliff

Tech optimists might concede that the PCIe highway is bottlenecked, but they assume the destination itself—the GPU—will continue to scale in raw calculating power. But the calculating brain of the AI is slamming into a physical cliff of its own.

For the past decade, the simplest way to make a GPU faster was to make the physical silicon chip larger so it could hold more processing cores. That era is over. The multi-million-dollar lithography machines that manufacture these chips are bound by a strict optical constraint: the reticle limit. This is the maximum physical size of the machine’s lens window, which caps out at roughly 800 square millimetres. Because of this hard boundary, it is physically impossible to print a single, monolithic chip larger than this window. The industry’s flagship GPUs have already hit this wall.

Because they cannot make the single chip any bigger, engineers are forced to stitch multiple smaller pieces of silicon—called “chiplets”—together on a shared microscopic baseplate. To make two chiplets act as a single brain, they must communicate with each other across microscopic copper bridges. The industry essentially took the time-tested technology of copper interconnects and shrank it down to fit inside the chip. By doing so, they took the exact same thermodynamic nightmare of the motherboard and trapped it deep inside the processor package itself.

The power of those chips comes from their transistors. A transistor is a microscopic circuit etched into the silicon. It has only one job: to open or close, controlling the flow of electricity to create the 1s and 0s of digital math. A flagship AI GPU contains tens of billions of these microscopic switches. If we cannot fit more physical switches onto a maxed-out chip, the only option is to force the existing switches to flip faster. This requires blasting them with a massive surge of high-voltage electricity. Just a few years ago, a high-end data centre GPU consumed 300 watts of power. Today, the latest flagship AI processors are designed to draw an apocalyptic 1000 to 1200 watts for a single chip.

Squeezing any further on-chip performance out of this architecture will cause manufacturing, packaging, and cooling costs to skyrocket. The compute engine has maxed out its size and pushed its electrical current to the melting point. This compounding hardware debt mathematically cannot be paid for under the AI industry’s current pricing model.

The Upgrade Debt Cycle

The final, crushing weight of this physical reality manifests in the Upgrade Debt Cycle.

In a traditional IT world, hardware is a predictable investment. A company buys a standard server, plugs it in, and comfortably spreads the cost over a four-to-six-year lifespan. Generative AI infrastructure does not afford this luxury.

The speed of obsolescence in this industry is violent. To understand the financial trap, one only needs to look at the multi-billion-dollar hardware graveyard created over just the last six years. In 2020, OpenAI sparked the current arms race by training GPT-3 on a massive, custom-built cluster of Nvidia V100 chips. By late 2022, when the world was marvelling at the launch of ChatGPT and GPT-4, those V100s were already functionally obsolete fossils. To train that new generation of models, the industry pivoted to tens of thousands of A100 chips, costing hundreds of millions of dollars.

By 2024, the cycle repeated. To build models like Meta’s Llama 3, Google’s Gemini 1.5, and the GPT-4 Turbo class, AI companies panic-bought tens of billions of dollars’ worth of the next iteration, the H100. Today, a mere 18 to 24 months later, those massive H100 stockpiles are already physically incapable of training the bleeding-edge frontier models. The older chips lack the memory capacity and bandwidth that the next generation of algorithms demands.

This creates a devastating economic reality: without ever making one model profitable, the industry is forced to move on to building the next model, using entirely new hardware. Before ChatGPT could ever recover the hundreds of millions spent on its A100 servers, the AI companies had to spend tens of billions on H100s. Before those H100s could break even, they were forced to buy the next generation of liquid-cooled racks. The product is never finished, and the hardware never pays for itself.

Furthermore, an “upgrade” is not as simple as opening a computer case and snapping a new graphics card into a motherboard. The entire environment must be re-engineered: the power delivery, the water and liquid-cooling plumbing, and the air conditioning.

Because of this extreme physical footprint, AI companies are frequently forced to abandon their old data centres entirely and build new ones from scratch.

Huge, one-time hardware investments (Capital Expenditure) have effectively mutated into an astronomical, recurring subscription bill (Operating Expense). AI vendors are borrowing billions of dollars to buy hot, power-hungry engines that become obsolete before they could ever break even, creating a perpetual cycle of debt simply to stand still.

UBI and the Missing Consumer

To grasp the final flaw in the generative AI narrative, one must temporarily suspend disbelief. Assume, for a moment, that Silicon Valley achieves the impossible. Imagine they solve the thermodynamic cliff, bypass the memory limits, and drive the physical cost of computing down to near zero. Imagine the probabilistic engine functions perfectly, and the global enterprise successfully deploys it to completely automate the cognitive workforce.

If the technology works exactly as promised, it triggers a macroeconomic paradox that destroys the very corporate profits it was built to maximize.

Enterprise adoption of AI is driven by a myopic, localized obsession with the supply side of the economic equation: reducing the cost of production by liquidating human labour. But a functioning capitalist economy is a closed loop. It requires a heavy, physical equilibrium between production and consumption. By ruthlessly optimizing the supply side, the generative AI model systematically annihilates the demand side.

The target of this automation is not the factory floor. It is the cognitive workforce—software engineers, data analysts, administrators, writers, and legal professionals. Economically, this demographic is the load-bearing pillar of the global middle and upper-middle class. They are the primary engine of consumer spending, holding the disposable income required to purchase homes, vehicles, services, and the very software these automated corporations produce.

The corporation achieves infinite, frictionless production, only to discover it is selling into a macroeconomic void. The profit margin becomes mathematically perfect, but the revenue drops to zero.

UBI as Rent-Seeking

Silicon Valley is aware of this macroeconomic paradox. Their proposed solution is the Universal Basic Income (UBI).

To the casual observer, tech billionaires championing wealth redistribution appears radically progressive, even altruistic. Through the lens of cold economics, however, it is the ultimate rent-seeking manoeuvre.

If generative AI successfully automates the middle class, it breaks the traditional cycle where wages fund consumption. To prevent the economy from collapsing—and to ensure their own revenues do not drop to zero—the AI companies desperately need a third party to step in and fund the consumer. That third party is the state.

Under the tech industry’s vision of UBI, the government taxes the broader economy—or simply monetizes national debt—to distribute monthly stipends to the displaced workforce. The citizens are then expected to use this taxpayer-funded stipend to purchase the software subscriptions, digital content, and algorithmic services produced by the very AI companies that eliminated their wages in the first place.

UBI, in this context, is not a utopian safety net for the working class; it is a massive, state-sponsored bailout for AI vendors. It socializes the cost of maintaining consumer demand while keeping the profits of automated production strictly privatized. It is the final, perfect closed loop of corporate extraction.

The Inflation Trap

Money is not wealth; it is a claim on resources—physical or human labour.

A government cannot fund a massive UBI program without taxation. If AI successfully automates the cognitive workforce, it simultaneously destroys the very population that pays the taxes. To provide the stipends, the state is forced into aggressive deficit spending. It must simply print the money, expanding the fiat currency supply.

When the government distributes newly printed UBI cheques to an unemployed population, that money will desperately try to buy physical resources. Since everyone holds the exact same amount of money, prices will adjust to this new reality; if everyone is printed the same income, no one has any real purchasing power. Rents, food prices, and energy prices will rapidly rise to absorb the new influx of cash.

As prices adjust, the UBI stipend will not be able to maintain the consumer’s standard of living; it will debase the currency. Within a few economic cycles, the monthly cheque will become worthless. Consumers are left with zero disposable income. The tech monopolies ultimately realize they cannot sell software subscriptions, digital services, or consumer goods to a population whose state-mandated income barely covers groceries and the electrical bill. The inflation trap snaps shut, bankrupting the consumer and the corporation alike.

But the collapse will not stop at corporate bankruptcy. When mass unemployment ceases to be an anomaly and becomes a permanent structural feature, the foundational contract between the citizen and the government disintegrates. You cannot maintain civil order—or even walk safely down a street—in a society where one in every three people is driven to physical desperation.

Furthermore, this violence will erupt exactly as the state loses its capacity to contain it. The same evaporating tax base that forced the government to print the UBI cheques also starves the civic infrastructure. Without the tax revenue of the automated middle class, the government cannot fund the courts, maintain the public grid, or pay the police force. The social order simply collapses.

UBI won’t be needed

The UBI scenario outlined above serves primarily to highlight the sheer desperation and moral bankruptcy of the AI industry. It exposes the epic greed of a sector that wants to capture all the profits while forcing the state to subsidize the consumer. Generative AI may not survive in its current form, but it will survive, nonetheless. A UBI will not be needed for its survival in any case.

As the second law of thermodynamics says—keeping a system far from its natural order requires a continuous injection of energy. If this technology survives, its massive physical footprint will force the creation of entirely new industries just to feed its hunger. An automated taxi may replace one human driver, but its physical demands escalate to an entirely new level: the number of cameras it requires, the high-speed internet infrastructure, the continuous navigation uplinks, and the constellation of satellites needed just to support its routing.

Similarly, we will see booms in water recycling, industrial cooling systems, and nuclear energy. The mining sector will explode simply to meet the sheer demand for raw materials. Each of these sectors will open their own capillaries. Because generative AI is the output, it cannot feed its own inputs. The algorithm cannot mine its own copper and still remain profitable. It cannot pour the concrete for its own data centres, nor can it build its own nuclear reactors.

Before the invention of the computer and the internet, the Information Technology sector did not exist. AI will forge entirely new, massive sectors of human labour, and the complexity of these new-age systems will be far higher.

The labour market will correct itself. The prestige of the software engineer might fade. A new generation of students, watching the rapid commoditization of code, will no longer flock to computer science degrees.

They will follow the capital. The brightest minds will pivot toward the sectors booming under the new AI economy. The software sector will be left with fewer engineers to meet its own reduced demand.

The correction will also not be overnight. Humans have built sophisticated enterprise systems—from operating systems to SAP to stock markets. AI cannot take over that role in the near future. It can write decent Python code, but Python is slower than compiled languages like C and C++. Most Python packages are themselves written in C/C++, and the Python runtime itself is written in C. C and C++ involve deep hardware dependencies, compiler options, and raw memory management that an AI cannot simply take over next month. The transition will take time. Millions of displaced knowledge workers will still scramble to find their footing in a market that no longer pays a premium for their cognitive output. But the apocalyptic vision of a permanently unemployed human race sitting idle on a state-sponsored UBI is far too pessimistic.

People will always find something to do.

The AGI Illusion and the Rise of SLMs

When an AI company realizes that its core product is mathematically unprofitable, it has two choices: admit defeat and face the wrath of the market or change the promise. Facing the devastating economic, thermodynamic, and macroeconomic realities of Large Language Models, Silicon Valley has chosen the second option. They have executed the narrative pivot.

To justify the continued burning of hundreds of billions of dollars, the industry can no longer sell the engine as just another productivity tool. The ROI is demonstrably false. Instead, they must sell a prophecy. They are now arguing that the current probabilistic engines are simply the necessary, expensive stepping stones to Artificial General Intelligence (AGI)—a theoretical, omniscient machine that will eventually become so smart it will solve its own economic and physical limitations.

This utopian narrative, however, is built on a profound architectural fallacy. The tech companies are selling the promise of AGI, but they are building LLMs, and the two are not mathematically sequential.

As prominent AI researchers and cognitive scientists—most notably Meta’s Chief AI Scientist Yann LeCun—have repeatedly warned, AGI will not emerge simply by throwing more hardware at a Large Language Model. An LLM is, at its absolute core, an exceptionally sophisticated next-token autocomplete engine. It does not possess a deterministic “world model.” It does not actually understand the physical gravity of a falling object, the rigid, logical constraints of a C++ codebase, or the chronological reality of a historical event. It calculates the statistical probability that one specific word should follow another, based on the static data it has already ingested.

Scaling up an architecture does not change its fundamental nature. Feeding an LLM a trillion more parameters and pushing the data centres to the edge of a thermodynamic meltdown does not magically transform a probabilistic engine into a deterministic thinker. It will simply create a much more articulate, convincing talker. You cannot achieve generalized, reasoning intelligence by infinitely scaling a system that fundamentally lacks the architecture for logic.

The Silent Retreat to SLMs

Despite the public megaphones blasting promises of omniscient AGI, the actual engineering behaviour of the AI companies tells a completely different, far more pragmatic story. Behind the closed doors of their R&D labs, the industry is executing a quiet, desperate pivot away from the God Model.

They are retreating to Small Language Models (SLMs). SLMs represent the true, sustainable future of enterprise artificial intelligence.

Instead of training colossal, trillion-parameter Goliaths designed to know everything about quantum physics, historical poetry, and JavaScript, engineers are building drastically compressed, highly specialized transformers. These models are trained on narrow, meticulously curated datasets. An enterprise does not need an omniscient oracle; it needs a model that perfectly understands the exact domain the business operates in. A law firm needs a model strictly trained on case law, regulatory precedents, and contract structures. It has absolutely no economic need for its legal AI to know how to write C++ code or the history of the Roman Empire.

This pivot to SLMs is not driven by scientific curiosity; it is a masterstroke of economic survival.

Because these models are small, they do not require a billion-dollar, liquid-cooled data centre to run. They are designed for the “edge”—meaning they can run locally on a standard corporate laptop, an office enterprise server, an on-prem data centre, or even a smartphone. By shrinking the model enough to fit on local hardware, the AI vendor brilliantly executes an escape from the compute tax.

When an employee runs an SLM on their company-issued laptop, the vendor’s massive servers do not spin up. The vendor burns no data centre electricity and degrades no expensive GPUs. The enterprise client pays the local power bill, buys the local hardware, and absorbs the thermodynamic heat. The AI company successfully offloads the physical cost of computation back onto the consumer, returning exactly to the “zero marginal cost” software paradigm that made Silicon Valley rich in the first place.
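As a minimal illustration of that edge deployment, here is a sketch using the open-source Hugging Face transformers library to run a small model entirely on a laptop’s CPU. The model path is a placeholder for whichever small, domain-tuned model the enterprise actually adopts; nothing here touches a vendor’s metered API.

    # Minimal sketch: run a small language model locally, with no token meter.
    # "path/to/local-slm" is a placeholder for a small open-weight model stored
    # on the machine; it is not a real model name.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="path/to/local-slm",  # placeholder: any local model directory
        device=-1,                  # -1 = CPU; no data centre spins up
    )

    draft = generator(
        "Summarize the indemnity clause in plain English:\n<clause text here>",
        max_new_tokens=200,
    )[0]["generated_text"]

    print(draft)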

In this pragmatic, localized future, what happens to the massive, generalized models—the ChatGPTs and Geminis of the world? They may not die, but they are brutally demoted.

Using a trillion-parameter supercomputer to draft a polite HR email, summarize a PDF, or write a basic Python script is a thermodynamic absurdity. As SLMs take over these daily, mundane tasks, the general LLM will be evicted from daily enterprise workflows.

Instead, the massive models will be pushed into one of two narrow roles. The first is a highly expensive “Luxury Oracle,” locked behind a massive premium paywall, used only by researchers who genuinely need cross-domain synthesis. The second is the fast, cheap mass-market tier that costs the AI companies pennies to run; it will hallucinate a lot, and everyone will develop the habit of verifying its output.

However, this pragmatic, localized future carries a devastating final irony for Wall Street.

If the true economic value of AI is ultimately delivered by tiny, efficient, specialized models running locally on corporate laptops, then the sprawling, $50 billion nuclear-powered data centres currently being constructed to house the omniscient “God Models” are entirely unnecessary.

The tech companies will have successfully stabilized their software margins, but only by admitting that their massive hardware buildout was a catastrophic miscalculation. The infrastructure boom will collapse, leaving behind the massive, empty concrete factories and liquid-cooling pipelines as the most expensive stranded assets in corporate history.

The Human Edge Reclaimed

This pragmatic retreat to the local, air-gapped SLM fundamentally rewrites the economic endgame of artificial intelligence. It averts the apocalyptic automation of the cognitive workforce and, in doing so, definitively reclaims the human edge.

When an enterprise deploys an SLM entirely on local hardware or on-prem data centre, it guarantees absolute data sovereignty. But more importantly, the physical constraints of the SLM move the AI to its proper, highly effective role: a sophisticated industrial power tool.

An SLM is not an autonomous worker; it is the “bicycle for the mind,” as Steve Jobs would have called it. The human engineer still holds the deterministic “mental map”: the logic, the architectural design, and the ultimate accountability.

This is the true, sustainable future of the AI revolution. The technology does not replace the human mind; it simply removes the friction from human output. The enterprise achieves genuine, measurable productivity gains without paying a crippling compute tax.

The only losers in this pragmatic, localized future are the current LLM companies and the hardware vendors. By reclaiming the human edge, the market permanently destroys the justification for the trillion-dollar valuations, the thermal debt, and the massive, unsustainable AI infrastructure boom. The revolution will not be a centralized, omniscient supercomputer; it will simply be a faster, quieter set of tools in the hands of the human workforce.

A God Model

“You kids have no idea,” the old man said, watching his grandson carefully monitor the token-meter on his tablet. “You complain about the price of cognitive bandwidth today, but you missed the Golden Age of the Subsidy. You missed the madness of those years.”

The kid rolled his eyes. “I know, Grandpa. You guys had the ‘God Models.’”

“We didn’t just have them,” the old man laughed, leaning back in his chair. “We treated them like absolute garbage. For about four years in the mid-2020s, Silicon Valley completely lost its mind. They built the most expensive supercomputers in history. And they gave us the keys for twenty bucks a month.”

The grandson looked up, skeptical. “Twenty bucks a month? Per query?”

“No. Flat rate. All you could eat. It was a venture-capital charity program, and we abused it magnificently.”

“And what did you do with them?”

“I made it write my grocery list as a Shakespeare sonnet.”