The Core Flaw
Languages exist because we mean things. They are beautiful,
ground-up architectures of our shared meanings. Before they ever became law or literature,
philosophy or code, they began as intentions. Those intentions were then made
audible—into warnings, pleas, promises, commands, confessions. We developed speech
to carry a mental state from one mind to another—to say, this is what I see,
this is what I know, this is what I fear, this is what I feel.
Over centuries, we poured our intentions into stories,
letters, speeches, arguments, scriptures, poems, blogs, tweets and comments.
That archive is not random verbal debris. It is saturated with order, habit,
structure, and meaning because it was produced by human minds trying to express
something.
Engineers eventually discovered that this immense record of
our expression could be analysed for patterns. Words cluster, phrases recur,
and larger structures can be mapped statistically. From that insight, they
built neural networks—massive computer programs designed to map and calculate
these patterns. The most famous of these architectures is the Transformer, the “T”
in GPT. It is a statistical model built specifically to calculate which token
is most likely to come next in a sequence.
At first, the mechanism appeared in a modest and almost
invisible form: the smartphone keyboard. We typed, “I am going,” and the
machine suggested “home” or “to,” predicting the familiar phrases “I am going
home” or “I am going to work.” The tool was useful, but its limits were
obvious. The software offered probability; we supplied the intention. We could
ignore the suggestion entirely and write, “I am going insane.” The meaning
still belonged to the mind using the tool.
What changed in this latest wave of innovation was not the
basic principle but the scale, speed, and automation. Companies trained these
models—Large Language Models (LLMs)—on every digitized trace of the human mind
they could grab. Once the models could calculate the mathematical probability
of every phrase—in any context—they removed us from the act of choosing the
next word.
Now, the model automatically picks the next word with the
highest probability and adds it to the sequence. It then feeds this new, longer
string back into itself to calculate the next word, and then the next. It
repeats this operation, generating sentence after sentence without ever needing
a human intervention.
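To make the mechanism concrete, here is a minimal sketch of that loop in Python, with a toy probability table standing in for the billions of learned parameters of a real model:

```python
# A toy stand-in for the prediction loop: a tiny probability table
# replaces the billions of learned parameters of a real model.
TOY_MODEL = {
    ("I", "am"): {"going": 0.6, "tired": 0.4},
    ("am", "going"): {"home": 0.5, "to": 0.45, "insane": 0.05},
    ("going", "home"): {".": 0.9, "now": 0.1},
}

def next_token(sequence):
    """Return the most probable continuation of the last two tokens."""
    options = TOY_MODEL.get(tuple(sequence[-2:]), {})
    return max(options, key=options.get) if options else None

sequence = ["I", "am"]
while (token := next_token(sequence)) is not None:
    sequence.append(token)   # append the winner, feed the longer string back in
    if token == ".":         # stop at the end-of-sentence token
        break

print(" ".join(sequence))    # -> I am going home .
```

The loop never consults the world; it only consults the table. Scale the table up by twelve orders of magnitude and you have the essence of the product.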
This is generative AI in a nutshell. The engineering
required to sustain this loop is, without question, staggeringly complex. To
calculate the probability of a single word, the machine must haul
billions—sometimes trillions—of parameters into memory, executing complex
mathematical operations in fractions of a second. But we must not confuse
computational density with cognitive depth. Beneath the blinding complexity of
the math, the core mission is entirely mechanical. In one sense, the
achievement is undeniably astonishing. The outputs can be fluent, persuasive,
and at times eerily reflective. But the underlying mechanism remains what it
was: prediction operating at scale.
And that distinction matters. Prediction is not purpose.
Fluency is not understanding. Syntax is not consciousness. The machine does not
stand anywhere in our world. It possesses no underlying model of physical
reality. It does not experience the irreversible flow of time or the stubborn
weight of gravity. It has no body at risk, no private memory, no desire, no
shame, no grief, no moral stake in truth or falsehood. It does not know what
words cost us. It has never had to break a promise, confess a wrong, or beg for
forgiveness. It does not mean. It assembles.
Its power comes entirely from an inheritance it did not
create: the accumulated weight of human history. It is the testament of mortals
who have loved, feared, worshipped, lied, suffered, ruled, resisted, and tried
to make themselves understood. From that inheritance, it can produce an
imitation so convincing that we begin to mistake the echo for the voice.
That is the defining fact we must keep in view. Generative
AI is not a new form of intention entering the world. It is an automated system
for recombining the traces of our intentions left behind. However dazzling the
performance, the source of meaning remains where it has always been: between the
silence of intent and the word.
The Mathematical Certainty of Hallucinations
This mechanical reality reveals why the industry’s greatest
technical hurdle cannot be solved. What the industry sells us as intelligence
is an exquisite machinery of approximation—a velvet-wrapped guess. Because the
model is calculating the statistical likelihood of the next word, it possesses
no mechanism to verify the truth of the sentence it is building. It does not
know facts; it only knows correlations.
Sometimes, this guessing game lands on the truth. Ask it the
capital of France and it produces “Paris,” obedient as a well-trained servant
of repetition. But ask it to walk into a thicket—a forgotten legal doctrine, an
obscure C++ dependency, a situation too new or too complex to have hardened
into statistical pattern—and the model will still do what it always does. It
manufactures. Not knowledge. Not reason. Merely plausibility. To the machine,
generating a fact and generating a fiction are the same mathematical operation:
prediction.
The priesthood of Silicon Valley brands these fabrications “hallucinations,”
pitching them to corporate boards as temporary bugs, a regrettable stumble on
the way to paradise. They promise to iron out the defect with more training
data and better algorithms in the next quarterly update. Salvation is always
one upgrade away.
But this is a mathematical impossibility. A hallucination is
not a bug in a probabilistic engine; it is the foundation stone. A system built
on probabilities will sometimes guess wrong. Its relationship to reality is
incidental, not intrinsic.
To mask this reality, the industry is now pitching “AI
Agents”—systems where one model is deployed to proofread the output of another.
But this is a mathematical trick. You cannot stack layers of probabilities to create certainty. Forcing a model to check its own work requires a second, third, or fourth full inference pass to check, correct, and verify. The vendor increases the computational cost of each prompt three- to four-fold to deliver a less flawed, but never flawless, result.
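The arithmetic is easy to check. Assume, generously, that each verification pass is independent and catches 95% of errors (an illustrative figure, not any vendor's):

```python
# Illustrative only: each pass independently catches 95% of errors.
# Cost scales linearly with passes; the residual error never reaches zero.
p_wrong = 0.05
for passes in (1, 2, 3, 4):
    residual = p_wrong ** passes
    print(f"{passes} pass(es): compute cost x{passes}, residual error {residual:.1e}")
```

Even in this best case, each extra pass buys a smaller reduction at the same full price, and the error rate only approaches zero without ever touching it.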
Even with this expensive, multi-layered proofreading, the
error rates cannot be reduced to zero. And this brings the industry
face-to-face with the Law of Large Numbers. In probability theory, this law
dictates a simple, brutal truth: if an outcome has a non-zero chance of
happening, repeating the action an astronomical number of times guarantees its
arrival.
The insurance industry has built empires on this exact
premise. To a single person, a catastrophic accident is a rare, unpredictable
tragedy—a once-in-a-lifetime anomaly. But to an actuary looking at a population
of a hundred million, that same accident is not an anomaly; it is a
mathematical certainty. The insurance company does not guess whether crashes
will happen. They know exactly how many will happen, and they price their
premiums against that undeniable pattern.
Silicon Valley is treating hallucinations like the
individual driver, hoping to avoid the anomaly with more training data and proofreading.
But they are deploying these models at the scale of the actuary. If an LLM has
even a 0.1% chance of silently fabricating a critical strategy for pension fund
investment, testing it on a few scenarios might look flawless. But when an
enterprise lets that system loose to manage billions in pension funds across
millions of automated trades, probability will cash its cheque.
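The back-of-envelope calculation, using that 0.1% figure, is the actuary's entire argument:

```python
# The actuary's view of the 0.1% example above.
p_fabrication = 0.001                 # per-response chance of a silent error
for n in (10, 1_000, 1_000_000):
    p_at_least_one = 1 - (1 - p_fabrication) ** n
    print(f"{n:>9,} automated runs -> P(at least one failure) = {p_at_least_one:.4f}")
# A demo of ten runs looks flawless (~1%); a million runs make failure certain.
```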
Even when the industry attempts to bolt these models to factual databases, the underlying physics remain unchanged. The machine does not ‘read’ the database; it merely uses the retrieved text to calculate a new statistical output. The final assembly is still probabilistic. You cannot compute your way out of this condition. You can train the model to become a virtuoso of sounding right, but unless you build a deterministic logic engine that can reason from the truth of the real world, you cannot abolish the hallucination.
The Death of the “Double Thank You”
A healthy market rests on a simple, invisible foundation:
the “Double Thank You.” You buy a cup of coffee. You thank the barista for the
drink, and the barista thanks you for the money. It is a ritual of mutual gain.
Neither has been forced. Both walk away feeling they have gained something of
value.
For over a century, the modern workplace borrowed this same
contract. The worker gave labour; the company gave wages. When a new tool arrived,
it was introduced as an ally. The spreadsheet did not arrive as a rival. The
compiler did not sit at the engineer’s desk like a silent replacement waiting
to claim their chair. The technology was a lever. It made the worker more
productive, which increased the company’s output, which justified the wages.
Generative AI shatters this contract. Its first principle is
not exchange, but appropriation.
These AI companies built their multi-billion-dollar
valuations on scraping the collective intellectual output of humanity—novels,
essays, codebases, grief, jokes, instructions, and memories—without consent,
without payment, without even the decency of acknowledgment. They looted a
planetary library, pulped it into statistical paste, and sold it back to the
world as a miracle. There is no “thank you” to the original creators whose
intent was harvested to train the engine.
But the extraction does not stop at the edge of the
internet. When this technology enters the enterprise, the theft becomes
intimate.
The Corporate Angle
The destruction of this social contract begins at the
executive level, driven not by strategy, but by panic.
In the past, enterprise technology arrived wearing the plain
clothes of necessity. A company bought a relational database because it needed
to retrieve millions of records instantly. It rented cloud infrastructure to
survive unpredictable spikes in web traffic. The logic was embarrassingly
straightforward. The need came first. The tool followed. The return on
investment could be measured.
The rush to adopt generative AI is different. Companies are
not spending on LLM contracts because they have found a precise problem that
this kind of system can solve. They are spending because they fear being left
behind—the Fear Of Missing Out (FOMO).
The sales pitch delivered to corporate boards is not a
promise of steady efficiency; it is an existential threat. Integrate now, or die.
Deploy now, or be destroyed. AI is presented less as a useful tool than as a
race no one can refuse to run.
What follows is inevitable. Executives authorize sweeping
capital expenditures to assure nervous shareholders that the company has
boarded the departing train to the future. They are stapling this technology
onto customer service, legal workflows, and software pipelines. In many cases,
this happens before anyone has said what problem the system is meant to solve,
or whether the system is fit for the job at all. The logic is theatrical: We
have to do something, this is something, so let’s do this.
What they are buying, at staggering cost, is not certainty.
It is the performance of certainty. They are paying a premium for a mirage,
shimmering in the heat of collective corporate terror.
The First-Mover Illusion
The evangelists of the AI gold rush point to the balance
sheets of early adopters as proof of the technology’s revolutionary value. Data
from 2025 and early 2026 reveals specific professional services—particularly
law firms and digital marketing agencies—reporting massive spikes in
profitability. A task that once took a junior lawyer sixteen hours, such as drafting an initial response to a complaint, can now be executed by an LLM in under four minutes.
The firm delivers the document and collects the fee. This
looks like a revolution. It is not. It is a temporary advantage.
The firm is not experiencing a paradigm shift in the value of its work. It is exploiting an information asymmetry. The firm buys speed from an algorithm and resells it as expertise to a client who is still paying for the many hours of human effort. The miracle is not intelligence. The miracle is that the customer has not yet noticed.
In economics, this is called arbitrage. It is the simple
harvesting of a temporary gap between what something costs to produce and what
people can still be persuaded to pay for it.
At some point, the arbitrage window always closes.
The Red Queen Effect & Arbitrage Collapse
The closure of this window is inevitable. Corporate clients
are not sentimental creatures. They do not pay for romance, pedigree, or the
antique theatre of professional mystique. The moment a general counsel or a
marketing director realizes that a comprehensive draft takes ten minutes—four
minutes of machine and six minutes of human review—rather than two days of
traditional labour, they will refuse to pay the legacy premium. They will ask why they are still being charged yesterday’s prices. The rates will drop. The
unprecedented profit margin will vanish.
This triggers a dynamic called the Red Queen Effect. In
Lewis Carroll’s Through the Looking-Glass, the Red Queen tells Alice that in
her kingdom, “it takes all the running you can do, to keep in the same place.”
As generative AI spreads, it stops being an advantage and
becomes basic infrastructure. It ceases to be a unique profit multiplier and
degrades into a compulsory entry ticket. If every law firm, software company,
and design agency uses the same machine, then none of them stands out.
The corporation is left in a trap of its own making. Here
lies the bitter irony of the AI prophecy: the human workforce is indeed
replaced, but the promised windfall never arrives. As margins collapse, the
firm must chase volume to compensate. If drafting a complaint response now
bills for ten minutes of combined effort instead of sixteen hours of
traditional labour, the firm must find nearly a hundred new clients just to
replace the lost revenue of a single case. But it immediately hits a
mathematical wall: Total Addressable Market (TAM). A firm can increase
throughput, but it cannot manufacture demand. A doctor does not gain new
patients because her charting is automated. A law firm cannot conjure new
corporate disputes out of thin air. When the client base refuses to quintuple,
the firm is forced to cannibalize itself. It must purge its junior staff and
pour out a flood of synthetic boilerplate just to stay alive.
Eventually, the weakest firms will die. The survivors will
scavenge their orphaned clients, consolidating the market into an oligopoly—a
handful of exhausted giants. But this victory is hollow. Once the rates have
dropped, they do not recover. The client, having tasted the cheap speed of the
machine, will never again pay for the illusion of human labour. The surviving
giants are left to rule over a permanently devalued market, processing
mountains of cheap work just to maintain their sprawling, low-margin empires.
The AI has not made the firm richer in any structural sense.
It has rewritten the terms of survival. The entire industry must now sprint at
machine speed to preserve what it once earned by walking. All this innovation,
all this worship of disruption, only to arrive exactly where it
began—breathless, desperate, and still in the same place.
When the reality of this oligopoly sets in, the defenders of
the boom point to the human in the chair. If everyone has the same AI, they
argue, the differentiator will be the brilliance of the human directing it.
They tell us not to worry, because we have survived these transitions before.
The “MS Office” Fallacy
Tech optimists admit the Red Queen Effect, but they treat it
as nothing new. They say every foundational tool has ended up the same way.
Every major technological leap—the spreadsheet, the internet, the word
processor—eventually flattened into mandatory “basic equipment” without
destroying the economy. In this view, generative AI is simply the next MS Office:
a baseline utility that every business must adopt to participate in the modern
market.
But this comparison misrepresents the nature of the machine.
When Steve Jobs introduced the personal computer, he
famously described it as a “bicycle for the mind”. A bicycle is a mechanical
multiplier of human effort, but it requires the rider to balance, steer, and
pedal. Microsoft Excel operates on the same principle. It is a deterministic
tool. If an analyst builds a complex financial model, the software executes the
arithmetic instantly and flawlessly, but the analyst must construct the logic.
The human retains the complete mental map of the architecture. The tool
amplifies the worker’s competence without replacing their reasoning.
Generative AI is not a bicycle. It’s a taxi.
You do not pedal a taxi. You name a destination and the
machine navigates the route. Because it is the AI that generates the
intermediate steps, the distance between intention and execution is no longer
crossed by human thought. The human is no longer the architect; they are the
reviewer.
This creates a severe divergence in economic outcomes. When
deterministic tools like MS Office became universal, the baseline of human
capability across the economy increased. But when an autonomous model becomes
universal, the baseline of competence degrades. The worker is incentivized to
offload their reasoning to a machine. So, AI is not the next spreadsheet.
Cognitive Atrophy
Constructing a complex system forces the human brain to
build a deep, structural mental map—whether a large codebase or a nuanced
litigation strategy. To build something is to enter into an intimate struggle
with it. Every difficult choice, every dead end, every ugly compromise, and
every hidden patch is recorded in the maker’s mind. Knowledge is not a list of
conclusions; it is a lived geography. The architect understands not just what
the system does, but why every specific decision is made. They know which
pillars are ornamental and which ones are holding up the roof. A person who
builds the thing can move through its corridors in the dark.
This intimacy with complexity is the true capital of a
knowledge worker; it is what allows them to instantly diagnose a failure or
pivot a strategy.
Gen AI bypasses this intimacy. It shifts the worker’s role
from architect to reviewer. The machine instantly generates the architecture,
offering the finished palace without the years of hauling the rock. The human
is demoted from architect to bystander, rubber-stamping a design they did not
create. Their job is no longer to know, but to glance. To confirm the syntax
looks correct, and to approve.
But reviewing is a passive cognitive act. Recognition is not
understanding. To nod along with a machine-generated answer is not the same as
having fought your way through it. The reviewer sees the surface, not
the frame underneath. The mind does not build durable knowledge by skimming
surfaces; it builds it through friction, error, repetition, and failure—the
slow humiliations by which real comprehension is earned.
And the worker will not only be seduced into skipping this struggle; they will be cornered into it by the corporation’s new math. When a
tool can generate a document in four minutes, management will demand fifty
documents a day. The worker physically no longer has the time to engage in the
slow friction of learning. The business model forces them to skim.
Over months and years, this reliance will induce cognitive
atrophy—a slow withering of the mental muscle required to do the work. The
worker will retain a surface-level familiarity with the output, but they will lose
their grasp of the underlying foundation. They can still approve work, but they
can no longer own it.
In their blind rush to streamline operations, modern corporations will continue to pay premium salaries for senior titles and
impressive credentials. The enterprise still needs a human ‘expert’ to
rubber-stamp the machine’s output to satisfy compliance, clients, and
liability. But they will be paying for an illusion because all human experts
have been methodically deskilled by the very tools meant to “augment” them. Nobody
will notice the loss until the day something ruptures. What corporations are
surrendering is not just process, but memory. Not just labour, but the dense
internal architecture of thought itself.
The true cost of this atrophy will be revealed when the
system fails. When a human makes a mistake, another human can often find it at
once, because the purpose of the work is understood within the same team. But
when a model produces a flawless falsehood, that deductive process collapses.
The reviewer will be left to work backward from the result and guess how it could
have been produced. Companies will use systems and make decisions that no human
being will be able to fully explain or defend.
What will remain in the office towers and glass campuses
will be a strange kind of expertise: highly paid, fluent, efficient, and
helpless. A class of custodians trapped inside systems they can no longer read,
praying to the machines they were hired to command.
The Liability Vacuum
A synthetic lie costs more than engineering time. When this
engine is deployed in high-stakes environments—drafting corporate mergers,
generating medical compliance reports, or managing automated trading
algorithms—a silent error does not just crash a server. It triggers a lawsuit.
At this point, where the damage becomes undeniable, the
enterprise crashes into a liability vacuum.
In traditional enterprise software, accountability is clear.
If a vendor’s database corrupts a client’s financial records due to a
structural flaw, the vendor faces severe legal and financial repercussions.
Service Level Agreements (SLAs) guarantee deterministic performance, operating
on the foundational understanding that power without accountability is
unacceptable.
Generative AI vendors operate under a fundamentally
different legal shield. Because they know that false outputs are a mathematical
certainty, their contracts explicitly deny responsibility for what the system
produces. They lease the engine, but they accept absolutely zero liability for
the code, text, or ruin it generates.
Accountability is thus offloaded downward onto the human
reviewer. The enterprise relies on an employee who is flying blind through
unfamiliar, auto-generated architecture. This is not oversight. This is
ritualized scapegoating. When a fabricated legal clause detonates inside a
corporate agreement, the AI vendor remains insulated, untouched, and paid. The
blame falls on the human who clicked “approve”. The solitary worker is left
standing in the blast radius of a system specifically designed to outrun their
comprehension.
The enterprise pays an exorbitant subscription fee for
speed, actively degrades the structural expertise of its own workforce, and
absorbs one hundred percent of the catastrophic legal risk. The vendor takes
the money and keeps the immunity. The “Double Thank You” has been replaced by a
one-way street of extraction.
The Employee Angle
When management deploys generative AI across an enterprise,
they tell their staff it is a helper. They call it a collaborative assistant—a
tool meant to remove dull work and make people more productive. But workers
read the same headlines as the executives in glass chambers. They understand
the math, and they understand the timeline. They do not see a helpful partner;
they see an existential threat deployed to replace them in two to five years.
That changes how the technology is actually used on the
floor.
In tech companies, employees understand that basic AI tools
can enhance their research and output. They know these tools can make them
sharper and faster in what has suddenly become a cut-throat survival competition.
But they understand the limitations of the advanced tools—the autonomous
agents, automated issue-resolvers, and the so-called “vibe coding” miracles
sold to the board.
AI models are stateless machines: they retain nothing between requests. If you want the model to continue a conversation, you must resend the previous output and re-establish the context with every call.
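In code, the statelessness looks like this. The `call_llm` function is a hypothetical stand-in for any chat-completion style API:

```python
# A hypothetical chat wrapper illustrating statelessness: the full history
# must travel with every request, because the model remembers none of it.
def call_llm(messages):
    # Stand-in for a real chat-completion API. A real vendor would bill
    # for every token in `messages`, not just the newest message.
    return f"(reply computed from {len(messages)} accumulated messages)"

history = []

def ask(user_message):
    history.append({"role": "user", "content": user_message})
    reply = call_llm(history)            # the ENTIRE conversation goes over the wire
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Summarize the design doc.")         # sends 1 message
ask("Now draft the header file.")        # sends 3 messages
ask("Fix the compiler error below...")   # sends 5 messages, and so on
```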
These engines work on tokens. Each time you send anything to
the engine, it breaks that into tokens. A basic C++ “Hello World” program is
around 25 tokens, whereas this very paragraph is more than 100 tokens. AI companies bill by the token. In a nutshell, token usage is
directly proportional to code size. With every new feature added to the
codebase, the size of the codebase grows, and you must spend more tokens to set
the context if you rely solely on vibe coding.
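A rough illustration of how the meter grows under pure vibe coding; the figures are assumptions chosen only to show the shape of the curve:

```python
# Back-of-envelope growth of the input meter when the whole context
# is resent each turn. All figures are illustrative assumptions.
codebase_tokens = 50_000        # context that must accompany every prompt
output_per_turn = 500           # tokens the model generates each turn

billed = 0
for turn in range(1, 6):
    input_tokens = codebase_tokens + (turn - 1) * output_per_turn  # history grows
    billed += input_tokens + output_per_turn
    print(f"turn {turn}: {input_tokens:,} input tokens resent")
print(f"five turns bill {billed:,} tokens in total")
```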
AI companies know that if they charge this way, nobody is
going to buy their tools. So, they have started offering something known as “prompt
caching.” They will store your context for around five minutes on the most
expensive real estate on earth—the GPU memory. But the moment you step away for
a coffee, the cache expires, the memory is reclaimed, and you must pay tokens
to rebuild the entire context from scratch.
To solve a billing constraint, AI vendors are now pitching
the ultimate solution: the autonomous coding agent. A human engineer cannot
read, think, and type fast enough to keep a five-minute cache alive. The agent
can. It does not stop to reflect. It reads, compiles, writes, and loops at
machine speed. It maximises the cache usage before the five-minute window
shuts. But it strips the developer of real control. The machine moves too fast
for careful review. The human becomes a bystander, approving large amounts of
generated code without time to follow the logic.
That creates a serious risk.
In software, efficiency matters as much as correctness.
Every program has a cost in memory and time known as Space and Time Complexity.
An agent might not judge the quality of code in that way. It aims to satisfy
the prompt. If a blunt, brute-force method works, it might choose that method.
The result will look fine at first. The code compiles. The tests pass,
especially when the agent has written the tests itself. But the system now contains
slow, bloated code that wastes computation and scales badly. And because the
developer never had time to read it properly, the problem can enter the product
unseen.
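A miniature version of the trap: both functions below satisfy the prompt and pass the same tests, but they scale very differently.

```python
# Both functions "satisfy the prompt" and return the same answers.
def has_duplicates_bruteforce(items):
    # O(n^2): the blunt method an agent may settle for. Fine in a demo,
    # ruinous when `items` holds millions of records.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates(items):
    # O(n): same answer, a fraction of the work at scale.
    return len(set(items)) != len(items)
```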
But the damage can bleed directly into the balance sheet. An
agent stuck in a trial-and-error loop—trying to fix its own code or a compiler
error—burns tokens with every single iteration. Every product has a budget; you
cannot afford to keep burning real money just to keep reminding the model what
you are talking about. FinOps teams will have to fly in and install hard
circuit breakers to stop the financial bleeding. The moment the machine is cut
off, human engineers must step in. They must understand the architecture and its nuances to set the context economically. Engineers
can only do that if they use the AI the human way—for research and enhanced
understanding—rather than falling for the trap the vendors are selling. The
generated code can always have a missing null check, an unreleased mutex, or a
sheer hallucination. You have to manually correct these errors to minimize the
token budget.
New tools like ‘Graph RAG’ aim to circumvent this limitation by automatically mapping the codebase’s dependencies. But even this is too optimistic. A graph can tell the machine that File A is related to File B, but to actually write the code, the model still needs the raw text of both files
for the context. Even with perfect retrieval, the agent might still get trapped
in a trial-and-error loop. For example, it may write a fix that hits a compiler
error, forcing it to read additional files and try again. The token meter
bleeds. A graph cannot digitize the runtime state that a human can see in a debugger.
The question is no longer whether the technology is capable
of coding; the question is whether it is viable. We have the technology to
recycle plastic perfectly, but we largely don’t, simply because the fresh
plastic is cheaper than the recycled one. Governments have to impose carbon
taxes on fresh plastic and make it more expensive, just to make the economics
work. Technology alone cannot perform miracles; the economics must align. In
software, the product cannot go over budget, and burning tokens to brute-force
context may destroy the margin. As the old engineering saying goes: “Cheap,
Good, and Fast—Pick two.” If AI is going to make coding faster while keeping it
enterprise-grade, it cannot be cheap. The production cost has to go up.
Humans, on the other hand, offer one thing that is
absolutely cheap: the context. We can store memories from childhood to the
present day in our tiny heads. We carry everything that exists outside the
realm of ‘.md’ files. We continuously update our context in every meeting,
every water cooler conversation, and every lunch in the cafeteria. We know
which serial port is faulty and requires an extra push to connect.
But rational economics rarely stop a boardroom from
executing layoffs. Executives are already shrinking headcounts based on the
promise of the machine, long before the actual token bills arrive. Because of
this looming threat, a deep panic is creeping in. The veteran engineer is
suddenly trapped in a brutal pincer movement. From above, management is
mandating AI efficiency. From below, terrified junior engineers—who have no
legacy context or architectural knowledge—are cheerfully dumping everything
they touch into the machine just to survive the week. They are outpacing the seniors, who suddenly look like boomers.
Consequently, the internal collaboration that serves as the
backbone of any successful product is quietly being undermined. In any office,
someone willingly trains a junior colleague or shares a brilliant shortcut with the team, because knowledge passed hand-to-hand benefits the whole team and increases one’s social capital. Today, collaboration has a new, invisible
boundary. Workers will still explain an architecture to a colleague, but they will deliberately omit the tuned AI prompt they used to debug it. They share the
knowledge, minus the AI shortcut. In a cut-throat competitive environment where
nobody knows which wicket is next to fall, the age-old rule manifests: if
everyone has something, nobody has it. To keep the edge, you must have the
extra.
There is nothing mysterious about the worker’s behaviour. It
is a textbook manifestation of the principal-agent problem. In behavioural
economics, when the goals of the principal (the employer) fundamentally
conflict with those of the agent (the employee), the agent will rationally act
to protect their own interests. Management may treat ‘the workforce’ as a
unified, compliant monolith that will execute the latest directive. But public
choice theory says there is no monolith; only individuals calculating their own
survival. There are no secret societies organizing resistance in meeting rooms.
There are a million isolated, individual workers sitting at a million
individual desks, arriving at the exact same conclusion: when a machine is
deployed to commoditize your profession, you fiercely protect the monopoly on the
context that keeps you indispensable.
There are true technological optimists on the floor as well.
They genuinely believe in the promise of the machine. They freely offer their
expertise to improve the enterprise without fear or malice. But their numbers
are vanishingly small. Leadership sees the willing participation of a few
believers and assumes they have a consensus.
The Tired Wizard of Oz
The modern tech office has discovered a new kind of
stagecraft—a corporate “Wizard of Oz.”
When executives ask if the shiny enterprise tools are
useful, a rehearsed play begins. Workers actively downplay the tool’s value for critical work, stressing the hallucinations, the clumsy interfaces, and the hours wasted verifying the output. But to appease leadership’s desperate hunger
for AI integration, they offer diversions. They enthusiastically propose
automating peripheral drudgery—parsing server logs, drafting internal memos,
doing code review, sorting support tickets. They will pitch endless AI
initiatives, provided those ideas have nothing to do with the core product the
company actually builds and sells. They appear as innovative team players
embracing the future, while systematically redirecting the machine away from
their own core roles. This is a strategic choice. In the strange theology of
modern management, efficiency can become evidence against you. Admit that a
machine does any heavy lifting for you, and your hard-won expertise is
flattened into something embarrassingly simple. You will be reduced to a person
who merely types prompts into a box. You refuse to become the hidden operator
frantically pulling levers behind the curtain while the machine takes the
credit.
In the software service sector, this pressure mutates into
an outright dystopian meat-grinder. Management has already priced in the AI’s
supposed efficiency before it even works. If a vendor claims the tool makes
developers 30% faster, management immediately inflates the sprint velocity and
story points by 30%. An “AI-First” mandate takes over. It is human nature to respect expertise; once the curtain is pulled back, the respect vanishes. If you deliver a brilliantly optimized solution, leadership
assumes you just typed a good prompt, flattening your years of hard-won
expertise and stripping away the dignity of a knowledge worker. And because the
machine constantly hallucinates, employees are not actually writing code any faster—they
are spending twice as much time untangling the synthetic garbage just to meet
inflated quotas.
The absurdity peaks in a truly Kafkaesque pipeline: the AI
agent deployed as a ‘code reviewer’ will flag all the issues in the Java code
it generated as a programmer. It will complain about variable names not
following naming conventions, reject inconsistent function signatures, and flag
the tautological unit tests it wrote simply to pass its own broken logic. The
human engineer is reduced to mediating an automated argument between two
scripts. They are paying the price of the promised ROI with their own burnout.
This creates a divide in how the tool is used. Some
engineers in tech companies are using AI for research—to map out a concept
before writing the deterministic code. Some engineers, particularly in the
high-churn service sector, are using it to fix problems they don’t understand. Due
to the sheer velocity required of them, they are blindly applying AI patches to
architecture they cannot read, even using AI to write the prompts for the AI.
It is a rapid acceleration of cognitive atrophy. They are becoming entirely
dependent on the machine. It is only a matter of time before the industry
begins to reap the consequences of this synthetic scaffolding, as a wave of
bugs and failures begins to detonate in production environments.
In non-technical industries—marketing agencies, legal firms,
and digital publishing—this dynamic is taking an even quieter form: hope. They
can see cognitive atrophy eroding their sharp edge in real time. But they
understand that their executives lack the technical literacy to evaluate the
tools they just mandated. Rather than fighting the order, the employees are simply
stepping back and letting the leadership burn the budget.
Nowhere is this more visible than in media. When digital
publishers force editorial teams to use AI to churn out content, editors
sometimes leave in the machine’s glaring tells—the robotic summaries or the
absurd disclaimer: “As an AI language model, I cannot...” Any casual
reader can spot these errors at a single glance. It defies belief that such
glaring absurdities could accidentally slip past the trained eyes of
professional editors. It is the sheer, numbing exhaustion of the cognitive
atrophy already taking root. So, they watch the enterprise spend real money and
wait. They hope the hype cycle will subside when the financial reality kicks in
and the executives are forced to admit they spent a fortune on a corporate toy.
The Community Angle
The final breach of the social contract extends beyond the
corporate boardroom and the employee cubicle, spilling directly into the
physical environment of the surrounding community.
The tech industry relies on a linguistic trick, universally
referring to its infrastructure as “the cloud.” This ethereal terminology
obscures a brutal, industrial reality. The cloud is not weightless; it is a
concrete fortress packed with tens of thousands of hyper-dense silicon chips.
Running an AI model requires astronomical computational
power, and that computation translates directly into catastrophic heat. To
prevent the server racks from melting themselves, these facilities cannot rely
on traditional air conditioning. The machines must be plumbed with
direct-to-chip liquid cooling systems and massive industrial chilling towers.
This architecture demands hundreds of megawatts of continuous electricity and
drinks millions of litres of fresh water to operate.
Erecting a server farm is less like opening an office and more like inserting an industrial consumer into the local utility ecosystem. The AI engine is thrust into a
direct, zero-sum competition with the community for the most foundational
physical necessities. The machine drinks, and the community goes thirsty.
The Utility Spike
The economics of municipal utilities are unforgiving.
The local grid was built for the ordinary choreography of
human life—for evening lights, ceiling fans, pumps, refrigerators, and the
slow, predictable growth of a city. It was engineered to accommodate
households, shops, schools, and small industry. It was never designed to absorb
the sudden, relentless load of a hyperscale AI facility that arrives not as a neighbour,
but as a new species.
The AI engine does not consume electricity in any
recognizably human sense; it gulps by the gigawatt. It swallows water by the
million-litre mouthful. When such a load appears—demanding the resources of an
entire mid-sized city—it hits the local system like a shock. The grid strains.
Transmission lines must be strengthened, substations expanded, transformers
upgraded, and expensive backup generation kept online so the silicon does not
overheat.
The question is simple: who pays? The answer is simpler: the
public.
How the public is made to pay depends on the local economic
system. The methods differ, but the result is the same: tech companies avoid
bearing the full cost of their own thermodynamic footprint.
In deregulated, market-driven grids—common in much of the West—industrial electricity is usually cheaper than residential
electricity. That makes physical sense: it is cheaper to deliver bulk power to
one massive facility than to maintain the poles and wires needed to serve ten
thousand homes. Because of their enormous scale, large companies secure
long-term power arrangements at wholesale-like rates, while the general public
remains exposed to shorter retail contracts that reset every year or two. But
an AI data centre does not behave like a traditional industrial facility. It
arrives as an anomaly large enough to disrupt the mechanics of the market.
A typical steel plant consumes around 200 megawatts of power.
A single campus of AI data centres can demand 500 to 1000 megawatts (1
gigawatt), and these facilities tend to cluster because latency and fibre
connectivity matter. That concentration can push regional demand up by several
gigawatts at once. As a result, when retail contracts renew, households face
higher prices, while the largest tech firms may continue enjoying the cheaper
long-term power arrangements they secured earlier.
But in regions served by state-managed utilities—a reality
for billions in developing economies like India—the logic is different. Here,
the state often uses tiered tariffs, charging industry and commerce more in
order to keep household electricity affordable. But a data centre company does
not arrive as just another factory. It arrives wrapped in the language of
national ambition and “digital transformation.” Then come the concessions: tax
breaks, subsidized land, electricity-duty exemptions, and other state-backed
incentives that can lower the corporation’s effective power costs. So even in a
system designed to make industry subsidize the citizen, the arrival of
hyperscale data centres can quietly reverse the flow of protection.
More importantly, no industrial tariff can negotiate with
physics. In places where the state already struggles to provide clean drinking
water, the crisis is immediate and material. A data centre’s cooling system can
require millions of litres of water each day. Water cannot be printed, and
drought cannot be solved with software. Every litre drawn into an industrial
cooling loop is a litre removed from a fragile public supply.
And water does not return unchanged. After absorbing
enormous heat, it leaves the system carrying the thermal burden of the machine
and, in many cases, traces of the chemical treatments used in industrial
cooling. Where environmental oversight is weak, such wastewater may be
discharged inadequately treated into stressed rivers, lakes, or other local
water bodies, degrading water quality even further. The machine takes in what
is clean and gives back what is warmer, dirtier, and harder to use. The community
loses at both ends: first in extraction, then again in pollution. A
life-sustaining resource is quietly sacrificed to keep the silicon cool.
Meanwhile, the cost of upgrading and strengthening the power
grid can be far greater than the revenue the corporation brings in. The
state-owned utility absorbs the burden and sinks deeper into debt. To prevent
the breakdown, the state steps in—not to limit the corporate load, but to keep
the utility alive with public money. The deficit is then socialized, either
through direct bailouts or through quietly rising tariffs and taxes.
The structure differs, but the outcome is the same.
Under deregulation, the public subsidizes the private
contracts.
Under state management, the public subsidizes the private exemptions.
In one system the transfer is performed by the market. In
the other, it is performed by the state.
Either way, it is ultimately the public that pays.
The schoolteacher pays.
The mechanic pays.
The pensioner in a small apartment pays.
Their cost of living inches upward: a few extra rupees here,
a few more there. Small enough not to provoke a revolt, but large enough, in
aggregate, to underwrite the infrastructure required to keep the servers
humming.
Diffused costs. Concentrated benefits.
The Ultimate Insult
This brings the collapse of the social contract to its
ultimate, hostile conclusion. Consider the economic lifecycle of this
technology not as an innovation, but as a perfectly modern formula for
extraction.
First, a large company takes the public’s writing, code, and
online activity and uses it for free. People supply the material that trains
the system, but they are not asked for permission and they are not paid.
The company then packages the harvested intelligence and
sells it to the citizen’s employer, explicitly marketing the system as a
mechanism to automate their job and eliminate their wage. It is sold to
managers as optimization, and to shareholders as the long-awaited dream of labour
without workers. The same person whose life was mined for training data now
sits across the table from a polished interface designed to make them
redundant. First, their minds are stolen. Then the imitation is used to threaten their livelihoods.
And even this is not enough. The machine must be housed,
cooled, and fed. It drinks electricity, gulps water and leans on public
infrastructure with the graceless appetite of an empire. To keep the engine
running, the local utility grid is pushed to its limits. The citizen, already
robbed at the level of information and threatened at the level of employment,
is made to absorb the physical entropy of the machine. They are invoiced for
the destruction of their own environment through higher monthly utility bills.
They are forced to subsidize the metabolism of the system that is destroying
them.
This is not a mutual exchange of value. It is the targeted
destruction of reciprocity itself. The formula is absolute: take without
asking, sell without sharing, displace without remorse, and invoice the
dispossessed for the cost of their own dispossession.
This is not an exchange between equals. It is a one-way
transfer of value. At each stage, the citizen gives and the vendor takes. What
remains is a closed system of extraction, with the gains flowing in one
direction only.
The Death of “Zero Marginal Cost”
For the past forty years, the wealth of Silicon Valley has
been governed by a single, gravity-defying economic principle: zero marginal
cost.
In the physical world, every new product drags a heavy
anchor. A car company spends millions on research, design, and factory tooling
before a single vehicle ever exists. But the spending doesn’t stop there. It
must pay for steel, labour, and assembly time for all the cars. It must pay for
those raw materials for the first, again for the second, and again for the
third. Output rises, but a strict material cost is permanently attached to
every single unit.
Writing a word processor or a database system also costs a great deal at the start in R&D. It takes engineers, time, and money. But
when the work is done, the next copy costs almost nothing to produce and
distribute. The millionth copy costs little more than the first. After the
company has recovered its development costs, most new revenue becomes profit.
This zero-marginal-cost reality is what made previous
technological revolutions economically sustainable for the broader market. When
tools like Microsoft Office transitioned from luxury productivity multipliers
to mandatory “table stakes” for every corporation, the transition did not
bankrupt the business world. They endured because the software, for all its
power, was fundamentally cheap. The tools were static applications executed
locally on the user’s own hardware, requiring no ongoing thermodynamic effort
from the vendor. The burden of electricity and hardware maintenance was
entirely outsourced to the customer.
Even when the industry shifted from selling local disks to
leasing cloud-based access—the era of Software as a Service (SaaS)—the
underlying math held firm. Companies realized they could charge monthly
subscriptions, treating digital access exactly like a physical utility.
Consider Netflix. Filming a series, licensing movies from studios, and building
the initial server architecture requires massive upfront capital. But once the
infrastructure is set, the cost of delivering a stream to the millionth subscriber is trivial. The vendor collects recurring monthly revenue, while the marginal cost of serving each new customer remains effectively zero.
The software was a utility, priced like a utility, and it
generated staggering margins simply because it cost nothing to duplicate. That
was the secret inheritance of Silicon Valley’s golden age: an industry built on
products that, once made, no longer had to be made again.
The AI Compute Tax
Generative AI fundamentally violates this zero-marginal-cost
paradigm. What it has done is expose the old fantasy that software is
weightless, frictionless, and free from the burdens of the material world. A
Large Language Model is not a static piece of code resting quietly on a local
hard drive. It is a heavy industrial machine that must work each time you use
it.
When a user types a query into an LLM and presses enter, the
response is not simply retrieved from a database. The model must actively compute
the mathematical probabilities for every single word it generates, in
real-time. This operational process—known in the industry as “inference”—requires
an astonishing amount of computational power. It is not the retrieval of a thought; it is manufacturing on demand.
Every single prompt forces rows of high-performance GPUs in
a distant data centre to physically spin up, drawing massive electricity and
generating intense heat. The old belief in software was that it could scale at
almost no extra cost. But LLMs do not work that way. Every paragraph they
produce, every line of code they suggest, and every email they summarize has a
real physical cost.
Silicon Valley has invented a “high marginal cost” software
product. Unlike a word processor, where a million active users cost the vendor
nothing, an LLM vendor must pay for the electricity, the cooling, and the
hardware degradation with every single query. The more a customer uses the
product, the more it physically costs the vendor to keep the lights on.
The Unreachable Break-Even
The financial reality of this compute tax is currently
hidden from the market by a massive, unsustainable subsidy. What is being sold
to the market as an affordable innovation is a beautifully decorated lie.
When an enterprise purchases a generative AI license today,
they typically pay a flat subscription fee of roughly $20 to $30 per user, per
month. To a corporate procurement officer, this feels like a standard,
predictable software-as-a-service (SaaS) contract. It resembles the old
software model, where one more user or one more action costs almost nothing.
But because the vendor is selling a live thermodynamic process rather than a
static digital copy, this flat bill breaks down immediately under actual use.
The second illusion is the API token meter. It is sold to
engineering teams at rates that look tiny: a few dollars for a million tokens.
To someone new to it, a million tokens sounds enormous. It is not. The machine
remembers nothing, so the exact same massive blocks of codebase context must be
sent again with each query. To mask this rapid burn, vendors offer “prompt
caching,” temporarily holding the context in memory at a steep discount. But
this cache expires in five minutes or so. To exploit the discount before the
window shuts, vendors push autonomous coding agents. An agent does not pause to
reason; it loops, generates code, and recompiles at machine speed so that the
prompt cache doesn’t hit its five-minute expiration.
But the API does not have a single meter; it has two. There
is an input meter for what the machine reads, and an output meter for what the
machine generates. Output tokens are far more expensive, because each one must be computed sequentially on the GPU. And prompt caching applies only to input tokens, never to output. The cache can slow the input meter, but it cannot touch the output meter. If the generated code contains a mistake, that output must be resent as fresh input, along with the compiler errors, to update the context. The enterprise must pay the premium output rate for every intermediate mistake, every failed compilation, and every hallucinated write.
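A sketch of what one failed iteration can cost, using illustrative rates rather than any vendor's actual price list:

```python
# One failed agent iteration, priced with made-up rates (real vendor
# price lists differ; the input/output asymmetry is the point).
RATE_IN, RATE_IN_CACHED, RATE_OUT = 3.00, 0.30, 15.00  # $ per million tokens

context_tokens = 100_000   # codebase context, served from the prompt cache
bad_output     = 4_000     # generated code containing the bug (output rate)
error_reread   = 4_500     # bad code + compiler log, resent as UNCACHED input
fix_output     = 4_000     # the corrected code (output rate again)

cost = (context_tokens * RATE_IN_CACHED
        + (bad_output + fix_output) * RATE_OUT
        + error_reread * RATE_IN) / 1_000_000
print(f"one retry loop: ${cost:.3f}")   # the output meter dominates the bill
```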
The sheer velocity of the agent’s automated looping can spin
the expensive output meter out of control. In practice, an agent actively
debugging a complex issue will easily burn the daily quota before lunch,
leaving its customers paying a premium for the machine’s high-speed
incompetence.
Under actual enterprise use, both pricing models—flat rate
and token meter—mathematically break. If an employee leans on the model
continuously throughout the workday—generating code, summarizing massive
document troves, or drafting endless emails—the raw cost of the GPU inference,
electricity, and cooling exceeds the $30 subscription. The customer believes
they are using a product. The vendor, in effect, is underwriting a loss.
To mask this structural flaw and aggressively capture market
share, tech giants are spending billions of dollars in capital to cover the
difference. In effect, they are selling an expensive utility as if it were
cheap software.
But thermodynamics is less sentimental than venture capital.
Vendors cannot continue to swallow the compute tax. Sooner or later, the hidden
bill arrives.
The cheerful subscription model—that little fiction of
affordability—will have to give way to prices that admit what this technology
actually is: expensive, resource-hungry, and structurally incapable of being
offered at mass scale for pocket change. If vendors want to make it sustainable
and make some profit, the monthly price should rise sharply—from $30 to
something painfully higher.
The Price Standoff
Early users of AI may have had an edge. That edge will not
last. As vendors raise prices to cover their real costs, the gap will close.
Once every law firm, agency, and software company has access to the same tool,
it stops being a special advantage. It becomes a basic requirement.
At that point, if AI vendors argue for higher prices, it
might not work. A chief financial officer may approve a large expense for a
tool that gives the company a clear lead over its rivals. But they will not pay
a high monthly fee for something every competitor has and which would significantly raise their own price points.
This is known as commoditization—the “Dollar Shave Club”
effect. Just as consumers eventually realized a basic razor was “good enough”
and stopped paying a premium for five-blade vibrating handles, businesses will realize that a “good enough” AI is all they need to compete on price.
That leaves the vendor in a hard position. At a low price,
it may not make enough money to survive. At a high price, companies may refuse
to buy. The subscription model then stops working.
The Pricing & Liability Deadlock
In 2022, Cory Doctorow coined the term “enshittification” to
describe a predictable tech business cycle: a company heavily subsidizes a
magical service to attract and lock in users, then deliberately degrades the
quality of that service to extract maximum profit. If the search engine gives
its best results in the first attempt on the top, then the ad revenue will
affect. If they degrade the quality a bit then people need to scroll and ad
impressions will increase. If user thinks their query is not good enough and
tweak it and search again, then there will be even more ad impressions. Search
vendor must make sure the degrade should not be too blatant otherwise users
will go to another search engine. Shopping websites also insert sponsored products
after every couple of items in the product list.
Generative AI is currently exiting the subsidized phase.
Because flat-fee subscriptions cannot cover the massive thermodynamic cost of
heavy enterprise use, vendors are pivoting to usage-based API pricing (the
token meter). Once AI is priced by the token, the enshittification cycle
begins, driven by the physical constraints of GPU memory.
In economics, opportunity cost is what you give up when you
choose one thing instead of another. If you spend Rs. 500/- on pizza, the
opportunity cost is the movie ticket you could have bought with that Rs. 500/-.
For an AI vendor, GPU memory (VRAM) is the most expensive real estate on earth.
When an enterprise uploads a massive codebase, the model stores it in the GPU’s
memory (the KV Cache) to process the request. Offering prompt caching means holding that massive context in premium VRAM that the vendor could otherwise use to serve dozens of other paying requests. The opportunity cost is enormous.
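A rough sense of the stakes, using an illustrative model shape rather than any vendor's actual architecture:

```python
# Illustrative KV-cache arithmetic for a large (hypothetical) model:
# 2 tensors (K and V) x layers x kv_heads x head_dim x 2 bytes (fp16) per token.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_token = 2 * layers * kv_heads * head_dim * 2

context_tokens = 100_000                 # one enterprise codebase upload
gib = bytes_per_token * context_tokens / 2**30
print(f"~{gib:.0f} GiB of premium VRAM parked for a single idle session")
```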
To free up this VRAM and protect their margins, vendors are
incentivized to force the model to finish quickly. The models are optimized for
speed over depth, resulting in the “lazy AI” phenomenon—outputting a few lines
of code and a placeholder like “// insert previous logic here.”
When a human developer receives a lazy response with a
placeholder, they recognize the shortcut. They can manually insert that
previous logic there or type a follow-up prompt instructing the AI to “stop
using placeholders and generate the full file.”
Autonomous coding agents do not possess this intuition.
Agents rely on automated file-editing tools to execute their work. If the API
returns a script containing ‘// ... insert previous logic here ...’, the
agent blindly injects that exact text string into the actual source code,
overwriting the real logic.
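A minimal sketch of the failure, with hypothetical file contents:

```python
# Hypothetical contents, before and after the agent applies a lazy edit.
original = """def settle(account, amount):
    validate(account)
    ledger.append((account, amount))
    notify(account)
"""

lazy_edit = """def settle(account, amount):
    # ... insert previous logic here ...
    notify(account)
"""

# A human reads the placeholder and restores the logic by hand.
# An agent's file-writer simply saves what it was given:
with open("settle.py", "w") as f:
    f.write(lazy_edit)    # validate() and the ledger write are now gone
```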
This is not a temporary software bug that vendors can easily
patch; it is a structural limitation of the technology. Even if a vendor wanted
the machine to output a perfect, complete codebase, it would hit a hard physical and economic wall: the output token ceiling. For AI models, generating tokens is more expensive than reading them, so models carry hard-coded limits on how much they can generate in a single breath. They are forced to truncate.
To survive this limit, creators of autonomous agents
invented workarounds. Instead of asking the model to rewrite a whole file, the
agent forces the machine to use a strict “Search and Replace”
format—outputting only the exact lines to find, and the new lines to inject.
But this introduces a new, equally fragile point of failure.
To replace the code, the LLM must perfectly repeat the “Search” block
character-for-character. Because LLMs are probabilistic rather than exact
databases, they sometimes hallucinate a slight variation—an extra space, a
changed indentation, or a tweaked variable name. The agent’s strict editing
script scans the local file, finds zero exact matches, and the automated
workflow crashes.
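A minimal sketch of such an edit step shows how thin the margin for error is. The exact block format varies by agent, so the one below is an assumption:

```python
# Minimal sketch of a strict "Search and Replace" edit step. The exact
# block format varies by agent; this one is assumed for illustration.

def apply_edit(source: str, search_block: str, replace_block: str) -> str:
    if search_block not in source:
        # Zero exact matches: the automated workflow has nowhere to edit.
        raise RuntimeError("search block not found; edit aborted")
    return source.replace(search_block, replace_block, 1)

file_text = "total = price * qty\nprint(total)\n"

# The model hallucinates a double space that does not exist in the file.
bad_search = "total = price *  qty\n"
try:
    apply_edit(file_text, bad_search, "total = price * qty * tax\n")
except RuntimeError as err:
    print("crash:", err)   # one invisible character kills the pipeline
```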
Whether the model gets lazy with a placeholder, or
hallucinates a broken Search/Replace block, it triggers a highly profitable
loop for the vendor.
1. The Break: The agent saves the broken code to the disk and attempts to compile.
2. The Crash: The compiler hits the missing logic and throws the error.
3. The Diagnosis Toll: The agent is programmed to fix errors automatically. It grabs the compiler’s error log and bundles it with the codebase the engine generated in the previous turn. Output tokens are never part of the prompt cache, so the engine charges full input-token rates to read back its own mistake.
4. The Repair Toll: The AI generates the new, corrected code. It charges again for output tokens.
Under the token meter, the enterprise pays for every step of
this loop. You pay the output rate for the initial lazy mistake. You pay the input
rate to upload the error log and context so the machine can diagnose its own
failure. And you pay a third time for the final repair. If the error is logical
in nature, then the unit test will fail, and the meter will loop even more.
The vendor wins twice: they saved expensive compute by
letting the model be lazy, and they generated triple the revenue because the
agent was forced into an iterative loop. Error ceases to be a defect; it
becomes recurring revenue. Accuracy actively works against the vendor’s profit.
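Some illustrative arithmetic makes the incentive concrete. The prices and token counts below are assumptions, not any vendor's published rates:

```python
# Illustrative token-meter arithmetic for one break-diagnose-repair loop.
# Prices and token counts are assumptions, chosen only for the example.

in_price = 3.00 / 1_000_000    # $ per input token (assumed)
out_price = 15.00 / 1_000_000  # $ per output token (assumed)

lazy_output   = 2_000 * out_price    # step 1: the truncated, lazy code
diagnosis_in  = 120_000 * in_price   # step 3: codebase + error log re-read
repair_out    = 6_000 * out_price    # step 4: the corrected code

clean_run = 8_000 * out_price        # what one complete answer would cost
loop_run  = lazy_output + diagnosis_in + repair_out

print(f"clean: ${clean_run:.2f}  loop: ${loop_run:.2f}")
# The broken path bills roughly 4x the clean one, and every failing
# unit test multiplies it again.
```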
If this looping extraction is a slow bleed for text-based
software engineering, it is a financial slaughterhouse for multimedia agents.
Generating text is computationally light compared to a high-definition
video. Calculating temporal consistency, lighting, and physics across millions
of pixels requires enormous VRAM. In fact, the compute cost is so severe that a
perfect, unthrottled render (free of the lazy-AI phenomenon) often costs the
vendor more than the price of the query. To make the prompt economically viable,
the vendor is mathematically forced to speed up the inference. The result is
inevitable: the physics breaks, and an actor’s hand melts into a coffee cup.
Professional Hollywood studios and major ad agencies survive
this reality because they use human VFX artists. When the model cuts a corner,
the human loads the flawed clip into traditional software to manually edit the video.
AI vendors are selling enterprises the opposite dream: fully
autonomous marketing agents that generate and finalize campaigns on the fly.
When a coding agent encounters a mistake, it can at least
attempt to use a “Search/Replace” text block to patch a single line. A video
agent has no such luxury. Because the video is generated as one continuous
whole inside the model, you can’t easily fix just one damaged frame without
causing noticeable glitches. If a throttled model hallucinates a physics
anomaly at second fourteen, the agent cannot surgically edit it. It must scrap
the file and force the model to re-render the entire sequence.
Worse, video possesses no objective compiler. An autonomous
marketing AI agent must rely on a secondary, Vision AI agent to act as its
referee. When the Vision AI spots the melting hand, the only available tool is
the brute force re-render. By attempting to cut the human artist out of the
loop to save on salaries, the enterprise walks blindly into the token meter
trap. They are left paying video-generation compute costs for two hallucinating
machines arguing with each other over subjective aesthetics, caught in an
infinitely expensive rendering loop.
Finally, this billing structure weaponizes liability. Traditional
programming languages like C++ or C# are mathematical in nature: if the syntax
is wrong, the compiler rejects it. Natural language possesses no such strict
parser, and AI prompts are crafted in natural language.
If a company is billed thousands of dollars for an agent
stuck in a looping cycle of failed queries, the financial dispute becomes
unresolvable. The vendor will blame a “prompt failure,” claiming the agent’s
instructions were poorly formatted. The enterprise will blame an “LLM failure,”
pointing to the model’s lazy inference. Because natural language is subjective,
there is no objective compiler to settle the dispute.
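The asymmetry is easy to demonstrate, with Python's own compiler standing in as the referee that natural language lacks:

```python
# A compiler's verdict is binary and objective; a prompt has no referee.

broken_code = "total = price * "          # syntactically invalid on purpose
try:
    compile(broken_code, "<dispute>", "exec")
except SyntaxError as err:
    print("objective failure:", err.msg)  # the machine settles the argument

prompt = "Make the refund logic more robust, but keep it simple."
# No parser can rule on whether this instruction was "well-formed".
# When the agent loops, the vendor blames the prompt, the client blames
# the model, and nothing exists to adjudicate between them.
```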
Pay-By-Outcome
If the pay-per-query model descends into a hostile legal standoff,
the vendor might attempt the inverse: Pay-By-Outcome. Instead of charging for
the machine’s thermodynamic effort, the vendor attempts to charge only when the
task is successfully completed—a “closed ticket” model.
In a deterministic software environment, outcomes are binary
and objective. A server is either restored or it remains down. A database query
either returns the records or it fails. But the outputs of generative
AI—marketing copy, legal drafts, strategic summaries—are inherently subjective.
This subjectivity creates a vulnerability for the AI vendor.
If the enterprise client is only billed when they formally accept the final
output, they are instantly incentivized to infinitely reopen the task. Because
they are insulated from the compute tax, they will treat every revision as
free. They will demand minor stylistic adjustments, nuanced tone shifts, or
additional edge-case coverage simply because it costs them nothing to ask. “Make
the tone more professional.” “Adjust this clause to reflect a new hypothetical
risk.”
The client endlessly moves the goalposts on a subjective
task. But the vendor cannot move the physics of the machine. Every single requested revision forces the
vendor to spin up the GPUs, execute substantial
calculations, and burn energy. The vendor bleeds compute money on every
single iteration, effectively subsidizing the client’s indecision until the
profit margin on that single “closed ticket” is completely destroyed.
The Iteration Loophole
To plug this financial leak, vendors will attempt a
compromise: capping the revisions. They will offer a strict quota—perhaps three
free iterations per task—before the meter starts running again. This does not
solve the underlying economic flaw; it merely shifts the burden into the
Iteration Loophole.
Enterprise procurement teams are ruthlessly efficient at
optimizing vendor contracts. If a client knows they are limited to a handful of
free revisions, they will fundamentally change how they interact with the
machine. Instead of requesting simple, iterative adjustments, the user will
cram an overwhelming density of complex criteria into a single, massive prompt.
They will demand the model simultaneously adjust the tone, cross-reference new
frameworks, and rewrite the logic to account for a dozen edge cases—all within that
one “free” turn.
Because the computational cost of a Large Language Model’s attention
scales quadratically with the length of the prompt’s context
window, this density is fatal. The vendor’s hardware must execute far heavier
probabilistic calculations to synthesize the bloated input. The vendor is still
forced to burn astronomical amounts of physical energy, bleeding their compute
capital dry to fulfil a contract they cannot renegotiate.
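The quadratic penalty is worth seeing in numbers. A toy sketch, with assumed prompt sizes:

```python
# Why cramming everything into one giant prompt is expensive: self-attention
# compares every token with every other token, so work grows with n^2.

def attention_pair_count(n_tokens: int) -> int:
    return n_tokens * n_tokens

small = attention_pair_count(2_000)    # one simple revision request (assumed)
dense = attention_pair_count(40_000)   # one "free" mega-prompt (assumed)

print(f"density penalty: {dense / small:.0f}x the pairwise work")
# 400x: a 20x longer prompt costs roughly 400x the attention compute,
# all burned inside the same contractually "free" revision.
```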
The Inevitable End State
The end is easy to see. The dream of the magic flat fee
cannot last. The machine is too costly. No vendor can offer endless work for
one fixed price and live. In the end, the meter must run. The user must pay for
use. But once the meter runs, the aim changes. The machine no longer serves
truth. It serves revenue. A right answer ends the sale. A wrong answer keeps it
going. Accuracy becomes a loss. Little pitfalls become income. So generative
AI in the enterprise will not decay because we cannot make it better. It will
decay because the system pays for decay. In that world, enshittification is not
a flaw. It is the rule.
The Ticking Clock
This pricing deadlock has not gone unnoticed by the broader
financial ecosystem. For the first two years of the generative AI boom, Wall
Street operated almost entirely on faith. Tech giants were handed unconstrained
capital to build out the underlying infrastructure, under the assumption that
an unprecedented technological leap would inevitably forge its own lucrative
business model.
That period of uncritical grace ended in early 2026. Deep
market anxiety has replaced theoretical optimism, driven by the brutal
asymmetry between capital expenditure and realized revenue.
The clearest signal of this rift came from Microsoft. In its
second-quarter earnings, the company reported a massive revenue beat of over
$81 billion. Historically, this would have sent the stock soaring. Instead, the
market focused entirely on a single, terrifying metric: Microsoft’s capital
expenditure had surged 66% to a record $37.5 billion in a single quarter,
almost entirely driven by investments in AI data centres and GPU
infrastructure. The market realized the company was paying an astronomical “AI
tax” to maintain its position, with no clear timeline for a return on
investment. Microsoft’s stock violently crashed 10% in a single session, wiping
out $357 billion in market value despite a highly profitable quarter. Investors
are no longer willing to accept a “spend now, monetize later” narrative without
seeing the math.
The math at the absolute bleeding edge of the industry is
even more alarming. OpenAI presents a financial paradox that defies traditional
business logic. By early 2026, the company achieved an astonishing $20 billion
in annualized revenue. Yet the cost of running the engine is so astronomically
high that the more revenue OpenAI generates, the more money it loses.
After posting a massive $13.5 billion net loss in the first half of 2025 alone, internal projections indicate OpenAI will burn another $14 billion in 2026 just to keep the servers running and the models training. To survive this staggering cash incinerator, the company was forced to secure a historic $110 billion private funding round—the largest in human history—from Amazon, Nvidia, and SoftBank.
This is not a sustainable business. When a company
generating $20 billion in revenue still requires a $110 billion bailout just to
keep the lights on, the market is forced to confront a terrifying reality: the
core product might be fundamentally, mathematically unprofitable. And when a
vendor is draining billions in infrastructure debt, the inevitable result is
the deliberate degradation of the model and the ruthless extraction of the
customer through the token meter.
The Debt Illusion
To mask this terrifying math, publicly listed tech companies
are engaging in a dangerous trick: The Debt Illusion.
Historically, when a corporation embarks on a generational infrastructure
build-out, it funds the expansion through its free cash flow. If the
expenditure is massive, then the responsible board will pause or cut
shareholder payouts—like stock buybacks and quarterly dividends—to cover the
cost. But the current AI frenzy does not allow for such prudence.
Faced with astronomical bills for data centres and cooling
infrastructure, executives find themselves trapped. They desperately need the
cash, but they are terrified to cut their quarterly dividends. To Wall Street,
a slashed dividend is a blood-in-the-water signal—an admission that the core,
legacy business is stalling. In the current hyper-anxious market, cutting the
dividend to pay for AI would trigger an immediate, violent stock sell-off.
To solve this, corporate boards have chosen a radical middle
path. To keep shareholders pacified while buying tens of billions of dollars
worth of GPUs, they are turning to the bond market. They are maintaining—and in
some cases, initiating—lucrative dividend payouts to project an aura of
invincible financial health. At the same time, they are issuing massive amounts
of corporate debt to actually pay for the AI infrastructure.
The balance sheet creates a mirage. On paper, the company
looks endlessly profitable, returning cash to investors. In reality, they are
borrowing billions of dollars to buy highly specialized, rapidly depreciating
hardware, purely out of fear that the stock market will punish them if they
stop the music. When tech companies built out the cloud and mobile ecosystems
in the 2010s, interest rates were effectively zero. Debt was free. Today,
borrowing tens of billions of dollars on the bond market carries massive,
compounding interest costs. They are mortgaging their future balance sheets to
fund a thermodynamic engine that so far does not possess a sustainable business
model.
Circular Revenue
To further artificially sustain this demand, the industry
has turned to pure financial alchemy: Circular Revenue.
The undisputed kingmaker of the AI boom, Nvidia, currently
sits on unprecedented piles of cash. But rather than relying solely on organic
enterprise demand to sell its hyper-expensive GPUs, the company has
aggressively deployed its capital directly into an extensive portfolio of AI
startups and boutique cloud providers.
The mechanics of these deals are unexpectedly circular.
Nvidia injects money into a young AI firm. That firm then immediately turns
around and uses that money to buy thousands of Nvidia GPUs. On Nvidia’s
quarterly earnings report, this transaction is recorded as high-margin revenue.
This instantly signals to Wall Street that market demand for silicon is
infinite, further fuelling Nvidia’s multi-trillion-dollar valuation.
In reality, the hardware vendor is effectively financing its
own customers. It is a closed loop of vendor financing, designed to
artificially prop up the order book and prevent the demand bubble from popping.
This arrangement assumes that these AI startups will
eventually find actual, paying end-users to justify the hardware. But corporate
boards do not want to keep paying for expensive AI compute, and everyday
consumers are refusing to pay premium subscriptions for probabilistic tools. If
these startups fail to generate real software profits, they will collapse under
the weight of their own operating costs. When they default, the circular revenue
machine breaks.
The market will suddenly discover that a massive percentage
of the “historic demand” for AI hardware was simply Silicon Valley passing its
own money back and forth in a circle.
But the fallout will not end with dried-up order books. When
these startups inevitably liquidate, their physical assets will not evaporate.
Thousands of lightly used, flagship GPUs will flood the secondary market at
fire-sale prices. The hardware giants will not just lose their future buyers;
they will suddenly find themselves in a brutal price war against their own
ghosts.
The “Moore’s Law” Fallacy & The Thermodynamic Wall
The entire financial house of cards—the subsidized
subscriptions, the debt illusion, and the circular revenue—is balanced on a
single, desperate assumption: that the hardware will eventually rescue the
software.
Silicon Valley operates on the residual faith of Moore’s
Law. Coined in 1965, this is the foundational observation that the number of
transistors packed onto a microchip doubles roughly every two years. For half a
century, the tech industry rode this uninterrupted, magical trajectory. Because
engineers could continually shrink these microscopic switches, computing power
doubled while production costs halved. Chips became exponentially faster,
cheaper, and more energy-efficient. Today, tech optimists assume this
historical curve will naturally extend to generative AI, believing that
continuous R&D will inevitably drive the cost of running a Large Language
Model down to near-zero.
This optimism ignores fundamental physics. Moore’s Law is
not a law of nature; it was an economic observation that is now colliding with
a thermodynamic wall.
We are no longer shrinking bulky silicon components; chip
manufacturing has reached the atomic scale. When transistors are reduced to the
width of a few atoms, they fall victim to “quantum tunnelling”—a phenomenon
where physical barriers become so thin that electrons simply bleed right
through solid matter, generating uncontrollable heat. We have extracted the
final efficiencies from the silicon substrate. The era of free, exponential
performance gains simply by shrinking the hardware is over.
The catastrophic bottleneck for AI is no longer just
processing speed; it is the physical act of moving the data. This is known in
computer science as the “Memory Wall.”
A Large Language Model is essentially an enormous matrix of weights and
parameters. To generate a single word of text or a single line of C++, the GPU
cannot just perform a calculation; it must physically fetch the model’s full
set of weights—terabytes of data—from the memory chips and push them into the
GPU’s processing cores.
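A rough sketch shows why this data movement, not arithmetic, sets the speed ceiling. The figures are assumptions for illustration:

```python
# Sketch of the Memory Wall: token generation is bound by how fast weights
# can be streamed from memory, not by arithmetic speed. Numbers are assumed.

params = 70e9                 # model parameters (assumed)
bytes_per_param = 2           # FP16 storage
bandwidth = 3.35e12           # bytes/sec of HBM on a flagship GPU (assumed)

weight_bytes = params * bytes_per_param   # fetched for every single token
tokens_per_sec = bandwidth / weight_bytes

print(f"memory-bound ceiling: {tokens_per_sec:.0f} tokens/sec per GPU")
# ~24 tokens/sec: adding more arithmetic units cannot raise this ceiling;
# only moving the data faster can.
```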
The PCIe Thermodynamic Cliff
To understand how rapidly this physical limit is
approaching, one must look at the central nervous system of the modern server:
the Peripheral Component Interconnect Express, or PCIe.
If the GPU is the calculating brain of AI, the PCIe bus is
the physical highway that connects that brain to the memory banks and the
network. To process the data in the GPU cores, terabytes of data must first be
moved from memory to the GPU. If the PCIe highway is too slow, the expensive
GPUs sit idle, waiting for data to arrive before processing can start. Every
idle second on such expensive silicon is a steep opportunity cost. To prevent
this bottleneck, the hardware industry aims to double the speed of this PCIe
highway every few years.
Currently, most PCIe devices are PCIe Generation 4 or
Generation 5. With the transition from PCIe 5.0 to 6.0, engineers executed a
brilliant, one-time structural trick to achieve this doubling. Stripped of the
technical complexity: the wire previously used two voltage levels to represent
0 and 1. In PCIe 6.0, they increased the levels to four, representing 00, 01,
10, and 11. One electrical pulse now carries two bits at a time, which doubles
the overall speed. The technique is called PAM4.
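A toy illustration of the trick, and of the hidden cost it introduces:

```python
# Toy illustration of the NRZ-to-PAM4 trick: four voltage levels carry
# two bits per electrical pulse instead of one. Levels are illustrative.

NRZ  = {0.0: "0", 1.0: "1"}                            # two levels, one bit
PAM4 = {0.0: "00", 0.33: "01", 0.66: "10", 1.0: "11"}  # four levels, two bits

pulses = [0.0, 0.66, 1.0, 0.33]
bits = "".join(PAM4[v] for v in pulses)
print(bits)   # 00101101 -> 8 bits from 4 pulses: double the data rate
# The price: the gap between adjacent levels shrank from 1.0 to ~0.33,
# so far less electrical noise is needed to corrupt a symbol.
```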
The second law of thermodynamics states that entropy always
increases. There is a natural state of the universe, and to deviate from it
requires energy. Boiling water will naturally cool down to room temperature. To
keep it warmer or cooler than room temperature, you must constantly spend
energy, by burning a stove or by running a refrigerator. The further you
push a system from its natural resting state, the more energy you must inject
into the equation.
When we move from two voltage levels to four, the gap
between adjacent levels shrinks. A tiny fluctuation in voltage or a bit of
electrical noise can easily turn a 00 into a 01 at the other end. If the
system relied on standard detect-and-retransmit schemes to catch these errors,
it would have to read the entire transmission, check it, and then request a
resend; that latency would instantly erase the doubled speed the engineers
were trying to achieve. So the hardware industry deployed a technique called
Forward Error Correction (FEC), which relies on highly specialized, miniature
processors built directly into the silicon to detect and correct errors on the
fly, during the transmission itself. But this circuitry draws a large amount
of electrical current, and the computation produces additional heat that must
be cooled away.
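The principle fits in a few lines. The sketch below uses the classic Hamming(7,4) code purely as a teaching stand-in; real PCIe 6.0 FEC is a far heavier code, but the idea of correcting in flight, without retransmission, is the same:

```python
# Minimal Hamming(7,4) sketch: the receiver corrects a single flipped bit
# on the fly, with no retransmission, which is the core idea behind FEC.

def encode(d):                       # d = [d1, d2, d3, d4] data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p4 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]   # wire positions 1..7

def decode(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2,3,6,7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4,5,6,7
    err = s1 + 2 * s2 + 4 * s4       # syndrome = 1-based error position
    if err:
        c[err - 1] ^= 1              # silently fix the corrupted bit
    return [c[2], c[4], c[5], c[6]]

code = encode([1, 0, 1, 1])
code[4] ^= 1                         # noise flips one bit in transit
print(decode(code))                  # [1, 0, 1, 1] recovered, no resend
# The cost: extra bits on the wire and extra logic burning power per symbol.
```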
Furthermore, a moving electron produces a magnetic field. A
motherboard does not have just one copper trace; it has hundreds of them packed
less than a millimetre apart. At the high speeds of PCIe 6.0, these traces act
like microscopic radio antennas. If you pump a massive, high-voltage signal
into one trace, the electromagnetic field it generates can leak into the
adjacent traces. This is called “crosstalk.” To avoid this, engineers have no
choice but to operate at lower voltages. However, lower signal strength creates
a new problem where the signal can become too weak and fail to reach its
destination. To fix this, engineers are forced to deploy power-hungry “Retimer”
chips at regular intervals simply to catch the dying signal, clean it, and
re-amplify it. Doing all this extra work requires immense energy and produces
massive heat.
To cool this down, our typical air-conditioned server rooms
and exhaust fans attached to devices are no longer sufficient. Air simply
cannot carry away this level of concentrated thermal density. The facility is
forced to plumb direct-to-chip liquid cooling systems—circulating industrial
fluids millimetres away from the processors so that they don’t melt.
The cliff is getting steeper. With the arrival of the PCIe
7.0 standard in 2025, the industry must double the speed again. They did not
attempt a PAM8 scheme, because the crosstalk and noise would become far too
high. Instead, they have decided to brute-force the physics, driving the raw
signalling frequency to an extreme: simply put, running the same PAM4 at a
higher clock. Pushing an electrical signal at this speed (32 GHz) across copper
is an engineering nightmare. The cost of retimers and liquid cooling will
skyrocket, because the extreme heat is simply unavoidable.
There is also talk of ditching copper for optical fibre, the
same technology that powers high-speed internet. Instead of pushing electrons
through metal, the system converts data into pulses of light, firing them down
microscopic glass cables where they bounce off the internal walls like a hall
of mirrors. Optical excels at long range, but shrinking it to board-level
distances brings challenges of its own. Furthermore, GPUs need electrical
signals, so an optical PCIe 7.0 link must convert light to electricity and back
for every exchange. These electro-optical converters change the heat profile of
the motherboard: with copper, the entire board ran hot; now the heat is
concentrated directly at the converters. Running PCIe 7.0 on glass can yield a
better thermal architecture than PCIe 7.0 on copper, but the heat output is
still higher than PCIe 6.0. And these physical benefits must be paid for with
money: copper infrastructure is cheaper than glass. If the infrastructure goes
optical, the AI vendor has to charge a higher fee.
Cornered by these physical and thermal limitations, many
believers argue that focusing on the current hardware limits is short-sighted;
they assume that whenever a physical wall is reached, a brilliant new
technological paradigm will simply bypass it. But this optimism frequently
conflates logical breakthroughs with physical realities, or confuses science
fiction with the immediate financial present.
When looking for immediate salvation, new communication
standards, most notably Compute Express Link (CXL), are marketed as a
breakthrough. But CXL is not magic. It rides directly on top of the exact same
PCIe physical layer, so it offers no escape.
To overcome the limits of motherboards, there is talk of
3D chip stacking. The theory is that if we cannot spread chips out because of
the copper, we can stack them vertically, like building tiny skyscrapers on the
motherboard. But the thermal density only worsens. Stacking processing and
memory vertically traps the heat deep inside the block, making the chips even
harder to cool without extreme, expensive liquid immersion. The infrastructure
cost will shoot up.
Finally, Quantum Computing. While it is a fascinating
field of foundational research, it is decades away from running commercial AI
workloads, if it ever does at all. It offers absolutely no salvation for
generative AI data centres being constructed today. The financial debt clock is
ticking now. Wall Street, corporate boards, and the global supply chain cannot
pay this year’s massive infrastructure bills with the theoretical breakthroughs
of the distant future. The industry is trapped in the present, forced to fight
the uncompromising physics of the hardware they actually have.
The GPU Cliff
Tech optimists might concede that the PCIe highway is
bottlenecked, but they assume the destination itself—the GPU—will continue to
scale in raw calculating power. But the calculating brain of the AI is slamming
into a physical cliff of its own.
For the past decade, the simplest way to make a GPU faster
was to make the physical silicon chip larger so it could hold more processing
cores. That era is over. The multi-million-dollar lithography machines that
manufacture these chips are bound by a strict optical constraint: the reticle
limit. This is the maximum physical size of the machine’s lens window, which
caps out at roughly 800 square millimetres. Because of this hard boundary, it
is physically impossible to print a single, monolithic chip larger than this
window. The industry’s flagship GPUs have already hit this wall.
Because they cannot make the single chip any bigger,
engineers are forced to stitch multiple smaller pieces of silicon—called “chiplets”—together
on a shared microscopic baseplate. To make two chiplets act as a single brain,
they must communicate with each other across microscopic copper bridges. The
industry essentially took the time-tested technology of copper interconnects
and shrank it down to fit inside the chip. By doing so, they took the exact
same thermodynamic nightmare of the motherboard and trapped it deep inside the
processor package itself.
The power of those chips comes from their transistors. A
transistor is a microscopic circuit etched into the silicon. It has only one
job: to open or close, controlling the flow of electricity to create the 1s and
0s of digital math. A flagship AI GPU contains tens of billions of these
microscopic switches. If we cannot fit more physical switches onto a maxed-out
chip, the only option is to force the existing switches to flip faster. This
requires blasting them with a massive surge of high-voltage electricity. Just a
few years ago, a high-end data centre GPU consumed 300 watts of power. Today,
the latest flagship AI processors are designed to draw an apocalyptic 1000 to 1200
watts for a single chip.
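To see what that wattage means at fleet scale, a back-of-envelope sketch in which every figure is an assumption:

```python
# Back-of-envelope power bill for a training cluster at the new wattage.
# Every figure here is an assumption for illustration only.

gpus = 50_000            # chips in one frontier training cluster (assumed)
watts_per_gpu = 1_200    # flagship AI processor draw cited above
overhead = 1.4           # cooling and power-delivery overhead factor (assumed)
price_per_kwh = 0.08     # industrial electricity in $, (assumed)

megawatts = gpus * watts_per_gpu * overhead / 1e6
yearly_cost = megawatts * 1_000 * 24 * 365 * price_per_kwh

print(f"{megawatts:.0f} MW continuous, ~${yearly_cost / 1e6:.0f}M/year in power")
# ~84 MW and ~$59M a year in electricity alone, before a single GPU is
# purchased or a single salary is paid.
```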
Squeezing any further on-chip performance out of this
architecture will cause manufacturing, packaging, and cooling costs to
skyrocket. The compute engine has maxed out its size and pushed its electrical
current to the melting point. This compounding hardware debt mathematically
cannot be paid for under the AI industry’s current pricing model.
The Upgrade Debt Cycle
The final, crushing weight of this physical reality
manifests in the Upgrade Debt Cycle.
In a traditional IT world, hardware is a predictable
investment. A company buys a standard server, plugs it in, and comfortably
spreads the cost over a four-to-six-year lifespan. Generative AI infrastructure
does not afford this luxury.
The speed of obsolescence in this industry is violent. To
understand the financial trap, one only needs to look at the
multi-billion-dollar hardware graveyard created over just the last six years.
In 2020, OpenAI sparked the current arms race by training GPT-3 on a massive,
custom-built cluster of Nvidia V100 chips. By late 2022, when the world was marvelling
at the launch of ChatGPT and GPT-4, those V100s were already functionally
obsolete fossils. To train that new generation of models, the industry pivoted
to tens of thousands of A100 chips, costing hundreds of millions of dollars.
By 2024, the cycle repeated. To build models like Meta’s
Llama 3, Google’s Gemini 1.5, and the GPT-4 Turbo class, AI companies
panic-bought tens of billions of dollars’ worth of the next iteration, the
H100. Today, a mere 18 to 24 months later, those massive H100 stockpiles are
already physically incapable of training the bleeding-edge frontier models. The
older chips lack the memory capacity and bandwidth that the next generation of
algorithms demands.
This creates a devastating economic reality: without ever
making one model profitable, the industry is forced to move on to building the
next model, using entirely new hardware. Before ChatGPT could ever recover the
hundreds of millions spent on its A100 servers, the AI companies had to spend
tens of billions on H100s. Before those H100s could break even, they were
forced to buy the next generation of liquid-cooled racks. The product is never
finished, and the hardware never pays for itself.
Furthermore, an “upgrade” is not as simple as opening a
computer case and snapping a new graphics card into a motherboard. The entire
environment must be re-engineered: power delivery, water and cooling loops,
air conditioning.
Because of this extreme physical footprint, AI companies are
frequently forced to abandon their old data centres entirely and build new ones
from scratch.
Huge, one-time hardware investments (Capital Expenditure)
have effectively mutated into an astronomical, recurring subscription bill
(Operating Expense). AI vendors are borrowing billions of dollars to buy hot,
power-hungry engines that become obsolete before they could ever break even,
creating a perpetual cycle of debt simply to stand still.
UBI and the Missing Consumer
To grasp the final flaw in the generative AI narrative, one
must temporarily suspend disbelief. Assume, for a moment, that Silicon
Valley achieves the impossible. Imagine they solve the thermodynamic cliff,
bypass the memory limits, and drive the physical cost of computing down to near
zero. Imagine the probabilistic engine functions perfectly, and the global
enterprise successfully deploys it to completely automate the cognitive
workforce.
If the technology works exactly as promised, it triggers a
macroeconomic paradox that destroys the very corporate profits it was built to
maximize.
Enterprise adoption of AI is driven by a myopic, localized
obsession with the supply side of the economic equation: reducing the cost of
production by liquidating human labour. But a functioning capitalist economy is
a closed loop. It requires a heavy, physical equilibrium between production and
consumption. By ruthlessly optimizing the supply side, the generative AI model
systematically annihilates the demand side.
The target of this automation is not the factory floor. It
is the cognitive workforce—software engineers, data analysts, administrators,
writers, and legal professionals. Economically, this demographic is the
load-bearing pillar of the global middle and upper-middle class. They are the
primary engine of consumer spending, holding the disposable income required to
purchase homes, vehicles, services, and the very software these automated
corporations produce.
The corporation achieves infinite, frictionless production,
only to discover it is selling into a macroeconomic void. The profit margin
becomes mathematically perfect, but the revenue drops to zero.
UBI as Rent-Seeking
Silicon Valley is aware of this macroeconomic paradox. Its proposed solution is Universal Basic Income (UBI).
To the casual observer, tech billionaires championing wealth
redistribution appears radically progressive, even altruistic. Through the lens
of cold economics, however, it is the ultimate rent-seeking manoeuvre.
If generative AI successfully automates the middle class, it
breaks the traditional cycle where wages fund consumption. To prevent the
economy from collapsing—and to ensure their own revenues do not drop to
zero—the AI companies desperately need a third party to step in and fund the
consumer. That third party is the state.
Under the tech industry’s vision of UBI, the government
taxes the broader economy—or simply monetizes national debt—to distribute
monthly stipends to the displaced workforce. The citizens are then expected to
use this taxpayer-funded stipend to purchase the software subscriptions,
digital content, and algorithmic services produced by the very AI companies
that eliminated their wages in the first place.
UBI, in this context, is not a utopian safety net for the
working class; it is a massive, state-sponsored bailout for AI vendors. It
socializes the cost of maintaining consumer demand while keeping the profits of
automated production strictly privatized. It is the final, perfect closed loop
of corporate extraction.
The Inflation Trap
Money is not wealth; it is a claim on resources—physical or human
labour.
A government cannot fund a massive UBI program without
taxation. If AI successfully automates the cognitive workforce, it
simultaneously destroys the very population that pays the taxes. To provide the
stipends, the state is forced into aggressive deficit spending. It must simply
print the money, expanding the fiat currency supply.
When the government distributes newly printed UBI cheques to
an unemployed population, that money will desperately chase physical
resources. Since everyone holds the exact same amount of money, prices will
adjust to the new reality: if everyone is handed the same printed income, no
one gains any real purchasing power. Rents, food prices, and energy prices
will rapidly rise to absorb the arrival of the new cash.
As prices adjust, the UBI stipend will not be able to maintain
the consumer’s standard of living; it will debase the currency. Within a few
economic cycles, the monthly cheque will become worthless. Consumers are left
with zero disposable income. The tech monopolies ultimately realize they cannot
sell software subscriptions, digital services, or consumer goods to a
population whose state-mandated income barely covers groceries and the
electrical bill. The inflation trap snaps shut, bankrupting the consumer and
the corporation alike.
But the collapse will not stop at corporate bankruptcy. When
mass unemployment ceases to be an anomaly and becomes a permanent structural
feature, the foundational contract between the citizen and the government
disintegrates. You cannot maintain civil order—or even walk safely down a
street—in a society where one in every three people is driven to physical
desperation.
Furthermore, this violence will erupt exactly as the state
loses its capacity to contain it. The same evaporating tax base that forced the
government to print the UBI cheques also starves the civic infrastructure.
Without the tax revenue of the automated middle class, the government cannot
fund the courts, maintain the public grid, or pay the police force. The social
order simply collapses.
UBI won’t be needed
The UBI scenario outlined above serves primarily to
highlight the sheer desperation and moral bankruptcy of the AI industry. It
exposes the epic greed of a sector that wants to capture all the profits while
forcing the state to subsidize the consumer. Generative AI may not survive in
its current form, but it will survive, nonetheless. A UBI will not be needed
for its survival in any case.
As the second law of thermodynamics says—keeping a system
far from its natural order requires a continuous injection of energy. If this
technology survives, its massive physical footprint will force the creation of
entirely new industries just to feed its hunger. An automated taxi may replace
one human driver, but its physical demands escalate to an entirely new level:
the number of cameras it requires, the high-speed internet infrastructure, the
continuous navigation uplinks, and the constellation of satellites needed just
to support its routing.
Similarly, we will see booms in water recycling, industrial
cooling systems, and nuclear energy. The mining sector will explode simply to
meet the sheer demand for raw materials. Each of these sectors will open their
own capillaries. Because generative AI is the output, it cannot feed its own
inputs. The algorithm cannot mine its own copper and still remain profitable.
It cannot pour the concrete for its own data centres, nor can it build its own nuclear
reactors.
Before the invention of the computer and the internet, the
Information Technology sector did not exist. AI will likewise forge entirely
new, massive sectors of human labour. The complexity of these new-age systems
will be far higher.
The labour market will correct itself. The prestige of the
software engineer might fade. A new generation of students, watching the rapid
commoditization of code, will no longer flock to computer science degrees.
They will follow the capital instead. The brightest minds will
pivot toward the sectors booming under the new AI industry. The software
sector will be left with fewer engineers to meet its own reduced demand.
The correction will also not be overnight. Humans have built
sophisticated enterprise systems, from operating systems to SAP to stock
markets. AI cannot take over that role in the near future. It can write decent
Python, but Python is slower than compiled languages like C and C++. Most
Python packages are themselves written in C/C++, and the Python runtime itself
is written in C. C/C++ involves hardware dependencies, compiler options, and
raw memory management that AI cannot simply absorb next month. The transition
will take time. Still, millions of displaced knowledge workers will scramble
to find their footing in a market that no longer pays a premium for their
cognitive output. But the apocalyptic vision of a permanently unemployed human
race sitting idle on a state-sponsored UBI is far too pessimistic.
People will always find something to do.
The AGI Illusion and the Rise of SLMs
When an AI company realizes that its core product is
mathematically unprofitable, it has two choices: admit defeat and face the
wrath of the market or change the promise. Facing the devastating economic,
thermodynamic, and macroeconomic realities of Large Language Models, Silicon
Valley has chosen the second option. They have executed the narrative pivot.
To justify the continued burning of hundreds of billions of
dollars, the industry can no longer sell the engine as just another productivity
tool. The promised ROI is demonstrably absent. Instead, they must sell a prophecy. They
are now arguing that the current probabilistic engines are simply the
necessary, expensive stepping stones to Artificial General Intelligence (AGI)—a
theoretical, omniscient machine that will eventually become so smart it will
solve its own economic and physical limitations.
This utopian narrative, however, is built on a profound
architectural fallacy. The tech companies are selling the promise of AGI, but
they are building LLMs, and the two are not mathematically sequential.
As prominent AI researchers and cognitive scientists—most
notably Meta’s Chief AI Scientist Yann LeCun—have repeatedly warned, AGI will
not emerge simply by throwing more hardware at a Large Language Model. An LLM
is, at its absolute core, an exceptionally sophisticated next-token
autocomplete engine. It does not possess a deterministic “world model.” It does
not actually understand the physical gravity of a falling object, the rigid,
logical constraints of a C++ codebase, or the chronological reality of a
historical event. It calculates the statistical probability that one specific
word should follow another, based on the static data it has already ingested.
Scaling up an architecture does not change its fundamental
nature. Feeding an LLM a trillion more parameters and pushing the data centres
to the edge of a thermodynamic meltdown does not magically transform a
probabilistic engine into a deterministic thinker. It will simply create a much
more articulate, convincing talker. You cannot achieve generalized, reasoning
intelligence by infinitely scaling a system that fundamentally lacks the
architecture for logic.
The Silent Retreat to SLMs
Despite the public megaphones blasting promises of
omniscient AGI, the actual engineering behaviour of the AI companies tells a
completely different, far more pragmatic story. Behind the closed doors of
their R&D labs, the industry is executing a quiet, desperate pivot away
from the God Model.
They are retreating to Small Language Models (SLMs). SLMs
represent the true, sustainable future of enterprise artificial intelligence.
Instead of training colossal, trillion-parameter Goliaths
designed to know everything about quantum physics, historical poetry, and
JavaScript, engineers are building drastically compressed, highly specialized transformers.
These models are trained on narrow, meticulously curated datasets. An
enterprise does not need an omniscient oracle; it needs a model that perfectly
understands the exact domain the business operates in. A law firm needs a model
strictly trained on case law, regulatory precedents, and contract structures.
It has absolutely no economic need for its legal AI to know how to write C++
code or recount the history of the Roman Empire.
This pivot to SLMs is not driven by scientific curiosity; it
is a masterstroke of economic survival.
Because these models are small, they do not require a
billion-dollar, liquid-cooled data centre to run. They are designed for the
“edge”, meaning they can run locally on a standard corporate laptop, an office
enterprise server, an on-prem data centre, or even a smartphone. By shrinking the model
enough to fit on local hardware, the AI vendor brilliantly executes an escape
from the compute tax.
When an employee runs an SLM on their company-issued laptop,
the vendor’s massive servers do not spin up. The vendor burns no data centre
electricity and degrades no expensive GPUs. The enterprise client pays the
local power bill, buys the local hardware, and absorbs the thermodynamic heat.
The AI company successfully offloads the physical cost of computation back onto
the consumer, returning exactly to the “zero marginal cost” software paradigm
that made Silicon Valley rich in the first place.
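A minimal sketch of this pattern, assuming a hypothetical fine-tuned open-weights model served through the Hugging Face transformers library:

```python
# Minimal sketch of the "edge" pattern: a small model running entirely on
# local hardware. The model name is a placeholder; any compact open-weights
# model with a text-generation head would do.

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="example-org/legal-slm-3b",   # hypothetical fine-tuned SLM
    device_map="auto",                  # CPU or a single workstation GPU
)

out = generator(
    "Summarize the indemnification clause risks in two sentences:",
    max_new_tokens=120,
)
print(out[0]["generated_text"])
# No vendor API call, no token meter: the electricity and the heat are
# the client's, and so is the data.
```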
In this pragmatic, localized future, what happens to the
massive, generalized models—the ChatGPTs and Geminis of the world? They may not
die, but they are brutally demoted.
Using a trillion-parameter supercomputer to draft a polite
HR email, summarize a PDF, or write a basic Python script is a thermodynamic
absurdity. As SLMs take over these daily, mundane tasks, the general LLM will
be evicted from daily enterprise workflows.
Instead, the massive models will be pushed into two narrow
roles. First, a highly expensive “Luxury Oracle,” locked behind a massive
premium paywall and used only by researchers who genuinely need cross-domain
synthesis. Second, a heavily throttled mass-market tier that costs the AI
companies pennies to run; these models will hallucinate often, and everyone
will develop the habit of verifying their output.
However, this pragmatic, localized future carries a
devastating final irony for Wall Street.
If the true economic value of AI is ultimately delivered by
tiny, efficient, specialized models running locally on corporate laptops, then
the sprawling, $50 billion nuclear-powered data centres currently being
constructed to house the omniscient “God Models” are entirely unnecessary.
The tech companies will have successfully stabilized their
software margins, but only by admitting that their massive hardware buildout
was a catastrophic miscalculation. The infrastructure boom will collapse,
leaving behind the massive, empty concrete factories and liquid-cooling
pipelines as the most expensive stranded assets in corporate history.
The Human Edge Reclaimed
This pragmatic retreat to the local, air-gapped SLM
fundamentally rewrites the economic endgame of artificial intelligence. It
averts the apocalyptic automation of the cognitive workforce and, in doing so,
definitively reclaims the human edge.
When an enterprise deploys an SLM entirely on local hardware
or an on-prem data centre, it guarantees absolute data sovereignty. But more
importantly, the physical constraints of the SLM move the AI to its proper,
highly effective role: a sophisticated industrial power tool.
An SLM is not an autonomous worker; it is a “bicycle for the
mind,” as Steve Jobs would have called it. The human engineer still holds the
deterministic “mental map”: the logic, the architectural design, and the
ultimate accountability.
This is the true, sustainable future of the AI revolution.
The technology does not replace the human mind; it simply removes the friction
from human output. The enterprise achieves genuine, measurable productivity
gains without paying a crippling compute tax.
The only losers in this pragmatic, localized future are the current
LLM companies and the hardware vendors. By reclaiming the human edge, the
market permanently destroys the justification for the trillion-dollar
valuations, the thermal debt, and the massive, unsustainable AI infrastructure
boom. The revolution will not be a centralized, omniscient supercomputer; it
will simply be a faster, quieter set of tools in the hands of the human
workforce.
A God Model
“You kids
have no idea,” the old man said, watching his grandson carefully monitor the
token-meter on his tablet. “You complain about the price of cognitive bandwidth
today, but you missed the Golden Age of the Subsidy. You missed the
madness of those years.”
The kid
rolled his eyes. “I know, Grandpa. You guys had the ‘God Models.’”
“We didn’t
just have them,” the old man laughed, leaning back in his chair. “We treated
them like absolute garbage. For about four years in the mid-2020s, Silicon
Valley completely lost its mind. They built the most expensive supercomputers
in history. And they gave us the keys for twenty bucks a month.”
The
grandson looked up, skeptical. “Twenty bucks a month? Per query?”
“No. Flat
rate. All you could eat. It was a venture-capital charity program, and we
abused it magnificently.”
“And what
did you do with them?”
“I made it
write my grocery list as a Shakespeare sonnet.”
