
Semiconductors

2026-03-18 The next phase of artificial intelligence may require very different processors

Training and inference place different demands on hardware.

Training, in which an AI model is taught to identify patterns in vast amounts of raw data, relies on enormous numbers of calculations being conducted in parallel. Nvidia’s B200 chip, for instance, one of the company’s flagship products, contains more than 16,000 processing units, also known as cores, to perform such operations.

Inference, in which a finished model calls on its training to respond to user prompts, works differently. It unfolds in two stages: prefill and decode.

  • During prefill, the model processes the prompt and converts it into small units of text, typically about four characters in English, known as tokens. To speed things up, different parts of the query can be tokenised in parallel.
  • Decoding then generates the response, token by token. To do this, the model relies on its “weights” (relationships between tokens learned during training) as well as previously generated tokens. These weights are stored in the system’s memory.
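The contrast between the two stages can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real model works: `toy_next_token` is a hypothetical stand-in for a trained model, and the four-characters-per-token figure is only the rule of thumb quoted above. What it shows is the shape of the work: the whole prompt is available at once, whereas each generated token must wait for the one before it.

```python
import math


def rough_token_count(text: str, chars_per_token: int = 4) -> int:
    """Estimate the token count with the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def toy_next_token(context: list[int], vocab_size: int = 50) -> int:
    """Hypothetical stand-in for a model: the output depends on the whole context."""
    return (sum(context) * 31 + len(context)) % vocab_size


def generate(prompt_tokens: list[int], n_new: int) -> list[int]:
    """Toy inference loop: the prompt is known up front, decoding is sequential."""
    context = list(prompt_tokens)      # prefill: every prompt token is available at once
    generated = []
    for _ in range(n_new):             # decode: strictly one token per step
        token = toy_next_token(context)
        generated.append(token)
        context.append(token)          # the next step depends on this token
    return generated


print(rough_token_count("How do inference chips differ from GPUs?"))
print(generate([1, 2, 3], n_new=4))
```

Because every decode step reads the full context (and, in a real model, the full set of weights), the loop cannot be parallelised across steps in the way prefill can.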

The need for constant memory access is where modern GPUs fall down. AI processors like the B200 contain small but extremely fast on-chip memory, known as SRAM, as well as a much larger off-chip memory known as DRAM. Accessing DRAM can be ten times slower and consume far more energy than reading SRAM. The problem is worsening. As AI models grow larger and become better at handling long user prompts, their memory demands are rising sharply. A study by Amir Gholami of the University of California, Berkeley, and colleagues finds that over the past two decades computing performance has roughly tripled every two years, whereas off-chip memory bandwidth has improved by a factor of only about 1.6 over the same interval. This “memory wall” has become the main bottleneck in increasing the speed of AI inference.
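To get a feel for how quickly such a gap compounds, here is a rough back-of-the-envelope sketch. It takes the compounding period to be two years for both growth factors, which is an assumption made for illustration, and it treats both rates as constant over the full two decades.

```python
def growth_over(years: float, factor: float, period_years: float) -> float:
    """Cumulative growth if a quantity multiplies by `factor` every `period_years`."""
    return factor ** (years / period_years)


YEARS = 20            # "the past two decades"
PERIOD_YEARS = 2      # assumed compounding period, for illustration only

compute_growth = growth_over(YEARS, 3.0, PERIOD_YEARS)     # 3 ** 10
bandwidth_growth = growth_over(YEARS, 1.6, PERIOD_YEARS)   # 1.6 ** 10
gap = compute_growth / bandwidth_growth

print(f"compute:   x{compute_growth:,.0f}")
print(f"bandwidth: x{bandwidth_growth:,.0f}")
print(f"gap:       x{gap:,.0f}")
```

Under these assumptions compute grows by a factor of tens of thousands while bandwidth grows about a hundredfold, leaving a gap of several hundred times.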

GPUs rely on software workarounds to cope.

  • One approach splits the two stages across different processors. The prefill phase runs on GPUs optimised for high parallel computing power, while decoding runs on separate GPUs designed for fast memory access.
  • Another technique is batching, where many queries are processed together. Once the model’s weights are loaded, they can then be used for many queries at the same time, reducing repeated trips to the external memory.
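The saving from batching is simple to sketch. The model below is deliberately crude: it assumes one full read of the weights per decoding step, shared evenly by every query in the batch, and it ignores per-query data such as previously generated tokens. The 140-gigabyte model size is a hypothetical figure chosen for illustration, not one taken from the article.

```python
def weight_traffic_per_query(weight_bytes: float, batch_size: int) -> float:
    """Off-chip weight bytes read per query for one decoding step,
    when a single load of the weights serves the whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return weight_bytes / batch_size


HYPOTHETICAL_WEIGHT_BYTES = 140e9   # assumed model size, not from the article

for batch in (1, 8, 64):
    gigabytes = weight_traffic_per_query(HYPOTHETICAL_WEIGHT_BYTES, batch) / 1e9
    print(f"batch={batch:>2}: {gigabytes:.1f} GB of weight reads per query per step")
```

The trade-off, which the sketch leaves out, is that each query may wait for the batch to fill, so larger batches tend to raise throughput at the expense of latency.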

Nvidia’s new chip uses the power of software to give the on-chip memory a boost. The size of the SRAM is around 500 megabytes—tiny when compared with the B200’s 192 gigabytes of off-chip memory. What makes the difference is smart software that choreographs how every piece of data moves through the chip to maximise computation and memory access.

One approach is to simply build a bigger chip. That is the approach taken by Cerebras, an American chip designer. Its latest chip, the size of a dinner plate, contains an enormous 900,000 cores and 44 gigabytes of on-chip SRAM. Because all data movement occurs within the wafer, Cerebras claims its system can run inference up to 15 times faster than conventional designs. For very large models, however, storing all their parameters on SRAM is impractical.

Others are tackling the problem by redesigning how data move through the cores. MatX, a startup founded by former Google chip engineers, builds on an idea used in Google’s tensor processing units (TPUs). These chips rely on what is called a systolic array, a grid of processing elements through which data flow rhythmically, rather like blood pumped through the body. After each calculation the result passes directly to the next unit, bypassing the need to store intermediate results in memory. Traditional systolic arrays, however, are fixed in size. Make them bigger, for larger tasks, and they will often sit idle; make them smaller, and efficiency falls when the larger tasks come through. MatX proposes a “splittable” systolic array that divides the processor into several smaller grids, allocating computing resources differently depending on whether the chip is handling prefill or decode.
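The flow of data through a systolic array can be simulated in software. The sketch below is a simplified, cycle-by-cycle model of one common arrangement, in which each processing element holds a fixed weight, inputs march from left to right and partial sums are handed downwards. It is an illustration of the general idea only; it does not describe the actual hardware of Google, MatX or anyone else, and the function and variable names are invented for this example.

```python
def systolic_matmul(A: list[list[float]], W: list[list[float]]) -> list[list[float]]:
    """Multiply A (n x k) by W (k x m) on a simulated k x m grid of processing
    elements (PEs). PE (r, j) permanently holds the weight W[r][j]."""
    n, k, m = len(A), len(W), len(W[0])
    x_reg = [[0.0] * m for _ in range(k)]   # input value currently sitting in each PE
    p_reg = [[0.0] * m for _ in range(k)]   # partial sum most recently produced by each PE
    out = [[0.0] * m for _ in range(n)]

    for t in range(n + k + m - 2):          # cycles needed to drain the array
        # Inputs move one PE to the right; row r of the grid is fed with a delay of r cycles.
        for r in range(k):
            for j in range(m - 1, 0, -1):
                x_reg[r][j] = x_reg[r][j - 1]
            row = t - r
            x_reg[r][0] = A[row][r] if 0 <= row < n else 0.0

        # Partial sums move one PE down. Updating bottom-up means each PE reads
        # the value its upper neighbour produced on the previous cycle.
        for r in range(k - 1, -1, -1):
            for j in range(m):
                above = p_reg[r - 1][j] if r > 0 else 0.0
                p_reg[r][j] = above + x_reg[r][j] * W[r][j]

        # Finished results drop out of the bottom row, staggered by column.
        for j in range(m):
            row = t - (k - 1) - j
            if 0 <= row < n:
                out[row][j] = p_reg[k - 1][j]
    return out


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Intermediate sums never leave the grid: they exist only in the registers between neighbouring elements, which is the point of the design. The fixed grid size in the simulation also hints at the rigidity the paragraph describes, since a matrix much smaller than the grid would leave most elements multiplying zeros.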

A third approach, pursued by d-Matrix, a California-based startup, tries to eliminate the memory wall entirely by having the same components handle both memory and computation. This architecture, known as in-memory computing, promises lower energy use and faster inference.

Others advocate chip designs built around specific algorithms to improve efficiency further. Etched, another Californian startup, is designing a chip custom-built to run transformer models, the algorithms that underpin most LLMs. This specialisation allows the company to strip away hardware needed for other uses and simplifies the software running on the chip. Researchers in China have proposed an even more radical form of specialisation: embedding model weights directly into hardware. In one design from the Chinese Academy of Sciences, these are physically encoded in the layout of metal wires. The authors claim this technique removes the need to fetch parameters from memory, enabling extreme efficiency.

Yet such specialisation carries risks. Designing a new chip typically takes 12–18 months, whereas AI algorithms evolve far faster. A chip built around today’s dominant model architecture could quickly become obsolete if the field shifts.

The chips have yet to fall. Nvidia’s rivals are at different stages. Cerebras is already on its third generation of chips; d-Matrix expects to release its first widely available version this year. Others, including MatX and Etched, remain in development. Nvidia says the Groq 3 LPX will reach the market later this year.

It is easy to see that the GPU conquered training. Inferring what comes next is harder.

2026-03-17 Nvidia is expanding its empire

The transformation is needed partly because Nvidia’s success has attracted competitors. Some are conventional rivals, such as AMD, an American chipmaker that has released decent alternatives to Nvidia’s GPUs. Others are startups spying opportunities. New chip designs are becoming commercially viable because the need for inference (AI models answering queries) is growing, and the process places a different set of demands on chips from training. According to PitchBook, a data firm, young chip firms raised $17bn in 2025, more than in the previous two years combined.

In the latest financial year just three of these hyperscalers accounted for over half of Nvidia’s receivables, money owed but not yet paid.

Bernstein, a broker, says local suppliers such as Huawei, Cambricon (寒武纪) and MetaX (沐曦) could grow from less than a fifth of China’s AI-chip market in 2023 to more than nine-tenths by 2027.

In December Nvidia paid $20bn to license technology and hire engineers from Groq, a startup specialising in inference chips. On March 16th the company unveiled a new chip using the startup’s knowhow.

Nvidia is also investing in other layers. As AI systems scale, moving data between processors has become as important as the processors themselves. The firm is betting heavily on networking equipment, the technology that links chips together. In its most recent quarter this business generated $11bn in revenue, making Nvidia one of the largest players in the field.

Nvidia has released several families of open-source AI models. These are specialised and aimed at specific industries. That includes Alpamayo for self-driving cars, GR00T for robotics and BioNeMo for biomedical research. They often rank highly on open-source AI leaderboards. Nvidia plans to invest billions to expand its capabilities in this layer of the stack.

Revenue from sovereign AI tripled last fiscal year to more than $30bn, about 15% of Nvidia’s AI sales.

The company is also trying to rely less on the hyperscalers that dominate its customer list. One approach is to push deeper into industry. In carmaking, Mercedes-Benz will soon ship vehicles equipped with Nvidia’s self-driving systems. In pharmaceuticals, Eli Lilly uses Nvidia’s infrastructure and models to accelerate drug discovery. Dion Harris, an Nvidia executive, says the aim is to work more closely with end customers, such as Lilly and Mercedes, to understand their needs and shape the next wave of AI. But Nvidia is not the only one to say it is working closely with clients. Such moves put the firm on a collision course with the hyperscalers, which offer similar services.

Another approach is to create demand through its investments. Nvidia-backed firms, the idea goes, are more likely to buy its chips. Thus the firm is now one of Silicon Valley’s most prolific investors. Since 2020 it has made some 200 investments, committing over $65bn (see chart 2). That includes such big bets as a $30bn investment in OpenAI, and small ones on firms in robotics, software and AI applications.

The firm’s investments also help to secure its supply chain. This March Nvidia put more than $4bn into companies developing optical interconnects, which use light to transfer data rather than wires. Most AI data centres still rely on copper cables to link their equipment.

Nvidia is using its cash pile to strengthen other parts of its supply chain. The semiconductor industry is prone to shortages when demand surges. Supplies of advanced memory—critical for AI chips—are already sold out for this year and for much of next. Nvidia bought most of the memory it will need this year, and part of next, well in advance.

2026-02-12 Arm wants a bigger slice of the chip business

IN THE SEMICONDUCTOR industry, Arm is everywhere and nowhere. Designs from the British-based, American-listed, Japanese-controlled firm sit in almost all the world’s smartphones and most other connected devices. Yet Arm does not sell a single chip. Customers license its designs, tweak them if they wish and produce the chips themselves (or have them made). Arm pockets an upfront licence fee and a slim per-chip royalty. The model has made it ubiquitous. More than 300bn chips built on its designs have been shipped—over 30bn of them last year alone.

Weak demand for smartphones and consumer electronics has weighed on Arm’s shares: since the start of 2025 their price has declined by 2%, even as the benchmark Philadelphia semiconductor index, fuelled by enthusiasm for artificial intelligence, has climbed by 65% (see chart).

Designing a new CPU can cost hundreds of millions of dollars and take 12-18 months. An off-the-shelf blueprint spares customers, such as Apple, much of that burden.

Mr Haas argues that this is only the beginning. As AI workloads shift from training to inference, where models respond to user queries, demand for efficient, general-purpose processors should rise. Much of that work, Arm’s boss expects, will spread beyond data centres into phones, wearables and cars, again favouring CPUs.

Analysts expect revenue this fiscal year to be around $5bn, with half from royalties and the rest from licensing fees.

According to Visible Alpha, a data provider, last year Arm earned royalties of $0.86 per mobile chip, or 2.5-5% of the price.
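Those two figures together imply a range for the average price of a mobile chip. The arithmetic below is a rough sketch that simply divides the royalty by each end of the percentage range; it assumes both numbers refer to the same set of chips.

```python
def implied_price_range(royalty: float, low_share: float, high_share: float) -> tuple[float, float]:
    """Chip price implied by a per-chip royalty that is between
    `low_share` and `high_share` of that price."""
    return royalty / high_share, royalty / low_share


low_price, high_price = implied_price_range(royalty=0.86, low_share=0.025, high_share=0.05)
print(f"implied chip price: ${low_price:.2f} to ${high_price:.2f}")
```

A royalty of $0.86 at 5% of the price implies a chip costing about $17; at 2.5% it implies one costing about $34.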

To illustrate, Mr Haas uses an analogy. For most of its history, Arm sold designs for individual processors. Think of them as “Lego bricks”. Recently it has also started selling blueprints for pre-assembled blocks of processors known as “subsystems”.

One option is to develop custom chips for cloud providers. That has proved lucrative for Broadcom: making bespoke chips for Google and Amazon has helped push its market value above $1.6trn (Arm is worth $135bn). Some analysts think Arm will go further and design and sell its own chips. Rumours suggest that Meta, a social-media giant, will be the first customer.

Either route would bring Arm a bigger cut from its designs, but would entail risks. Creating finished chips, or moving in that direction, would undermine the claim that it does not compete with its customers.

SoftBank, the Japanese conglomerate that owns over 85% of the firm, has been assembling its own chip portfolio, buying Ampere, which makes server processors, and Graphcore, which designs AI chips. In August it bought 2% of Intel for $2bn. Masayoshi Son, SoftBank’s boss, is said to be keen to build an AI champion to rival Nvidia. Mr Haas, who sits on SoftBank’s board, talks up synergies across the group’s chip businesses. But all this may push Arm away from being a neutral supplier of designs.

The big test is whether the revenues of those pouring money into AI rise fast enough to justify the spending. At some point, “the math does need to square”.

A separate concern lies in China, source of a fifth of Arm’s revenue. China’s government is promoting RISC-V, an open-source chip architecture pitched as a domestic alternative to designs from Arm and Intel.

Mr Haas says his biggest worry is whether Arm is investing fast enough to take advantage of the AI opportunity. Chips take years to design and build; AI models evolve in months. Whether the company can move quickly enough is one question. Whether it can make the most of AI without undermining the model that put its designs everywhere is another.

2026-01-08 The AI frenzy is creating a big problem for consumer electronics

Excitement over the prospect of clever new devices powered by artificial intelligence is as strong as ever. Yet by gobbling up memory chips, which are essential for everything from smartphones and personal computers (PCs) to gaming consoles and cars, AI is creating a supply crunch for electronics-makers.

Jeffrey Clarke, chief operating officer of Dell, a manufacturer of computers, has called the situation “the most unprecedented mismatch in demand and supply” he has ever seen. Xiaomi, a Chinese smartphone-maker, has warned of delays and rising prices. Analysts predict that prices for PCs could jump by 15-20% in response. IDC, a data firm, reckons that if the situation persists, global smartphone shipments could fall by as much as 5% this year, and PC sales by roughly twice that.

Semiconductors are a cyclical business, prone to swing from surplus to shortage.

The rapid construction of data centres has sent demand for HBM soaring. Producing it is resource-intensive: HBM requires three to four times as many silicon wafers as standard DRAM.

Supply is highly concentrated. Just three firms—SK Hynix and Samsung Electronics of South Korea, and Micron of America—rake in more than 90% of global DRAM revenue. All three are switching capacity to HBM, which will account for half of global DRAM revenue by the end of the decade, up from 8% in 2023, reckons Bloomberg Intelligence, a research group. HBM typically yields operating margins of 50% or more, compared with 35% for standard memory. Investors have rewarded the strategy. Since the start of 2025 the trio’s share prices have risen by an average of 200% (see chart 1).

But the flip side is that more basic memory chips, which account for 15-40% of the cost of smartphones and PCs, are becoming scarcer and costlier. The price for the DRAM found in most consumer electronics, known as DDR4, has risen by 1,360% since April 2025 (see chart 2).

The impact will be uneven. Apple, with its pricey i-gadgets and enormous scale, will be better placed to absorb higher costs and secure supply. Samsung will benefit from in-house memory production.

Asus, a Taiwanese PC-maker, raised prices for its laptops on January 5th. Xiaomi has said memory costs will have a “big impact” on margins. Carmakers may feel the strain most: as vehicles incorporate more electronics, the amount of DRAM per car is growing rapidly.

Relief will come slowly. Memory-makers plan to spend about $61bn on capital investment for DRAM this year, a 14% increase on 2025. But new capacity takes as long as two years to come online. Moreover, 60-70% of planned investment is earmarked for HBM, reckons Jukan Choi of Citrini Research, a firm of analysts. Chinese producers, which have become big suppliers of basic DRAM in recent years, are unlikely to plug the gap; they too are focusing on HBM. For now, only an unravelling of the AI boom would ease the shortage. Consumers may soon feel the pain.
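A little arithmetic shows how little of that spending is left for basic memory. The sketch below uses only the figures quoted above and assumes the 60-70% share applies to the whole $61bn.

```python
capex_this_year = 61.0                # $bn planned for DRAM this year
growth_on_last_year = 0.14            # a 14% increase on 2025

capex_last_year = capex_this_year / (1 + growth_on_last_year)

hbm_low = 0.60 * capex_this_year      # lower bound earmarked for HBM
hbm_high = 0.70 * capex_this_year     # upper bound earmarked for HBM

other_low = capex_this_year - hbm_high    # what remains for everything else
other_high = capex_this_year - hbm_low

print(f"implied 2025 spending: ${capex_last_year:.1f}bn")
print(f"earmarked for HBM:     ${hbm_low:.1f}bn to ${hbm_high:.1f}bn")
print(f"left for other DRAM:   ${other_low:.1f}bn to ${other_high:.1f}bn")
```

On these numbers, spending on everything other than HBM comes to roughly $18bn-24bn, well under half of what the industry as a whole spent on DRAM capacity in 2025.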

2026-01-06 America’s missing manufacturing renaissance

Nearly a year on, however, the Trumpian manufacturing renaissance is conspicuous by its absence. The manufacturing contraction is now entering its third year, and factories have continued to shed jobs; employment fell by 0.6% in the year to November (see chart 1). And it is not just that Mr Trump’s actions are failing to revive American manufacturing. Under the hood, there are signs that they are actively hurting it.

Part of the problem is high interest rates. American industry fell into recession in early 2023, soon after the Federal Reserve sharply raised rates to combat inflation. Manufacturing, with expensive and often debt-financed kit, is especially sensitive to such changes. Mr Trump is keen to see looser monetary policy; America’s continuing high rates mostly reflect robust economic growth and vast rate-insensitive AI spending. All the same, his policies have not helped. High deficits and threats to the independence of the Fed have made American debt less desirable for investors, and thus lifted borrowing costs.

Moreover, his tariffs have injected uncertainty into the economy. For a manufacturing sector that sends nearly a quarter of its output abroad, this is a significant problem. Many inputs also come from abroad: think of industrial chemicals used in adhesives, coatings and plastics for cars, or active pharmaceutical ingredients for medicines. Indeed, surveys suggest that export orders and import volumes for manufacturing have contracted markedly since Mr Trump announced high tariffs on “Liberation Day” in April, a contraction that goes beyond the wider weakness in manufacturing (see chart 2). Factory bosses report difficulty making long-term plans.

Another way to see these costs is to look at the one sort of manufacturing that has been on a tear: computer equipment, especially semiconductors (see chart 3). Demand for chips has leapt owing to the data-centre boom. Notably, however, computer parts have also received exemptions from Mr Trump’s tariffs, points out Joseph Politano of Apricitas Economics, a newsletter. Semiconductors have been carved out from Mr Trump’s “reciprocal” tariffs on specific countries. More recently, the president has also watered down the export-control regime designed to deny China chips used to train the most sophisticated AI models. This rare free-trade turn seems to have provided a spur to the industry.

2024-10-24 Memory chips could be the next bottleneck for AI