Chip industry strains to meet AI-fueled demands — will smaller LLMs help?

Large language models are growing in size, as is the number of companies using generative AI technology. The more the models grow, the more GPUs they consume, exacerbating an already-strained chip supply chain.

By Lucas Mearian, Senior Reporter, Computerworld


Generative artificial intelligence (AI) in the form of natural-language processing technology has taken the world by storm, with organizations large and small rushing to pilot it in a bid to automate tasks and increase production.

Tech giants Google, Microsoft, and Amazon are all offering cloud-based genAI technologies or baking them into their business apps for users, with global spending on AI by companies expected to reach $301 billion by 2026, according to IDC.

But genAI tools consume a lot of computational resources, primarily for training up the large language models (LLMs) that underpin the likes of OpenAI’s ChatGPT and Google’s Bard. As the use of genAI increases, so too does the strain on the hardware used to run those models, which are the information storehouses for natural language processing.

Graphics processing units (GPUs), which are created by connecting together different chips — such as processor and memory chips — into a single package, have become the foundation of AI platforms because they offer the bandwidth needed to train and deploy LLMs. But AI chip manufacturers can't keep up with demand. As a result, black markets for AI GPUs have emerged in recent months.

Some blame the shortage on companies such as Nvidia, which has cornered the market on GPU production and has a stranglehold on supplies. Before the rise of AI, Nvidia designed and produced high-end processors that helped create sophisticated graphics in video games — the kind of specialized processing that is now highly applicable to machine learning and AI.

AI's thirst for GPUs

In 2018, OpenAI released an analysis showing that since 2012, the amount of computing power used in the largest AI training runs had been increasing exponentially, doubling every 3.4 months. (By comparison, Moore’s Law posited that the number of transistors in an integrated circuit doubles every two years.)

“Since 2012, this metric has grown by more than 300,000x (a 2-year doubling period would yield only a 7x increase),” OpenAI said in its report. “Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.”
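The arithmetic behind these doubling figures can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes clean exponential growth at a fixed doubling period, whereas OpenAI's 3.4-month figure is a fitted trend, so the results only roughly match the quoted numbers.

```python
import math


def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Growth multiple after elapsed_months, given a fixed doubling period."""
    return 2.0 ** (elapsed_months / doubling_months)


def months_to_reach(multiple: float, doubling_months: float) -> float:
    """Months needed to grow by `multiple` at a fixed doubling period."""
    return math.log2(multiple) * doubling_months


# How long would a 300,000x increase take at a 3.4-month doubling period?
ai_months = months_to_reach(300_000, 3.4)

# Over that same span, how much growth would a two-year (24-month) doubling give?
moore_multiple = growth_factor(ai_months, 24.0)

print(f"300,000x at 3.4-month doubling: about {ai_months / 12:.1f} years")
print(f"Two-year doubling over the same span: about {moore_multiple:.0f}x")
```

Under these assumptions, the two-year doubling period produces only a single-digit multiple over the same span, the same order of magnitude as the 7x cited in OpenAI's report.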

There’s no reason to believe OpenAI’s thesis has changed; in fact, with the introduction of ChatGPT last November, demand soared, according to Jay Shah, a researcher with the Institute of Electrical and Electronics Engineers (IEEE). “We are currently seeing a huge surge in hardware demands — mainly GPUs — from big tech companies to train and test different AI models to improve user experience and add new features to their existing products,” he said.

At times, LLM creators such as OpenAI and Amazon appear to be competing to build the largest model. Some models now exceed 1 trillion parameters, meaning they require even more processing power to train and run.

“I don't think making models even bigger would move the field forward,” Shah said. “Even at this stage, training these models remains extremely computationally expensive, costing money and creating bigger carbon footprints on climate. Additionally, the research community thrives when others can access, train, test, and validate these models.”

Most universities and research institutions can’t afford to replicate and improve on already-massive LLMs, so they’re focused on finding efficient techniques that use less hardware and time to train and deploy AI models, according to Shah. Techniques such as self-supervised learning, transfer learning, zero-shot learning, and foundation models have shown promising results, he said.

“I would expect one-to-two years more for the AI research community to find a viable solution,” he said.

Start-ups to the rescue?

AI-chip start-ups such as Graphcore, Kneron and iDEAL Semiconductor see themselves as alternatives to industry stalwarts like Nvidia. Graphcore, for example, is proposing a new type of processor called an intelligence processing unit (IPU), which the company said was designed from the ground up to handle AI computing needs. Kneron's chips are designed for edge AI applications, such as electric vehicles (EVs) or smart buildings.

In May, iDEAL Semiconductor launched a new silicon-based architecture called "SuperQ," which it claims can produce higher efficiency and higher voltage performance in semiconductor devices such as diodes, metal-oxide-semiconductor field-effect transistors (MOSFETs), and integrated circuits.

While the semiconductor supply chain is very complex, the fabrication part has the longest lead time for bringing new capacity online, according to Mike Burns, co-founder and president at iDEAL Semiconductor.

"While running a fab at high utilization can be very profitable, running it at low utilization can be a financial disaster due to the extreme [capital expenses] associated with production equipment," Burns said. "For these reasons, fabs are careful about capacity expansion. Various shocks to the supply chain including COVID, geopolitics, and shifts in the types of chips needed in the case of EVs and AI, have produced several constraints that may take one to three years to correct. Constraints can occur at any level, including raw materials caught in geopolitics or manufacturing capacity awaiting build-out."

While video games remain a big business for Nvidia, its emerging AI business has allowed the company to control more than 80% of the AI chip market. Despite formidable jumps in Nvidia’s revenues, however, analysts see potential issues with its supply chain. The company designs its own chips but — like much of the semiconductor industry — it relies on TSMC to produce them, making Nvidia susceptible to supply chain disruptions.

In addition, open-source efforts have enabled the development of myriad AI language models, so small companies and AI startups are also jumping in to develop product-specific LLMs. And with privacy concerns about AI inadvertently sharing sensitive information, many companies are also investing in products that can run small AI models locally (known as edge AI).

It's called “edge” because AI computation happens closer to the user at the edge of the network where the data is located — such as on a lone server or even in a smart car — as opposed to a centrally located LLM in a cloud or private data center.

Edge AI has helped radiologists identify pathologies, controlled office buildings through Internet of Things (IoT) devices and been used to control self-driving cars. The edge AI market was valued at $12 billion in 2021 and is expected to reach $107.47 billion by 2029.
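The annual growth rate implied by those two market figures can be worked out with a short calculation. This is a rough sketch that assumes steady compounding across the eight years from 2021 to 2029; the function name is hypothetical, and the forecast itself may not assume a constant rate.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the article, in billions of US dollars.
rate = implied_annual_growth(12.0, 107.47, 2029 - 2021)
print(f"Implied compound annual growth: {rate:.1%}")
```

Under that assumption, the forecast works out to growth of a little over 30% a year.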

“We will see more products capable of running AI locally, increasing demand for hardware further,” Shah said.

Are smaller LLMs the answer?

Avivah Litan, a distinguished vice president analyst at research firm Gartner, said sooner or later the scaling of GPU chips will fail to keep up with growth in AI model sizes. “So, continuing to make models bigger and bigger is not a viable option,” she said.

iDEAL Semiconductor's Burns agreed, saying, "There will be a need to develop more efficient LLMs and AI solutions, but additional GPU production is an unavoidable part of this equation."

"We must also focus on energy needs," he said. "There is a need to keep up in terms of both hardware and data center energy demand. Training an LLM can represent a significant carbon footprint. So we need to see improvements in GPU production, but also in the memory and power semiconductors that must be used to design the AI server that utilizes the GPU."

Earlier this month, the world’s largest chipmaker, TSMC, admitted it's facing manufacturing constraints and limited availability of GPUs for AI and high-performance computing (HPC) applications. “We currently can't fulfill all of our customer demands, but we're working towards addressing roughly 80% of them,” TSMC Chairman Mark Liu said at Semicon Taiwan. “This is viewed as a transient phase. We anticipate alleviation after the growth of our advanced chip packaging capacity, roughly in one and a half years.”

In 2021, the decline in domestic chip production underscored a worldwide supply chain crisis that led to calls for reshoring manufacturing to the US. With the US government spurring them on through the CHIPS Act, the likes of Intel, Samsung, Micron, and TSMC unveiled plans for several new US plants. (Qualcomm, in partnership with GlobalFoundries, also plans to invest $4.2 billion to double chip production at GlobalFoundries' Malta, NY facility.)

TSMC plans to spend as much as $36 billion this year to ramp up chip production, even as other companies, both integrated device manufacturers (IDMs) and foundries, are operating close to or at full utilization, according to global management consulting firm McKinsey & Co.

“The chip industry cannot keep up. GPU innovation is moving slower than the widening and growth of model sizes,” Litan said. “Hardware is always slower to change than software.”

TSMC’s Liu, however, said AI chip supply constraints are "temporary" and could be alleviated by the end of 2024, according to a report in Nikkei Asia.

Both the US CHIPS and Science Act and the European Chips Act were meant to address supply-and-demand challenges by bringing back and increasing chip production on their own shores. Even so, more than a year after the passage of the CHIPS Act, TSMC has pushed back the opening date for its Phoenix, AZ foundry, a plant touted by US President Joseph R. Biden Jr. as the centerpiece of his $52.7 billion chips repatriation agenda. TSMC had planned on a 2024 opening; it’s now going online in 2025 because of a lack of skilled labor. A second TSMC plant is still scheduled to open in 2026.

The world’s largest supplier of silicon carbide, Wolfspeed, recently admitted it will likely be the latter half of the decade before CHIPS Act-related investments will affect the supply chain.

iDEAL Semiconductor's Burns said the US and European chips acts should help address the supply chain issue by reshoring some parts of the semiconductor industry to increase resiliency in the manufacturing system.

"The US CHIPS and Science Act has already impacted the sector by raising semiconductor supply chain risk to a national dialog. The attention now focused on supply chain risks has propelled investments by the private sector," Burns said. "US manufacturers have announced plans to expand their capacities, and investments in places like Texas, Ohio, New York and Arizona are fast under way. It will take time to fully evaluate the extent to which the CHIPS and Science Act can resolve existing supply chain issues, but it is a good first step in expanding domestic manufacturing capacity."

Despite the AI chip shortage, however, AI chip stocks have soared, including that of Nvidia, whose market capitalization passed the trillion-dollar mark as its stock price more than tripled over the past 52 weeks.

The IEEE’s Shah also noted that the US government has not been able to provide the funds it promised to foundries, which by default means many US-based tech companies must plan on relying on existing manufacturers.

“I personally believe it would still take four to five years to have hardware manufactured on US soil that is also cheaper than Asian counterparts,” Shah said.

Senior Reporter Lucas Mearian covers AI in the enterprise, Future of Work issues, healthcare IT and FinTech.
