The ROI in AI (and how to find it)

By 2030, companies will spend $42 billion a year on generative artificial intelligence (genAI) projects such as chatbots, research, writing, and summarization tools. And while the technology has been heralded as a boon to productivity, nailing down a return on investment (ROI) in genAI could prove to be elusive.

“Capturing and measuring the exact productivity improvements has been a challenge for many of our clients,” said Rita Sallam, a distinguished vice president analyst at Gartner. “For [genAI], we are not saying that finding ROI may be difficult, but expressing ROI has been difficult because many benefits like productivity…have indirect or non-financial impacts that create financial outcomes in the future."

For example, using genAI to automate code generation could make a software developer more productive, giving them additional time to improve productivity and increase innovation. Down the line, that could mean faster time to market for new features — and happier customers.

But how do you set a value on those intangibles?

“Ultimately, you may be able to use less skilled developers, so cost may go down and you can handle more work with the same number of developers,” Sallam said. “These benefits could ultimately lead to earlier revenue generation and possibly less customer and developer attrition and higher customer spend.

“But the line of sight to cost reduction (without headcount reduction) is not direct.”

Last year was seen as the year of enterprise AI adoption, with 55% of organizations experimenting with genAI in workflows, according to an August report from consulting firm McKinsey & Co. At that time, however, fewer than a third of enterprises surveyed said they were using AI for more than one function, “suggesting that AI use remains limited in scope.”

This year, however, AI deployments are expected to ramp up quickly as projects reach “a critical mass of experience and proficiency.”

First, some bad news

That said, by 2025, 90% of enterprise deployments of genAI will slow as costs exceed value, according to Gartner — and 30% of those projects will be abandoned after proof of concept (POC) due to poor data quality, inadequate risk controls, escalating costs, or unclear business value.

By 2028, more than 50% of enterprises that have built large language models (LLMs) from scratch will abandon their efforts due to costs, complexity and technical debt in their deployments, according to Gartner research.

“Measuring ROI is hard,” said Bret Greenstein, Data & AI leader at professional services firm PricewaterhouseCoopers (PwC). But by adapting an LLM to perform a function or process, it’s easier to compare its performance (cost, accuracy and speed) against earlier processes.

In the simplest of terms, ROI is a financial ratio of an investment’s gain or loss relative to its cost; so when a company invests in genAI, the benefits of that spending should outweigh costs.
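As a rough illustration of that ratio, the sketch below computes ROI as net gain divided by cost. The function name and the dollar figures are hypothetical and chosen only to show the arithmetic; they do not come from any of the surveys cited here.

```python
def roi(gain: float, cost: float) -> float:
    """Return ROI as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical figures, for illustration only.
example_roi = roi(gain=150_000, cost=100_000)
print(f"ROI: {example_roi:.0%}")  # ROI: 50%
```

A result above zero means the benefits outweighed the spending; a negative result means they did not.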

“Once you get [genAI] to consistently achieve this new level of performance, you deploy it in production with the proper governance and operational processes and track its usage,” Greenstein said. “When you have a use case that saves two hours in a six-hour process, and track its usage, you can aggregate the savings."
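The aggregation Greenstein describes can be sketched in a few lines. The two-hour saving on a six-hour process comes from his example; the use case name, the number of tracked runs, and the hourly labor cost are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    hours_before: float  # process time before genAI
    hours_after: float   # process time with genAI
    runs: int            # tracked usage in the period (assumed)
    hourly_cost: float   # loaded labor cost per hour (assumed)

    def hours_saved(self) -> float:
        # Time saved per run, multiplied by how often the process ran.
        return (self.hours_before - self.hours_after) * self.runs

    def value_saved(self) -> float:
        return self.hours_saved() * self.hourly_cost


# Six-hour process cut to four hours; usage and cost are hypothetical.
claims = UseCase("claims review", 6.0, 4.0, runs=500, hourly_cost=60.0)
print(claims.hours_saved())  # 1000.0
print(claims.value_saved())  # 60000.0
```

Summing `value_saved()` across tracked use cases gives the gain side of the ROI ratio for a given period.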

Now, the good news

A large majority of business execs who are implementing or planning to roll out genAI tools expect or have realized benefits from their decisions, according to the Gartner Generative AI 2024 Planning survey of 822 business leaders. On average, survey respondents report:

15.8% revenue increase
15.2% cost savings, 4.6% through reduction in headcount
22.6% productivity improvement

ChatGPT has also been shown to improve worker productivity by 37%.

GenAI coding assistants can result in 7% to 55% worker productivity improvements
GenAI conversational assistants (chatbots) can improve customer service and support agents’ productivity. (Studies show a range of improvement between 14% and 35%.)

Seventy-three percent of US companies have already adopted AI in some areas of their business, according to PwC’s 2023 Emerging Technology Survey — with genAI leading the way. By 2027, spending is expected to reach $151.1 billion, representing a compound annual growth rate of 85.9% over the 2023-27 period, according to research firm IDC.
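For readers unfamiliar with the term, a compound annual growth rate is the constant yearly rate that would carry a starting value to an ending value over a number of periods. The sketch below applies the standard formula to IDC's 2027 figure. Treating 2023-27 as four compounding periods is an assumption, so the implied starting value is an inference from the formula, not a number reported by IDC.

```python
def cagr(start: float, end: float, periods: int) -> float:
    """Constant annual rate that grows `start` into `end` over `periods` years."""
    return (end / start) ** (1 / periods) - 1


def implied_start(end: float, rate: float, periods: int) -> float:
    """Starting value implied by an ending value and a CAGR."""
    return end / (1 + rate) ** periods


# $151.1 billion in 2027 at 85.9% CAGR; four periods is an assumption.
base = implied_start(end=151.1, rate=0.859, periods=4)
print(f"Implied 2023 spending: ${base:.1f} billion")
print(f"Round trip CAGR: {cagr(base, 151.1, 4):.1%}")
```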

There's little debate about genAI’s impact on productivity, as it could add the annual equivalent of $2.6 trillion to $4.4 trillion globally, according to management consulting firm McKinsey & Co, which analyzed 63 use cases involving genAI.

About 75% of the value falls in four areas:

Customer operations;
Marketing and sales;
Software engineering;
R&D.

Across 16 business functions, McKinsey examined use cases in which genAI tools can address specific business challenges in ways that produce one or more measurable outcomes. Examples include its ability to support interactions with customers (chatbots), generate creative content for marketing and sales, and draft computer code based on natural-language prompts.

Banking, high tech, and life sciences are among the industries that could see the biggest impact as a percentage of their revenues from genAI, according to McKinsey.

Across the banking industry, for example, the technology could deliver added value worth between $200 billion and $340 billion annually, if the use cases were fully implemented. In retail and consumer packaged goods, the potential value could be between $400 billion and $660 billion a year.


GenAI can also be more broadly useful for digital transformation efforts, according to McKinsey. Its ability to make sense of unstructured data, when combined with cloud, for example, can accelerate nearly any data-related transformation initiative. It can also help companies leap-frog several stages.

For instance, it can often handle complex tasks that were previously out of reach in finance, tax, legal and IT compliance, and other departments. It can, for example, help a company more efficiently meet new Pillar II tax reporting requirements. More generally, it might soon eliminate the need to upgrade common enterprise applications. Instead, companies could move those apps to the cloud, where customized genAI modules could help them continually evolve to meet changing business needs.

The vast majority of improvements will accrue to leading indicators of future financial value or indirect value, such as productivity, cycle time, customer experience, brand, quality, and faster upskilling of less-experienced workers, according to Gartner’s Sallam.

“Unless these benefits translate into immediate headcount reduction and other cost reduction, financial benefits accrue over time, depending on how the generated value is used,” she said.

In other words, genAI should enable organizations to do more with less, even as demand increases; use fewer senior workers; reduce the need for service providers; and improve customer and employee value, which could lead to higher retention.

“So, measure and value time saved for both those specific tasks and across aggregate tasks related to specific processes, within specific time periods,” Sallam said. “Productivity improvements alone may be a diminishing source of differentiation over time, but integrating these capabilities into other business processes can help enterprises maintain a competitive edge.”

Productivity gains are the biggest initial benefits reported by early adopters, according to Gartner. But as those immediate gains diminish over time, companies will need to be patient as more efficient business processes save money over the long haul.

Quick wins and low-hanging fruit

According to Gartner, calculating the value of new investments in genAI requires an organization to first build a business case by simulating potential costs and value across a range of activities. That means aiming for a mix of:

Quick wins
Differentiating use cases
Transformational initiatives

GenAI quick wins focus on potential productivity improvements, which today typically come from assistants such as Microsoft 365 Copilot and Google Workspace. Those kinds of activities are easy to get started, try out, and buy — but they are usually task-specific. The time to recognize value is typically less than a year.


Differentiating use cases that leverage genAI in enterprise, domain, and industry applications or custom applications can give organizations a competitive advantage by improving specific business processes. These use cases can also leverage enterprise data in unique ways for competitive advantage, but they come with higher and more unpredictable costs and risk at scale, according to Gartner.

Transformational use cases are new products and services that could create completely new market categories and disrupt current ones. They also serve to retain customers by adding these capabilities to existing products (essentially creating new domain- and industry-specific genAI applications).

For example, an insurance company could fine-tune a large language model with its own policy documents to improve its performance on its specific use cases. Or, a financial services organization might create an LLM trained with financial data, which could then be used for many financial services use cases.

Broadly speaking, companies need to identify metrics that capture both financial benefits and strategic outcomes — such as a better user experience, broader access to capabilities previously requiring higher skills, and employee and customer satisfaction. Then companies can realistically assess their impact.

Param Vir Singh, a professor of Business Technologies at Carnegie Mellon University’s Tepper School of Business, said organizations shouldn’t be solely focused on financial returns as the measure for ROI — at least, for now.

“In a few years..., organizations will have a better idea of financial returns," Singh said. "Today, customer satisfaction is a very relevant area. In the past, that would tie back to profitability. But for now, you need to find out how much genAI is improving customer satisfaction and from there you can calculate profitability.”

Another mistake some organizations make is viewing the ROI from genAI across all corporate initiatives instead of looking at each individual project; the latter can highlight which initiatives work and which don't.

“When AI is deployed in a specific place — say, employees’ access to Copilot to do a certain activity — then it’s easier to measure productivity gains,” Singh said.
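A small worked example shows why the project-level view matters. The two projects and their gain and cost figures below are invented for illustration; the point is only that a healthy aggregate ROI can hide an initiative that is losing money.

```python
def roi(gain: float, cost: float) -> float:
    """Return ROI as a fraction: (gain - cost) / cost."""
    return (gain - cost) / cost


# Hypothetical projects and figures, for illustration only.
projects = {
    "coding assistant": {"gain": 300_000, "cost": 120_000},
    "support chatbot": {"gain": 90_000, "cost": 150_000},
}

# ROI computed separately for each project.
per_project = {name: roi(p["gain"], p["cost"]) for name, p in projects.items()}

# ROI computed once across everything.
total_gain = sum(p["gain"] for p in projects.values())
total_cost = sum(p["cost"] for p in projects.values())
overall = roi(total_gain, total_cost)

for name, value in per_project.items():
    print(f"{name}: {value:.0%}")
print(f"overall: {overall:.0%}")
```

Here the combined figure is positive even though one of the two projects returns less than it costs.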

One imperative in ensuring both ROI and transformative value from AI is to train employees on it. Workers need skills, guardrails, and incentives to use AI responsibly and effectively. If they don’t understand the value of genAI tools, they’re less likely to use them.

Lessons from GitHub

GitHub’s Chief Operating Officer, Kyle Daigle, said GitHub launched GitHub Copilot about two and a half years ago -- long before Copilot went public -- and that it’s making developers 55% more productive.

Originally, GitHub thought it would be using Copilot for code documentation. But over time, the company discovered it could actually automate the production of a good percentage of code, alleviating mundane tasks.

“We’ve got over a million users using Copilot every day,” Daigle said. “Our stats show it’s making them about 55% more productive. It’s writing about 60% of code. We expect that to get up to 80% over time. I think most importantly…, it’s making developers feel more fulfilled and allowing them to do the creative work and not the toil work.”

In its early Copilot experimentation, GitHub was working mostly to help write in Python, JavaScript and Ruby. Over time, it discovered that genAI tech could also assist writing code in publicly available languages as well as proprietary ones, Daigle said.

"So, we’ve gone from a couple of test languages to essentially every modern programming language,” he said.

In some cases, Copilot might just finish a line of code; in others, it can finish an entire method or file, he said. And with Copilot’s chat feature, developers can talk to the AI and describe a problem, and it can generate an entire file that can then be tweaked until it suits the developer’s needs.

The vast majority of the code generated by Copilot is kept by developers, according to Daigle. After the code is developed, the process of continuous integration and submitting a pull request also becomes easier because code tends to be more correct out of the gate.

“So, there are a lot of downstream impacts as well when you’re able to use Copilot as part of your workflow,” he said.

ROI, Daigle said, is baked into genAI code development because it reduces time to market, frees up developer time, and allows developers to focus more on creative tasks than menial chores.

Two kinds of genAI use cases

According to PwC, the best AI projects tend to fall in two categories: Simple use cases that require little effort to create and are used broadly, and high-volume efforts that can be costly, but scale up.

Simple use cases tend to involve summarizing information or using genAI tools to curate data and make it available to a lot of users. “These have emerged with the availability [of] custom [LLMs] that end users can do with small sets of data they have access to,” Greenstein said. “Policies, departmental datasets, personal knowledge are all commonly made available through custom [LLMs] in this way.”

High-volume use cases often handle tasks such as inbound applications, invoices, contracts, purchase orders and customer service requests. “Having generative AI handle these first, triage them, recommend answers and send the prioritized, summarized issues to people can drive significant savings in time and cost,” Greenstein said.

For example, genAI can help customer sales and service reps access data to help customers faster and with greater personalization; that saves labor and time and can improve customer experience.

And in software and product development, genAI can play a major role in saving time and costs at every stage: ideation, requirements, user stories, test cases, code generation, testing, and documentation.

But genAI tools cannot be set on autopilot under the assumption ROI will follow. Chon Tang, founding partner at the Berkeley SkyDeck Fund, an academic accelerator at the University of California-Berkeley, described genAI tools as more akin to humans — they have to be managed.

“Prompts have to be scrutinized, workflow verified, and final output double-checked. So, don't expect a system that automatically completes tasks,” he said. “Instead, generative AI in general, and LLMs in particular, should be seen as very low-cost members of your team.”

"Generative AI is enabling a level of machine intelligence we have never seen before: it will absolutely replace humans in many tasks, and the ROI will be obvious,” Tang said. “The only question is how many humans, and what kind of tasks.

"Today, generative AI remains unstable, unlike other pieces of technology that behave more like tools with very well-defined behavior. ...We wouldn't want to use a dishwasher that failed to wash our dishes 5% of the time.”