How AI-based search assistant Glean Chat is built to boost productivity

Glean Chat, a generative AI application that builds on the Palo Alto-based startup’s knowledge graph and enterprise search, can connect to enterprise applications to provide a conversational search experience, along with citations of its sources.

Senior Writer, Computerworld


Palo Alto-based startup Glean, founded in 2019 by former Google, Microsoft and Meta employees, has released a new generative AI-based assistant, dubbed Glean Chat, designed to boost productivity and efficiency across enterprises via a conversational search interface.

Describing Glean Chat, an add-on to the company's namesake enterprise search product, as the “Power BI of unstructured data,” CEO and founder Arvind Jain said that the generative AI assistant is aimed at helping employees find information across an enterprise’s applications and content repositories quickly and efficiently, with source citations.

Glean Chat offers an experience very similar to OpenAI’s ChatGPT, but limited to an enterprise’s content and resource boundaries, Jain said. When a user makes a natural language-based query, the company’s search technology uses APIs to check all the content and activity — including information in applications — pertaining to the query before storing it in a customer’s cloud environment. The data stored is then fed to large language models (LLMs), which have been trained on that particular enterprise’s data, to generate the search or query result.

The query result contains links to source information from documents, conversations and applications.
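The flow Jain describes (retrieve enterprise content for a query, hand it to a language model, and return an answer with links to sources) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Document` type, the word-overlap ranking, and the function names are hypothetical stand-ins, not Glean's API, and the step that would send the prompt to an LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    url: str
    text: str


def retrieve(query, docs, top_k=2):
    """Rank documents by shared query words (a toy stand-in for enterprise search)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in docs]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in matches[:top_k]]


def build_prompt(query, hits):
    """Number each retrieved source so a model's answer can cite it."""
    sources = "\n".join(f"[{i}] {d.title}: {d.text}" for i, d in enumerate(hits, 1))
    return (
        "Answer using only the numbered sources and cite them.\n"
        f"{sources}\nQuestion: {query}"
    )


def citations(hits):
    """Links to show alongside the generated answer."""
    return [f"[{i}] {d.title} ({d.url})" for i, d in enumerate(hits, 1)]


# Hypothetical corpus; the URLs are placeholders.
corpus = [
    Document("Holiday calendar", "https://example.com/holidays",
             "official company holidays for 2023"),
    Document("Expense policy", "https://example.com/expenses",
             "expense reports are due monthly"),
    Document("Onboarding", "https://example.com/onboarding",
             "new hires receive holidays information in week one"),
]

hits = retrieve("company holidays 2023", corpus)
prompt = build_prompt("company holidays 2023", hits)  # would be sent to an LLM
links = citations(hits)
```

A production system would replace the word-overlap ranking with a real search index, but the shape is the same: the model only sees retrieved content, and the numbered sources are what make citations possible.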

How Glean Chat is structured

Glean is built on five layers consisting of infrastructure, connectors, a governance engine, the company’s knowledge graph, and an adaptive AI layer, according to the company.

To connect to an enterprise’s applications and content repositories, Glean Chat uses self-developed connectors for applications and data sources such as Salesforce, Zendesk, Jira, GitHub, Slack, Figma, Workday, Okta, Outlook, OneDrive, Google Drive, Box, Dropbox, and SharePoint, as well as storage offerings from AWS, Google Cloud, and Microsoft, among others.

The governance layer ensures that the generative AI follows an enterprise’s set boundaries and security policies such as identity and access management, the company said.
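One common way to enforce this kind of boundary is to filter retrieved content against the user's access rights before any of it reaches a language model. The Python sketch below illustrates that idea under stated assumptions: the `Doc` type, the group names, and `visible_to` are hypothetical, and the article does not describe how Glean implements its governance engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    allowed_groups: frozenset  # groups permitted to read this document


def visible_to(user_groups, docs):
    """Keep only documents that share at least one group with the user."""
    groups = set(user_groups)
    return [d for d in docs if groups & d.allowed_groups]


# Hypothetical documents and groups.
docs = [
    Doc("Salary bands", frozenset({"hr"})),
    Doc("Engineering handbook", frozenset({"eng", "hr"})),
]

engineer_view = visible_to({"eng"}, docs)
hr_view = visible_to({"hr"}, docs)
```

Filtering before generation, rather than after, means a restricted document never enters the model's context, so it cannot leak into an answer.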

The knowledge graph layer, which the company has developed over the last few years, understands relationships between content and employees and internal language in an enterprise, Jain said, adding that “this enables Glean to recognize nuances like how people collaborate, how each piece of information relates to another, and what information is most relevant to each user.”

The knowledge graph layer, along with the large language models, is trained on an enterprise’s data once that enterprise becomes a Glean subscriber, according to Jain.

The adaptive AI layer uses the information from the knowledge graph and runs it through LLM embeddings for semantic understanding and large language models for generative AI, the company said. LLM embeddings are vectors or arrays that are used to give context to artificial intelligence models, a process known as grounding. This process allows enterprises to avoid having to fully train or fine-tune AI models using the enterprise information corpus, said Bradley Shimmin, chief analyst at Omdia.
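To make the role of embeddings concrete, the Python sketch below compares a query vector with document vectors using cosine similarity, a standard measure for this kind of semantic matching. The three-dimensional vectors are hand-made for illustration; real embeddings come from a model and have hundreds or thousands of dimensions, and the names here are hypothetical rather than anything Glean has described.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, index):
    """Return the title whose vector is most similar to the query vector."""
    return max(index, key=lambda title: cosine(query_vec, index[title]))


# Toy vectors (assumed): each axis loosely stands for a topic.
index = {
    "Company holiday calendar": [0.9, 0.1, 0.0],
    "Expense policy": [0.1, 0.9, 0.1],
    "VPN setup guide": [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "2023 days off".
query_vec = [0.8, 0.2, 0.1]
best = nearest(query_vec, index)
```

Because matching happens on vectors rather than exact words, a query such as "2023 days off" can land on a holiday calendar even though the two share no keywords. The matched text is then supplied to the model as context, which is the grounding step.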

Currently, Glean is using a mix of large language models including OpenAI’s GPT-4 and transformer models from Google, such as BERT.

Can Glean Chat carve its space in a crowded market?

Glean Chat, however, faces an uphill task when it comes to carving a space in a crowded generative AI market, as there are many competitors with similar offerings, according to Constellation Research principal analyst Andy Thurai.

“They will face a visibility and survivability problem in the near future,” Thurai said.

But Glean’s approach of training large language models on specific enterprise data can bring value to any enterprise looking for a searchable knowledge repository that spans structured and unstructured data.

“Glean adds more value by being able to do that from within native applications as well as the ability to search into application data - such as Gong, etc,” Thurai said, adding that other potential differentiators for Glean Chat include the application connectors and its “strict” permissions control and governance tools integration, which grant access to data based on the user or employee profile.

Can enterprise users trust Glean Chat’s results?

When asked whether enterprise users could trust results from Glean Chat, IDC research manager Hayley Sutherland said that companies should provide methods for understanding and explaining the results or recommendations generated by assistants like Glean Chat.

“This will not only help to ensure trust in the product but will also help supervisors and others to leverage these analytics to understand the potential sources of and remediations for issues. This is especially important for solutions that leverage LLMs like GPT-4 that have experienced known hallucination and other accuracy issues,” Sutherland said.

However, the process of grounding should be able to provide accuracy for results generated by Glean Chat, Omdia’s Shimmin said.

“With semantic search using vector databases, we have a comparatively high assurance of semantic accuracy, meaning if we search for ‘2023 days off,’ the search results will understand that we are looking for a calendar of official company holidays for 2023,” Shimmin said.

The company said it has plans to introduce more granular citations for search results soon.

Pricing and availability for Glean

Glean Chat, according to Jain, will be priced on a per-seat basis and as a premium add-on to Glean’s core search product. Currently, the new generative AI-based assistant is in early access for all Glean customers and will soon be made generally available, Jain added.

Glean, according to Amalgam Insights’ principal analyst Hyoun Park, competes with the likes of Neeva, which was acquired by Snowflake.

The company, which has raised about $155 million to date from investors such as Sequoia, Lightspeed, Slack Fund, General Catalyst, and Kleiner Perkins, claims that it already has over 100 enterprise customers including the likes of Databricks, Vanta, Plaid, Grammarly, Okta, Samsara, Niantic, Greenhouse, Duolingo, Wealthsimple, and Confluent.

Company founders include Jain, who was a Google Distinguished Engineer and co-founder of Rubrik, as well as T.R. Vishwanath (formerly of Microsoft and Meta); Piyush Prahladka (Google, Uber); and Tony Gentilcore (Google).

Anirban Ghoshal is a senior writer, covering enterprise software for CIO and databases and cloud infrastructure for InfoWorld.
