Galileo’s Luna redefines GenAI evaluation, with 97% lower costs and 11x faster speeds

Published June 6, 2024

Galileo, a pioneer in generative AI for enterprises, has unveiled Galileo Luna, a groundbreaking series of Evaluation Foundation Models (EFMs) that promises to transform the way companies evaluate their GenAI systems. With Luna, Galileo aims to address the critical speed, cost and accuracy challenges that have hindered the widespread adoption of generative AI in production environments.

“Galileo created Luna to address the limitations of current GenAI evaluation methods, which were slow, expensive and often inaccurate,” said Vikram Chatterji, co-founder and CEO of Galileo, in an interview with VentureBeat. “The motivation came from the need for ultra-low latency, cost-effective and highly accurate evaluations in production environments.”

The development of Luna marks an important milestone for Galileo, which has been at the forefront of enterprise GenAI since its inception in early 2021. The company’s commitment to pushing the boundaries of AI evaluation is evident in the nearly year-long intensive R&D process that led to Luna’s creation.

Luna, Galileo’s suite of Evaluation Foundation Models, outperforms leading AI evaluation methodologies in a benchmark comparison of area under the receiver operating characteristic curve (AUROC) scores. Luna’s AUROC, reaching 0.78, indicates higher accuracy in evaluating enterprise generative AI systems than competitors such as GPT-3.5, Trulens Groundedness and RAGAS Faithfulness. (Image credit: Galileo)
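For readers unfamiliar with the metric, AUROC can be read as the probability that an evaluator scores a randomly chosen positive example (for instance, a hallucinated response) above a randomly chosen negative one. The sketch below computes it directly from that definition in Python. The labels and scores are hypothetical toy data, not Galileo's benchmark, and the function name `auroc` is an illustrative choice.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: the fraction of (positive, negative)
    pairs where the positive gets the higher score; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hallucination, 0 = grounded response.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]  # evaluator's hallucination scores

toy_auroc = auroc(labels, scores)
print(f"AUROC = {toy_auroc:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.78 should be read.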

Purpose-built models redefine speed, cost and accuracy

At the heart of Luna’s innovation are purpose-built small language models, carefully tailored to specific evaluation tasks such as hallucination detection, context quality assessment, data leak prevention, and malicious prompt identification. This specialized design allows Luna to deliver unparalleled performance across three key metrics: speed, cost and accuracy.

“Luna surpasses GPT-3.5 in speed, cost and accuracy thanks to several innovations,” Chatterji explains. “Luna uses purpose-built small language models tailored to specific evaluation tasks, significantly reducing computational overhead and costs. This design choice enables evaluations that are 97% cheaper and 11x faster than those performed with GPT-3.5.”

But it’s not just about speed and cost. Luna also features industry-leading accuracy, outperforming previous methods by up to 20% at detecting hallucinations, prompt injections, personally identifiable information (PII), and more. “Multi-headed small language models and advanced techniques such as intelligent chunking ensure that Luna models better preserve context and provide more accurate evaluations,” Chatterji added.

In a comparison of the monthly cost of evaluating 1 million queries, Galileo’s Luna significantly undercuts other methodologies, with a cost of just $175 per month. Luna’s purpose-built small language models enable evaluations at a very low cost, making it up to 97% cheaper than alternatives such as GPT-3.5 at $6,248 per month, RAGAS Faithfulness at $7,994 per month and Trulens Groundedness at $16,641 per month. (Image credit: Galileo)
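The percentage savings can be checked from the monthly figures quoted in the caption. The short Python sketch below does that arithmetic. The variable and function names are illustrative, and the per-query figure assumes the quoted volume of exactly 1 million evaluations per month.

```python
# Monthly cost (USD) of evaluating 1 million queries, as quoted in the caption.
monthly_cost_usd = {
    "Luna": 175,
    "GPT-3.5": 6248,
    "RAGAS Faithfulness": 7994,
    "Trulens Groundedness": 16641,
}


def percent_reduction(baseline, new):
    """How much cheaper `new` is than `baseline`, as a percentage."""
    return 100.0 * (baseline - new) / baseline


luna = monthly_cost_usd["Luna"]
reductions = {
    name: percent_reduction(cost, luna)
    for name, cost in monthly_cost_usd.items()
    if name != "Luna"
}

for name, pct in reductions.items():
    print(f"Luna vs {name}: {pct:.1f}% cheaper")

# Assumes exactly 1,000,000 evaluations per month.
luna_cost_per_query = luna / 1_000_000
print(f"Luna cost per query: ${luna_cost_per_query:.6f}")
```

On these numbers the 97% headline corresponds to the GPT-3.5 comparison (about 97.2%); the gap to the other two methods is slightly larger.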

Revolutionizing evaluation without ground truth datasets

One of the most remarkable aspects of Luna is its ability to work without traditional ground-truth datasets. By using pre-trained evaluation models tuned on diverse, domain-specific datasets, Luna eliminates the time-consuming and costly process of creating custom test sets. This streamlines the evaluation process and reduces reliance on extensive human-generated data.

Luna’s potential applications are vast, with Chatterji highlighting its relevance in industries that demand high reliability and speed in AI evaluations. “Luna is especially powerful in large-scale enterprise applications where volume and throughput are needed (i.e., millions of queries per month). We see that Fortune 100 companies in healthcare, finance and telecom find Luna particularly useful,” he said.

Galileo’s Luna delivers unparalleled speed in AI evaluation, with a latency of just 0.232 seconds to process a single query. This represents a significant improvement over other methodologies, such as GPT-3.5 at 2.5 seconds, Galileo Chainpoll at 3.0 seconds, Trulens Groundedness at 3.4 seconds and RAGAS Faithfulness at 5.4 seconds. Luna’s purpose-built small language models enable ultra-low latency evaluations, making them up to 11 times faster than competing approaches. (Image credit: Galileo)
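The speedup factors follow from dividing each method's quoted latency by Luna's. The Python sketch below performs that calculation on the caption's figures. The names are illustrative, and the numbers are taken at face value without knowledge of the hardware or query sizes behind them.

```python
# Per-query latency in seconds, as quoted in the caption.
latency_s = {
    "Luna": 0.232,
    "GPT-3.5": 2.5,
    "Galileo Chainpoll": 3.0,
    "Trulens Groundedness": 3.4,
    "RAGAS Faithfulness": 5.4,
}

luna_latency = latency_s["Luna"]

# Speedup = other method's latency / Luna's latency.
speedup = {
    name: seconds / luna_latency
    for name, seconds in latency_s.items()
    if name != "Luna"
}

for name, factor in speedup.items():
    print(f"Luna is {factor:.1f}x faster than {name}")
```

The 11x headline matches the GPT-3.5 comparison (2.5 / 0.232 is roughly 10.8); against the slower methods listed, the same division yields larger factors.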

Customization and continuous evolution in light of rapid GenAI developments

Use cases range from real-time monitoring of AI outputs and detecting hallucinations in AI-generated content to ensuring the safety and quality of chatbot interactions. And with Galileo’s Fine Tune product, Luna can be tailored to specific customer requirements, achieving accuracy levels of 95% or higher for critical tasks in industries such as pharmaceuticals and financial services.

As the generative AI landscape continues to rapidly evolve, Galileo remains committed to staying at the forefront of innovation. Chatterji emphasized that Luna will scale in three key ways: expanding support for more types of evaluation tasks, continuously improving accuracy, and further reducing costs and latency.

“Galileo aims to push the boundaries of what is possible in AI evaluation and help organizations bring reliable AI to production,” said Chatterji. “As the landscape of generative AI continues to evolve, Galileo remains committed to providing its customers with advanced evaluation capabilities that make AI practical for businesses to deploy and that build consumer trust.”

With the launch of Luna, Galileo has solidified its position as a leader in enterprise GenAI evaluation. As more organizations seek to harness the power of generative AI, Luna’s ability to deliver rapid, cost-effective and accurate evaluations will be a critical factor in driving widespread adoption and unlocking the full potential of this transformative technology.