Weights & Biases weaves new LLMOps capabilities for AI development and model monitoring

San Francisco startup Weights & Biases is expanding its platform today with a pair of new capabilities designed to make it easier for organizations to build and monitor machine learning (ML) models.

Making LLMOps easier

Weights & Biases’ platform includes tools that support the AI/ML development lifecycle. At the end of April, the company added new tools for LLMOps, that is, the workflow operations involved in developing and supporting large language models (LLMs). The new additions announced today, W&B Weave and W&B Production Monitoring, aim to help organizations more easily get AI models running effectively for production workloads.

Though Weave is only being officially announced today, early iterations have been a core part of how Weights & Biases has been building out its overall platform to provide a toolkit for AI development visualization.

“[Weave] is a very big piece of our roadmap, it’s something that I’ve personally been working on for two and a half years now,” Shawn Lewis, Weights & Biases CTO and cofounder, told VentureBeat. “It’s foundational, so there’s a lot that you can do on top of this; it’s a tool for customizing your tools to your problem domain.”

AI isn’t just about models, it’s about visualizing how to use them

Lewis explained that Weave was originally conceived as a tool for understanding models and data in the context of a visual, iterative user interface (UI) experience.

He described Weave as a toolkit containing composable UI primitives that a developer can put together to make an AI application. Weave is also about user experience; it can help data scientists develop interactive data visualizations.

“Weave is a toolkit for composing UIs together, hopefully in a way that’s extremely intuitive to our users and software engineers working with LLMs,” Lewis said. “It helps us internally bring tools to market really fast, as we can make visual experiences on new data types really easily.”
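
The article doesn't show Weave's API, but the idea of "composable UI primitives" can be sketched in plain Python. The Panel, TablePanel and Dashboard classes below are hypothetical illustrations, not Weave's real interface: small renderable pieces that nest into an app-specific view.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch only: these classes illustrate the idea of composable
# UI primitives; they are not Weave's actual API.

@dataclass
class Panel:
    title: str

    def render(self) -> str:
        return f"<div><h3>{self.title}</h3></div>"

@dataclass
class TablePanel(Panel):
    rows: List[dict] = field(default_factory=list)

    def render(self) -> str:
        body = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row.values()) + "</tr>"
            for row in self.rows
        )
        return f"<div><h3>{self.title}</h3><table>{body}</table></div>"

@dataclass
class Dashboard(Panel):
    children: List[Panel] = field(default_factory=list)

    def render(self) -> str:
        # Composition: a dashboard is just a panel whose body is other panels.
        inner = "".join(child.render() for child in self.children)
        return f"<div><h2>{self.title}</h2>{inner}</div>"

# Nest primitives into a view tailored to an LLM workflow.
view = Dashboard(
    title="Prompt runs",
    children=[TablePanel(title="Completions", rows=[{"prompt": "hi", "tokens": 12}])],
)
print(view.render())
```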

In fact, Weave is the tool that Weights & Biases used internally to develop the Prompts tools that were announced in April. It is the foundation that enables the new production monitoring tools as well.

W&B Weave uses state-of-the-art techniques and visualizations, making it easy for developers to explore data, evaluate models and experiment with ML building blocks seamlessly. Image credit: Weights & Biases

Weave is being made freely available as an open-source LLMOps tool, so anyone can use it to help build AI tools. It is also integrated into the Weights & Biases platform so that enterprise customers can build visualizations as a part of their overall AI development workflow.

Building a model is one thing, monitoring it quite another

Building and deploying an ML model is only part of the AI lifecycle; monitoring it is crucial too. That’s where Weights & Biases’ production monitoring service fits in.

Lewis explained that the production monitoring service is customizable, so organizations can track the metrics that matter to them. Common metrics for any production system cover availability, latency and performance, but LLMs introduce a host of new ones. Because many organizations will use a third-party LLM that charges based on usage, tracking how many API calls are being made is essential for managing costs.
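
As a rough sketch of what that usage tracking could look like: wandb.init and wandb.log below are the real W&B logging calls, while the call_llm helper, the metric names and the per-token price are illustrative assumptions, not details from the article.

```python
import time

import wandb

PRICE_PER_1K_TOKENS = 0.002  # hypothetical third-party API rate

def call_llm(prompt: str) -> tuple[str, int]:
    """Stand-in for a real provider SDK; returns (completion, tokens used)."""
    return "example completion", 42

wandb.init(project="llm-production-monitoring")

start = time.time()
completion, tokens = call_llm("Summarize this support ticket ...")
latency_s = time.time() - start

wandb.log({
    "latency_s": latency_s,
    "tokens": tokens,
    "estimated_cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS,
    "api_calls": 1,  # aggregated over time, this tracks total usage
})
```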

With non-LLM AI deployments, model drift is a common monitoring concern: organizations watch for unexpected deviations from a baseline over time. With an LLM, that is, with generative AI, model drift cannot be tracked as easily, Lewis said.
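
For classical models, drift detection works because it can often be reduced to a single statistic over numeric outputs. A minimal sketch of that baseline comparison, using a two-sample Kolmogorov-Smirnov test (the data and threshold here are made up for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp

# Compare live prediction scores against a training-time baseline.
rng = np.random.default_rng(0)
baseline_scores = rng.normal(0.0, 1.0, size=5000)  # distribution at deployment
live_scores = rng.normal(0.3, 1.0, size=5000)      # distribution observed now

stat, p_value = ks_2samp(baseline_scores, live_scores)
if p_value < 0.01:
    print(f"Possible drift: KS statistic {stat:.3f}, p-value {p_value:.2g}")
```

It is exactly this single-number comparison that has no obvious analogue for generative output.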

For a generative AI model used to help write better articles, for example, there would not be one single measurement or number that an organization could use to identify drift or quality, Lewis said.

That’s where the customizable nature of production monitoring comes in. In the article-writing example, an organization could choose to monitor how many AI-generated suggestions a user actually integrates and how much time it takes to get the best result.
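
Such product-level signals can be logged like any other metric. A brief hypothetical sketch, where the event counts and metric names are invented for illustration:

```python
import wandb

wandb.init(project="writing-assistant-monitoring")

# Hypothetical product events; in a real app these would come from telemetry.
suggestions_shown, suggestions_accepted = 20, 7
seconds_to_final_draft = 95.0

# Use-case-specific metrics of the kind Lewis describes.
wandb.log({
    "suggestion_acceptance_rate": suggestions_accepted / suggestions_shown,
    "seconds_to_final_draft": seconds_to_final_draft,
})
```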

Production monitoring enables real-time metrics with the most relevant visualizations and flexible, dynamic querying for an organization’s particular use case. Image credit: Weights & Biases

Monitoring can also potentially help with AI hallucination. An increasingly common approach to limiting hallucination is retrieval-augmented generation (RAG), a technique that provides the sources behind a specific piece of generated content. Lewis said an organization could build a visualization in the monitoring dashboard around those sources to gain more insight.
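
A hedged sketch of the RAG pattern described here, returning sources alongside each answer so a dashboard could surface them. The retriever and call_llm are toy stand-ins invented for illustration; they do not correspond to any specific library.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for any LLM client."""
    return "example answer"

def retrieve(query: str, index: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy lexical retrieval)."""
    words = set(query.lower().split())
    ranked = sorted(index, key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate_with_sources(query: str, index: list[str]) -> dict:
    sources = retrieve(query, index)
    prompt = ("Answer using only these sources:\n"
              + "\n".join(sources)
              + f"\n\nQuestion: {query}")
    # Returning sources with the answer is what lets a monitoring dashboard
    # surface the evidence behind each generation.
    return {"answer": call_llm(prompt), "sources": sources}

docs = ["W&B Weave is a visualization toolkit.",
        "Production monitoring tracks latency and cost."]
print(generate_with_sources("What does production monitoring track?", docs))
```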

“Maybe it won’t tell you definitively that hallucination happened, but it’ll at least give you all the information you need to look at it, and form your own kind of human understanding of whether that happened,” he said.


