Announcing our Synthetic Data Studio

Coherence: A New Direction

In the fast-paced world of technology startups, the ability to pivot isn't just a buzzword—it's often the key to survival and eventual success. Our journey at Coherence exemplifies this truth, as we've evolved from our origins in DevOps automation to addressing one of the most pressing challenges in artificial intelligence today.

In short, we're now revolutionizing how engineering teams build AI products by making comprehensive test datasets 10x faster to create. Our intelligent agent transforms the traditional approach to LLM testing, enabling teams to build more reliable AI systems with confidence.

How We Got Here

When we founded Coherence three years ago, we set out to democratize world-class developer platforms. Our vision was clear: create a DevOps automation tool that would give development teams access to enterprise-grade capabilities without the burden of building everything from scratch. The result was remarkable—we built a category-leading solution that streamlined development workflows and enhanced team productivity. Much of this innovation lives on in our open-source CNC framework, which continues to serve developers worldwide at cncframework.com.

However, the path of innovation rarely runs straight. As we gained traction and worked closely with customers, we encountered two significant market realities that would shape our future direction:

First, we discovered that our assumption about enterprise Kubernetes adoption was overly conservative. While we had carefully built our platform on managed cloud services like GCP Cloud Run and AWS ECS, gradually introducing Kubernetes capabilities, the market was already several steps ahead. Enterprise customers weren't looking for a gradual transition—they wanted deep, native Kubernetes integration from day one.

Second, our innovative pairing of Cloud IDE with preview environments, while technologically sophisticated, didn't resonate as strongly with customers as we had anticipated. Instead, they gravitated toward our cloud environment management and developer portal features, suggesting a different set of priorities than we had initially assumed.

Yet within these challenges lay an unexpected opportunity. Over three years, we had the privilege of engaging with hundreds of engineering teams, diving deep into their daily struggles and aspirations. These conversations went far beyond DevOps, touching every aspect of their engineering and product development processes. Through these discussions, a clear pattern emerged: while nearly every team aspired to build AI-powered products, they shared a common uncertainty about how to ensure effective and safe implementation of Large Language Models (LLMs). Our overall assessment after all of these conversations was that the "SDLC" for AI apps is still emerging, and that there are many opportunities to make the day-to-day iteration loop of an AI engineer easier.

The Current Landscape

Today's best practices for LLM implementation center around evaluations, or "evals"—essentially unit tests for prompt/model combinations. These evals serve as crucial quality gates, helping teams understand how their AI performs with known inputs and outputs. These datasets are often referred to as “ground truth” or “golden data.” However, the current approaches to creating these datasets face significant limitations.
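
To make "evals" concrete, here is a minimal sketch of what testing a prompt/model combination against a small golden dataset can look like. The scenario and the names (`golden_dataset`, `call_model`, `run_eval`) are illustrative assumptions, and the stand-in model call would be swapped for your real LLM request:

```python
# Minimal eval sketch: each golden case pairs a known input with the expected
# output, and the eval reports how often the prompt/model combination matches.

golden_dataset = [
    {"input": "Cancel my subscription", "expected_intent": "cancellation"},
    {"input": "When will my order arrive?", "expected_intent": "order_status"},
    {"input": "The app crashes every time I log in", "expected_intent": "bug_report"},
]

def call_model(prompt: str, user_input: str) -> str:
    # Stand-in for a real LLM call (OpenAI, Anthropic, a local model, etc.).
    # A trivial keyword matcher keeps the sketch runnable end to end.
    text = user_input.lower()
    if "cancel" in text:
        return "cancellation"
    if "order" in text or "arrive" in text:
        return "order_status"
    return "bug_report"

def run_eval(prompt: str) -> float:
    """Return the fraction of golden cases the prompt/model pair gets right."""
    passed = sum(
        call_model(prompt, case["input"]).strip().lower() == case["expected_intent"]
        for case in golden_dataset
    )
    return passed / len(golden_dataset)

if __name__ == "__main__":
    prompt = ("Classify the user's message into exactly one intent: "
              "cancellation, order_status, or bug_report.")
    print(f"Eval accuracy: {run_eval(prompt):.0%}")
```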

The observability-based method requires teams to capture and curate actual execution examples, demanding substantial infrastructure investment and ongoing maintenance. While this approach provides real-world validation, it often creates a heavy operational burden that many teams struggle to establish and sustain.

Manual creation, even when assisted by ChatGPT or automation scripts, demands considerable time and expertise. Developers and product teams lean on subject matter experts to curate and audit the data, which then has to be transformed into the right shape for eval tooling. Teams must meticulously craft examples that cover various edge cases and user scenarios, often leading to resource constraints and delayed development cycles.
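
As a rough illustration of that reshaping step, here is one way SME-curated examples (say, a CSV export) might be converted into JSONL eval cases; the column names and record schema are assumptions for this example rather than any particular eval tool's format:

```python
# Illustrative sketch: reshape SME-curated examples from a CSV export into
# JSONL records of the kind many eval tools consume. Column names and the
# output schema are assumptions for the example, not a specific tool's format.
import csv
import json

def csv_to_eval_jsonl(csv_path: str, jsonl_path: str) -> int:
    """Convert rows of (question, expected_answer, tags) into JSONL eval cases."""
    count = 0
    with open(csv_path, newline="") as src, open(jsonl_path, "w") as dst:
        for row in csv.DictReader(src):
            record = {
                "input": row["question"].strip(),
                "expected": row["expected_answer"].strip(),
                "metadata": {
                    "tags": [t.strip() for t in (row.get("tags") or "").split(";") if t.strip()]
                },
            }
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count

if __name__ == "__main__":
    n = csv_to_eval_jsonl("sme_examples.csv", "golden_dataset.jsonl")
    print(f"Wrote {n} eval cases")
```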

The bottom line is that even if you have a lot of data, finding and arranging the right needles in all those haystacks is hard! And if you don't have enough data, or can't use all the data you've got, you face a different problem in search of much the same solution.

The Cost of Inadequate Testing

Faced with these challenges, many teams resort to what we call "intuition-based development"—relying heavily on manual testing and reactive problem-solving. This approach creates a cascade of uncertainties: Are we using the optimal model for our use case? Could a smaller, more efficient model achieve similar results? Are we missing critical edge cases that could impact user experience? Without robust testing frameworks, these questions often remain unanswered, creating potential risks for both development teams and end-users.

Reimagining Golden Datasets

Drawing from our deep understanding of developer workflows and AI implementation challenges, we've developed an innovative third approach: an intelligent agent that generates comprehensive, realistic golden datasets tailored to your exact use cases. By providing the agent with relevant context—whether existing examples, domain information, or system prompts—teams can generate robust test datasets 10x faster than with the traditional approaches they're using today.
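
To make the general idea concrete (this is a hand-rolled sketch of the technique, not Coherence's agent or its API), here is how seed examples and a system prompt can be handed to an LLM to draft new synthetic eval cases. The `generate_cases` helper, the OpenAI client, and the model choice are assumptions for illustration:

```python
# Rough sketch of LLM-assisted dataset generation (not Coherence's implementation):
# give a model the system prompt under test plus a few seed examples, and ask it
# to propose new, varied cases in a machine-readable shape.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def generate_cases(system_prompt: str, seed_examples: list[dict], n: int = 10) -> list[dict]:
    """Ask a model to draft `n` new eval cases shaped like the seed examples."""
    instructions = (
        "You are helping build a golden dataset for testing an LLM application.\n"
        f"The application's system prompt is:\n{system_prompt}\n\n"
        f"Here are existing examples:\n{json.dumps(seed_examples, indent=2)}\n\n"
        f"Propose {n} new, diverse test cases (including edge cases) as a JSON "
        "array of objects with the same keys as the examples. Return only JSON."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any capable model works
        messages=[{"role": "user", "content": instructions}],
    )
    text = response.choices[0].message.content.strip()
    if text.startswith("`"):
        # Some models wrap JSON in a code fence; real code would validate and retry.
        text = text.strip("`").removeprefix("json").strip()
    return json.loads(text)

if __name__ == "__main__":
    seeds = [
        {"input": "Cancel my subscription", "expected_intent": "cancellation"},
        {"input": "Where is my package?", "expected_intent": "order_status"},
    ]
    cases = generate_cases("Classify support messages into intents.", seeds, n=5)
    print(json.dumps(cases, indent=2))
```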

Our solution addresses fundamental aspects of LLM development:

  • Comprehensive prompt effectiveness evaluation across various scenarios
  • Data-driven model comparison and selection optimization
  • Systematic edge case identification and handling
  • Consistent performance assessment across diverse use cases
  • R&D use cases where different datasets and data shapes are being explored in rapid succession

Looking Ahead

Today's launch marks the beginning of a new chapter in the "SDLC" for AI. As the first synthetic data studio to offer this combination of control, power, and precision, we're setting out to help you work faster while raising your quality standards.

Our goal isn't just to solve today's testing challenges—we're building the foundation for more reliable, trustworthy AI systems. By making realistic test datasets faster and easier to create, we're enabling teams to build AI-powered products with greater confidence and security.

We invite you to join us on this journey. Your feedback and experiences will be invaluable as we continue to enhance our platform and shape the future of AI development practices.

---

Want to learn more about how Coherence can transform your AI development process? Reach out to our team to get started.