
Announcing Our $25M Series A

We founded David AI less than a year ago to build the data layer for audio AI. Today, we’re a trusted partner to many of the world’s leading AI labs, and our data supports cutting-edge production voice systems and frontier research.

We’re excited to announce our $25 million Series A, led by Alt Capital and Amplify Partners, with participation from First Round Capital, Y Combinator, BoxGroup, and other great investors. We’re thrilled to welcome a group of angel investors who collectively bring decades of experience in frontier audio research, and to welcome Jack Altman to our board.

Our Mission

At David AI, our mission is to bring AI into the real world—and we believe voice is how that will happen.

Today, we’re seeing AI phone agents take hold (e.g., in customer support), but the scope of voice AI’s impact will be much larger.

Many of the most promising real-world AI use cases rely on audio as an interface. Think humanoid robots, wearable devices, and everyday tools and assistants embedded in your daily life. All of these depend on voice for meaningful interaction.

The Audio Data Problem

As audio AI advances and new use cases emerge, high-quality training data is the bottleneck.

Voice AI apps are only as good as the models they rely on—and those models are only as good as the data they’re trained on.

For example, many labs are investing in end-to-end speech model architectures. A 2024 paper from Meta AI emphasized the need for millions of hours of full-duplex, channel-separated speech to feed these models. Yet across all publicly accessible datasets, only a few thousand hours exist in the right format:

“Compared to text-based chat datasets, spoken dialogue data is limited. A combination of all significant spoken dialogue datasets… would still result in only ∼3k hours.”
– “Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents,” Meta AI, 2024
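
To make that format concrete, here is a minimal sketch of what full-duplex, channel-separated dialogue audio looks like in practice. The file name and energy threshold are illustrative assumptions, not details from the paper or from our datasets: each speaker sits on their own channel, so overlapping speech stays recoverable instead of being mixed into one waveform.

```python
# Minimal sketch: inspecting full-duplex, channel-separated dialogue audio.
# "dialogue.wav" and the energy threshold are illustrative assumptions,
# not details from the Meta AI paper or from David AI's datasets.
import wave
import numpy as np

with wave.open("dialogue.wav", "rb") as f:
    assert f.getnchannels() == 2   # channel-separated: one speaker per channel
    assert f.getsampwidth() == 2   # 16-bit PCM
    rate = f.getframerate()
    pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

speaker_a, speaker_b = pcm[0::2], pcm[1::2]  # de-interleave the two channels

# Full-duplex means both parties may talk at once. With separated channels,
# simultaneous speech stays recoverable instead of being summed into a single
# waveform; that separation lets end-to-end speech models learn turn-taking,
# interruptions, and backchannels.
frame = rate // 100  # 10 ms analysis frames

def is_active(x: np.ndarray, thresh: float = 500.0) -> np.ndarray:
    """Crude per-frame voice-activity flag based on mean absolute amplitude."""
    n = len(x) // frame
    energy = np.abs(x[: n * frame].astype(np.float32)).reshape(n, frame).mean(axis=1)
    return energy > thresh

overlap = is_active(speaker_a) & is_active(speaker_b)
print(f"Overlapping speech in {overlap.mean():.1%} of frames")
```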

This is where David AI comes in.

Building a Data Research Lab

David AI is an audio data research company. We build audio datasets with the same rigor that researchers apply to model development. That means two things:

One, we take a research-driven approach: we identify which datasets to collect, evaluate and iterate on those datasets (not just for ‘data quality’, but also for efficacy in model training), and scale them with extreme attention to detail.

Two, our company is focused on audio. This enables us to invest deeply in audio products, infrastructure, operations, and models, which in turn allows us to build the best audio datasets in the world.

What’s Next

In under a year, we’ve grown to support most of the “Mag 7” companies and nearly every leading audio AI lab. We also recently crossed eight figures in annual revenue run rate.

With this Series A, we’re growing the team. If you're excited about our mission and the hard problems we’re solving in audio, we’d love to hear from you. We’re hiring across research, product, engineering, and operations.

Apply here or reach out directly: tomer [at] withdavid [dot] ai.


Announcing our $5M Seed Round Led by First Round

We started David AI six months ago to build the data layer for audio AI. We believe audio will become as central to human-to-AI interaction as it is to human-to-human interaction, but in order to get there, model developers need access to much more high-quality audio data than they have today.

Today, we’re excited to announce our $5M seed round, led by First Round Capital with participation from BoxGroup, Y Combinator, SV Angel, Liquid 2, and an awesome set of angels.

To achieve their potential, audio models need better data.

Audio models need substantially more training data to improve reasoning performance, naturalness, and robustness. That data has historically been hard to come by.

High-quality audio data is fragmented – there’s no Common Crawl for audio. It’s scarce in the right formats – for example, the most-cited multi-channel speech datasets in research have, until now, been dated and only hundreds of hours in duration. It’s also hard to generate new audio – you need to ensure content accuracy, as with text, while also accounting for acoustic properties, microphones and recording environments, languages, and localizations.

Speaking with researchers, we realized there was an opportunity to take audio data collection off their plates, so we built a product and operations designed for 1,000x scale.

David AI is the first audio-native AI data platform.

In 2025, audio AI will have its ‘ChatGPT moment’. Our mission is to accelerate this by helping our customers bring better audio models to market, faster.

We’re building the infrastructure to collect studio-grade audio data at an unprecedented scale across every language and geography – exponentially expanding the breadth of available audio data, while preserving the sound quality nuances that make or break a model. This requires novel software, hardware, and operations built specifically for audio.

Since founding David AI, we’ve collected the largest corpus of channel-separated speech data on the market. The dataset is 10x the next largest one and spans ~15 languages, with rich accent and dialect metadata. Our data has already been used to train several of the best speech models on the market.
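
To illustrate what that accent and dialect metadata can look like, here is a hypothetical record schema, offered only as a sketch; the field names and example values are our own illustration, not our actual production format.

```python
# Hypothetical sketch of a channel-separated speech record with accent and
# dialect metadata. Field names and values are illustrative only; they are
# not David AI's actual schema.
from dataclasses import dataclass, field

@dataclass
class SpeakerMeta:
    language: str   # BCP-47 tag, e.g. "en-NG"
    accent: str     # free-form accent label
    dialect: str    # regional dialect, where applicable

@dataclass
class DialogueRecord:
    audio_path: str                  # stereo WAV, one speaker per channel
    sample_rate_hz: int
    duration_s: float
    speakers: list[SpeakerMeta] = field(default_factory=list)

record = DialogueRecord(
    audio_path="calls/0001.wav",
    sample_rate_hz=48_000,
    duration_s=312.4,
    speakers=[
        SpeakerMeta(language="en-NG", accent="Nigerian English", dialect="Lagos"),
        SpeakerMeta(language="en-GB", accent="British English", dialect="Scouse"),
    ],
)
```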

Join us.

We’re a lean team that met while working at Scale AI, and we obsess over execution. In six months, we’ve exceeded seven figures in revenue, partnering with leading AI labs, from FAANG companies to startups.

If you’re excited about audio AI and driving measurable impact for the best AI companies in the world, join us. We’re hiring founding engineers and operators – when there’s a fit, we move quickly.

Apply here or reach out at tomer@withdavid.ai.