Summary
In the fourth installment of SafeBreach’s AI-First evolution series, VP of Development Yossi Attas and Principal Software Design Engineer Guy Ephraim explore how test-driven development (TDD) serves as the essential “safety net” for high-speed AI code generation. By shifting the developer’s focus from writing code to defining behavior through tests, SafeBreach prevents AI-driven technical debt and ensures that automated implementations remain strictly aligned with the Product Requirements Document (PRD). The discussion highlights how TDD tames the non-deterministic nature of AI, facilitating reliable feature development, bug fixes, and refactoring while emphasizing that human supervision remains critical to maintaining a clean, readable, and scalable architectural foundation.
In the first three parts of this series, we’ve documented the “how” and the “why” behind SafeBreach’s shift to an AI-First development organization. We started by redefining the engineer’s role as a validator rather than just a coder. We then pulled back the curtain on our operational workflow, showing how we’ve integrated specialized AI “skills” into our daily sprint cycles. Most recently, we explored why the Product Requirements Document (PRD) has made a massive comeback as the essential anchor for maintaining architectural integrity in our AI-first development organization.
But there is another “old-school” practice that is seeing an equally powerful revival in our AI-driven environment: Test-Driven Development (TDD). In a traditional human-only workflow, TDD is often praised but frequently skipped in the name of speed. However, when working with an AI agent that can generate code at lightning speed, having a “safety net” of tests isn’t just a best practice—it’s the best way we’ve found to move fast while ensuring the implementation actually matches the PRD.
To explore this topic, I sat down with Guy Ephraim, Principal Software Design Engineer at SafeBreach. If you’ve been following this series, you probably noticed that I tend to be the optimist in the room when it comes to AI-First development. When an AI agent does something slightly strange, my instinct is often:
“Interesting… maybe that’s the future.”
Guy tends to have a slightly different reaction, something more along the lines of:
“Interesting… how fast can that destroy our codebase?”
Over time, Guy became one of the strongest advocates for using TDD as the safety mechanism that allows AI-First development to scale without turning into chaos.
In this fourth installment, I’ll dive into the details of our conversation, which explored why TDD matters more than ever in an AI-First world, how it actually works in practice, and what cultural shifts engineering teams must make to adopt it.
Why TDD Matters Even More in an AI-First World
Yossi
Many leaders hear AI-First and think it means replacing code with prompts and letting the agent figure things out.
You, on the other hand, keep pushing test-driven development (TDD), which is probably one of the most disciplined engineering practices we have.
Why is TDD actually more important in an AI-First world? Or put differently—now that AI can write code, why are you asking developers to write even more tests?
Guy
TDD was always a good practice. The problem wasn’t the methodology. The problem was developers.
TDD asks two things of you. First, you need a clean design with proper interfaces. Second, you need to think about the system from the testing perspective before writing the implementation. And that’s where most developers lose patience: writing tests first means delaying the fun part, which is writing code.
In the AI world, that dynamic changes. No task is too tedious for the AI. Writing tests becomes trivial. And because AI works much better when the design is clear, we already need a proper design before we start implementing anything.
So the old barriers to TDD mostly disappear. But that’s actually the secondary reason. The real value is that TDD helps the AI agent produce better code.
1. Tests Map the Design to Functionality
The PRD or design document remains the source of truth. Tests are the executable mapping between the design and the system’s behavior. If the tests correctly reflect the requirements, then satisfying those tests means the implementation is correct.
2. Test-Code Separation Creates Integrity Control
When tests are written first, the implementation must conform to them. If the AI writes tests and code simultaneously, you often get circular logic—the AI basically marking its own homework.
Writing tests first forces the implementation to satisfy externally defined behavior. It also ensures tests target real interfaces and system behavior, not internal implementation details like whether a variable was initialized in the constructor or in some helper function.
3. Tests Focus the AI on Behavior Instead of Internals
TDD encourages black-box thinking. Tests focus on the system’s flows and expected behavior rather than its internal structure. That naturally reduces hallucinations because the AI is guided by clearly defined outcomes instead of inventing implementation details.
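To make these three points concrete, here is a minimal sketch of what a behavior-first test can look like. All names (QuotaService, QuotaExceededError, the limit parameter) are hypothetical; in real TDD the tests are written before the class exists, and the implementation is included here only so the sketch runs.

```python
import pytest

class QuotaExceededError(Exception):
    pass

class QuotaService:
    # In TDD this is written last; the tests below came first.
    def __init__(self, limit):
        self.limit = limit
        self._counts = {}

    def record_request(self, user_id):
        count = self._counts.get(user_id, 0)
        if count >= self.limit:
            raise QuotaExceededError(user_id)
        self._counts[user_id] = count + 1
        return True

def test_requests_under_the_quota_are_accepted():
    service = QuotaService(limit=3)
    assert service.record_request(user_id="u1") is True

def test_request_over_the_quota_is_rejected():
    # Black-box: we assert the outcome the PRD describes, not how
    # usage is tracked internally (no peeking at _counts).
    service = QuotaService(limit=3)
    for _ in range(3):
        service.record_request(user_id="u1")
    with pytest.raises(QuotaExceededError):
        service.record_request(user_id="u1")
```

Note that the tests exercise only the public interface. The AI is free to change how counts are stored, when they are initialized, or what helpers exist, as long as the externally defined behavior holds.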
Taming AI’s Non-Determinism
Yossi
Traditional software systems are deterministic: Input A leads to Output B. AI systems are not like that; they can drift or behave unpredictably. How does TDD help a company tame that non-determinism?
Guy
For me it’s all about checkpoints. Writing tests before implementation creates a moment where the developer can stop and ask:
“Is this actually what we want the system to do?”
Even just reading the test names often tells the whole story. The tests effectively become a table of contents for the implementation. If the test names don’t match the intent of the feature, you know immediately that something is off.
That checkpoint is extremely valuable. It prevents the AI from running ahead and building something impressive, but wrong.
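To illustrate that “table of contents” effect, here is a hypothetical set of pytest names for a password-reset feature. Reading them top to bottom should tell the same story as the PRD section they came from; if they don’t, the checkpoint has done its job.

```python
# Hypothetical test names only; the bodies are intentionally elided.
def test_reset_link_is_emailed_to_a_registered_address(): ...
def test_reset_link_expires_after_one_hour(): ...
def test_expired_link_shows_a_renewal_prompt(): ...
def test_unknown_email_returns_a_generic_success_message(): ...
```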
How TDD Works in an AI-First Workflow
Yossi
A lot of developers today are doing what I call “prompting and praying.” They write a prompt, run the agent, and hope the output is correct. How does TDD actually change that workflow?
Guy
First, we need to acknowledge reality. Like many mature products, SafeBreach has a large codebase with a lot of technical debt and historical bugs. That means a large portion of our work is not adding code, but rather fixing or improving existing code. In those situations, hallucinations are especially dangerous.
So we categorize tasks into three types:
- Feature development
- Bug fixes
- Refactoring
Each of them benefits from TDD slightly differently.
Feature Development
Guy
For features, the flow is pretty straightforward:
PRD → Tests → Implementation → Code Review → Merge
The tests define the behavior. The AI implements code that satisfies those tests.
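As a minimal sketch of that flow, assume a hypothetical PRD clause and a hypothetical build_report function:

```python
# PRD clause (hypothetical): "A report must list every simulator
# that ran, even those that produced no findings."

# Step 1: written first, straight from the PRD clause above.
def test_report_lists_simulators_that_produced_no_findings():
    report = build_report({"sim-a": ["finding-1"], "sim-b": []})
    assert set(report["simulators"]) == {"sim-a", "sim-b"}

# Step 2: in our workflow, the AI agent writes the minimal
# implementation that satisfies the test. The test, not the
# prompt, defines "done".
def build_report(results):
    return {"simulators": list(results.keys())}
```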
Bug Fixes
Yossi
Let’s talk about bugs. Historically that’s where developers spend hours chasing stack traces. Does AI actually help there?
Guy
Yes, a lot. Bug fixing was always about reproducing the issue. Once you reproduce the bug, the fix usually becomes obvious.
So the first step is writing a failing test that demonstrates the bug. Historically that step took most of the effort. AI just helps do it faster.
Once the failing test exists:
- The problem is clearly understood
- The fix becomes straightforward
- The test prevents the bug from ever returning
While this may sound trivial, that level of certainty, knowing we can reproduce the issue and that the fix actually resolves it, is critical. It helps us avoid the common trap of “I think this is the issue and I think we have a solution for it.”
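Here is a minimal sketch of that first step, with a toy parse_schedule standing in for real code under repair. The point is that the failing test is written, and seen to fail for the right reason, before anyone (human or AI) touches the implementation.

```python
# Toy stand-in (hypothetical) for the code under repair. The
# reported bug: schedules ending at midnight were silently dropped.
def parse_schedule(spec):
    hour = int(spec.rsplit(" ", 1)[-1].split(":")[0])
    if hour == 0:  # the bug: midnight treated as "no run"
        return []
    return [spec]

def test_schedule_at_midnight_is_not_dropped():
    # Step 1 of the fix: a failing test that reproduces the report.
    # It fails on the buggy code above, passes once the fix lands,
    # and then stays in the suite to keep the bug from returning.
    assert len(parse_schedule("daily at 00:00")) == 1
```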
Refactoring
Yossi
You said earlier that refactoring is actually your favorite scenario. That surprised me.
Guy
Yes, because refactoring is where TDD shines the most. When we refactor a component, the first question we ask is:
“Do we have tests that describe the current behavior?”
Those tests must pass on both the old code and the refactored code without modification. That creates what I call a contract of consistency. If both versions produce the same output, we know the refactor is safe.
Also, because old code usually lacks proper tests, writing tests that must cover both the old and the new code forces test quality up, adding even more value to the process.
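A sketch of what that contract can look like with pytest, using toy stand-ins for the legacy and refactored functions. The key property is that the same parametrized assertions run unmodified against both versions:

```python
import pytest

# Toy stand-ins (hypothetical): the legacy function and its refactor.
def old_calculate_total(base, discount):
    return base - base * discount

def new_calculate_total(base, discount):
    return base * (1 - discount)

CASES = [
    ({"base": 100, "discount": 0.1}, 90.0),
    ({"base": 0, "discount": 0.5}, 0.0),
]

@pytest.mark.parametrize("impl", [old_calculate_total, new_calculate_total],
                         ids=["old", "new"])
@pytest.mark.parametrize("kwargs, expected", CASES)
def test_old_and_new_agree_on_the_contract(impl, kwargs, expected):
    # The contract of consistency: both versions must satisfy the
    # same assertions. If either diverges, the refactor is not safe.
    assert impl(**kwargs) == pytest.approx(expected)
```

Once the refactor merges, the old implementation and its parametrization are deleted, and the contract tests simply become the component’s regular behavioral suite.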
The Mistake We Already Made
Yossi
Let me guess—we learned this the hard way.
Guy
Absolutely. At one point we had a brilliant idea:
“Let’s use AI to dramatically increase test coverage.”
So we generated thousands of tests. Coverage numbers looked amazing. Everyone was happy for about two weeks.
Then we realized something. Those tests didn’t validate the requirements; they validated the existing implementation. That meant we couldn’t refactor anything without breaking hundreds of tests.
Instead of building a safety net, we built a straitjacket. That experience taught us something important: tests must validate behavior, not implementation details.
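To illustrate the difference, here is a hypothetical contrast between the kind of test we generated back then and the kind we actually want. The CachingService is a toy stand-in:

```python
import pytest

class CachingService:
    # Toy stand-in (hypothetical) for the real component.
    def __init__(self):
        self._cache = {}
        self._lookup_count = 0

    def get(self, key):
        self._lookup_count += 1
        return self._cache.setdefault(key, f"value-for-{key}")

@pytest.fixture
def service():
    return CachingService()

# The straitjacket: pins internal structure, so any refactor of the
# caching strategy breaks it even when behavior is unchanged.
def test_cache_uses_internal_dict(service):
    service.get("key")
    assert "key" in service._cache
    assert service._lookup_count == 1

# The safety net: pins observable behavior, so it survives refactoring.
def test_repeated_reads_return_the_same_value(service):
    assert service.get("key") == service.get("key")
```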
The Human Role in an AI-First World
Yossi
One thing you mentioned earlier stuck with me. Even if AI can write tests and code, developers still need to supervise the process. How does that change the role of engineers?
Guy
Honestly, we’re still figuring that out. Right now the senior engineers are the bottleneck. Junior developers can generate much more code with AI, but senior engineers spend a lot of time reviewing and guiding the output.
So the role shifts from writing code to ensuring the system evolves in a healthy direction. That supervision role is still something the industry is learning.
The Aha Moment: AI Needs Readable Code Too
One myth about LLMs is that they can magically understand messy code. Our experience suggests otherwise. AI struggles with spaghetti code just like humans do.
Readable code helps the AI reason about the system the same way it helps developers. And in the AI era, codebases will grow faster than ever, which means that if the code becomes unreadable, the AI will help you break it much faster.
Closing
TDD is often described as a discipline that slows developers down. In an AI-First world, the opposite may be true. By defining behavior first, we create the structure that allows AI agents to implement systems reliably.
- The PRD defines the intent.
- TDD verifies the behavior.
- AI generates the implementation.
And together they form the foundation of a development workflow where humans and AI can collaborate without losing control of the system.
Stay tuned for our next installment, where I’ll sit down with Andrew Kozma, who leads our technical account managers (TAM) team, and Brady Cotton, a TAM on Andrew’s team, to discuss how the transformation to AI-First does not stop at the development team. We’ll explore how it extends to every function that produces structured output and how, in many ways, the stakes get higher the closer you get to the customer.