Automated Doubt: Rebuilding Trust in AI-Assisted Development

As developers, we embrace new tools that promise to accelerate our work, and AI-assisted development built on Large Language Models (LLMs) quickly became one of them. However, many of us, myself included, have hit a wall: a profound loss of trust. Early on, I gave LLMs too much autonomy, too quickly, without the foundational engineering rigor I'd spent years internalizing. The result was unreliable artifacts and a creeping sense of doubt.

My solution to this challenge, and how I've since rebuilt confidence in AI-generated code, is what I call the "Automated Doubt Development Process." It's an approach centered on front-loading scrutiny and systematically critiquing every artifact through multi-agent validation. Essentially, I automate as much doubt as possible, repeatedly. If you're using AI for code, specifications, documentation, or any other artifact, this process can significantly enhance quality and reliability.

The Core Principle: Multi-Agent Scrutiny

The heart of this process lies in the extensive use of specialized subagents. These aren't just generic LLM instances; they're tailored to audit specific 'perspectival surfaces' that a single, broad LLM might overlook. Think of it like gaining depth perception: just as two eyes provide parallax to see in 3D, multiple agent perspectives catch different types of defects. The more diverse the vantage points, the better the defect detection, and the earlier in the process that scrutiny is applied, the more effective it becomes.
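To make the idea concrete, here is a minimal sketch of several perspectives reviewing the same artifact and merging their findings. Everything in it is an assumption for illustration: the Finding shape, the severity labels, and the two keyword-based reviewer functions, which stand in for real LLM subagents and say nothing about how the actual agents are prompted or implemented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    agent: str      # which perspective raised the finding
    severity: str   # hypothetical scale: "critical", "major", "minor"
    message: str


# A reviewer takes the artifact text and returns its findings.
Reviewer = Callable[[str], list[Finding]]


def assumption_excavator(artifact: str) -> list[Finding]:
    # Stand-in heuristic: flag wording that often hides an assumption.
    return [
        Finding("assumption-excavator", "major",
                f"unstated assumption near: {line.strip()!r}")
        for line in artifact.splitlines()
        if "assumes" in line.lower()
    ]


def ambiguity_mapper(artifact: str) -> list[Finding]:
    # Stand-in heuristic: flag vague placeholders.
    return [
        Finding("ambiguity-mapper", "minor",
                f"vague wording: {line.strip()!r}")
        for line in artifact.splitlines()
        if "tbd" in line.lower()
    ]


def run_reviewers(artifact: str, reviewers: list[Reviewer]) -> list[Finding]:
    """Run every perspective over the same artifact and merge the results."""
    merged: list[Finding] = []
    seen: set[str] = set()
    for review in reviewers:
        for finding in review(artifact):
            if finding.message not in seen:  # drop exact duplicates
                seen.add(finding.message)
                merged.append(finding)
    return merged


spec = """The dashboard shows run history.
The history view assumes the stats endpoint returns per-run scores.
Retention policy: TBD."""

findings = run_reviewers(spec, [assumption_excavator, ambiguity_mapper])
for f in findings:
    print(f"[{f.severity}] {f.agent}: {f.message}")
```

Each reviewer only sees what its own heuristic looks for, so the merged list is broader than either one alone. That is the parallax effect in miniature.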

My workflow is divided into three distinct phases: Design, Development, and Wrap-up/Ship.

Phase 1: Design – Front-Loading Scrutiny

Every project begins with an idea and a specification. I still advocate starting with a clear spec, PRD, or plan. My first step is to ask the primary LLM (e.g., Claude) to draft this specification. I then spend a brief 2-5 minutes reviewing it, ensuring the core concepts are captured. This is where the iterative doubt process truly begins.

I initiate a "Pre-Implementation workflow," engaging a trio of agents for the initial round of critique:

Pre-Implementation Architect: Verifies design quality and scope assessment.
Documentation Validator: Identifies gaps in planned documentation.
Assumption Excavator: Unearths hidden assumptions within the spec. This agent is nearly universally applicable and highly recommended for any artifact.

These agents generate findings, typically 10-25 for an average scope, which the main terminal agent then folds back into the spec. For instance, the Assumption Excavator might reveal: "executionStatsSchema in registry-sdk returns {totalCount, recentCount, windowMinutes}. Spec assumes {avgScore, medianDurationMs, passRate, lastRunDate, lastRunScore}. Entire history section unbuildable without new API endpoint." Or the Pre-Implementation Architect might suggest: "HarnessProfile embeds mcp.read/merge/remove/write methods alongside path config. Consider extracting McpConfigStrategy to separate concerns. Each harness file will grow to 80–120 lines otherwise."
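The Assumption Excavator finding quoted above boils down to a set comparison between the fields a spec relies on and the fields a schema provides. The sketch below illustrates that check using the field names from the finding. The function name missing_fields is hypothetical, and this is not how the agent itself works.

```python
# Field names are copied from the Assumption Excavator finding quoted above.
ACTUAL_FIELDS = {"totalCount", "recentCount", "windowMinutes"}
SPEC_ASSUMED_FIELDS = {
    "avgScore", "medianDurationMs", "passRate", "lastRunDate", "lastRunScore",
}


def missing_fields(assumed: set[str], actual: set[str]) -> list[str]:
    """Return the fields the spec relies on that the schema does not provide."""
    return sorted(assumed - actual)


missing = missing_fields(SPEC_ASSUMED_FIELDS, ACTUAL_FIELDS)
if missing:
    print("Spec depends on fields the API does not return:", ", ".join(missing))
```

Here every assumed field is missing, which is why the finding concludes that the history section cannot be built without a new endpoint.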

For more complex projects, I deploy additional agents in subsequent iterations:

Gap Analyzer: Excellent at finding omitted system aspects.
Implied Completeness Detector: Uncovers undefined behaviors or missing versioning strategies.
Ambiguity Mapper: Clarifies unclear sections.

After these iterative rounds, I dedicate 15-60 minutes to thoroughly read the refined spec. Once satisfied, I instruct the LLM to generate a companion checklist, which proves invaluable for tracking progress, especially across multiple development sessions.
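A checklist that survives across sessions can be as simple as a JSON file. The sketch below assumes a plain list of task strings as input (in practice the LLM would derive them from the spec). The file name, item shape, and helper names are all hypothetical, and the three tasks are only loosely based on the example findings in this article.

```python
import json
import tempfile
from pathlib import Path


def build_checklist(tasks: list[str]) -> list[dict]:
    """Turn task descriptions into numbered, unchecked checklist items."""
    return [{"id": i, "task": text, "done": False}
            for i, text in enumerate(tasks, start=1)]


def mark_done(checklist: list[dict], item_id: int) -> None:
    for item in checklist:
        if item["id"] == item_id:
            item["done"] = True
            return
    raise KeyError(f"no checklist item with id {item_id}")


def save(checklist: list[dict], path: Path) -> None:
    path.write_text(json.dumps(checklist, indent=2), encoding="utf-8")


def load(path: Path) -> list[dict]:
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checklist.json"

    # Session 1: create the checklist, finish one item, persist it.
    checklist = build_checklist([
        "Add API endpoint for run history stats",
        "Extract McpConfigStrategy from HarnessProfile",
        "Keep resolved filesystem paths out of PreflightError messages",
    ])
    mark_done(checklist, 1)
    save(checklist, path)

    # Session 2: reload and see what is still open.
    resumed = load(path)
    todo = [item["task"] for item in resumed if not item["done"]]

print(todo)
```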

Phase 2: Development – Iterative Validation

With the spec and checklist ready, the LLM begins development. If I'm resuming work, I might first use agents like Chain Tracer or Deep Explore to get a comprehensive understanding of the current state.

One crucial aspect of my current process is a deliberate limitation: as a rule, I do not use subagents for writing code. Autonomous subagents caused more issues than they solved in my experience, so I drew a temporary line in the sand. While I acknowledge that advanced swarm orchestration methods exist, for now I prefer a single LLM instance handling the core build. The one exception is bulk updates, for which I occasionally spawn subagents. This allows me to maintain a higher degree of control and trust.

Once the build is complete, "Automated Doubt" does its main work. I run a "Post-Implementation workflow" comprising:

Code Validator: Audits general code quality.
Type Safety Validator: Ensures type correctness.
Test Architect: Reviews testing quality and coverage.
Code Optimizer: Flags performance considerations and duplication.
Public Interface Validator: Checks the external API's consistency and documentation.
Security Analyst: Assesses the codebase's security posture.

The first run typically unearths 15-35 findings, with a significant portion flagged as critical. I address these issues, then re-run the workflow, tackling subsequent sets of findings until the codebase meets my internal quality standards. For example, the Security Analyst might find: "PreflightError includes shellQuote-expanded target path verbatim. Error messages containing resolved filesystem paths may propagate to tracking API and dashboard."
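The run, fix, re-run cycle described here can be sketched as a loop with an explicit stopping rule. In this sketch the severity labels, the choice of which severities block, the round limit, and the simulated backlog are all assumptions. In practice run_workflow would invoke the validator agents and apply_fixes would be the main agent (or the engineer) addressing findings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    severity: str  # hypothetical scale: "critical", "major", "minor"
    message: str


def iterate_until_clean(
    run_workflow: Callable[[], list[Finding]],
    apply_fixes: Callable[[list[Finding]], None],
    max_rounds: int = 5,
    blocking: tuple[str, ...] = ("critical", "major"),
) -> list[int]:
    """Re-run the workflow until no blocking findings remain or rounds run out.

    Returns the number of findings seen in each round.
    """
    history: list[int] = []
    for _ in range(max_rounds):
        findings = run_workflow()
        history.append(len(findings))
        blockers = [f for f in findings if f.severity in blocking]
        if not blockers:
            break  # only non-blocking findings remain: good enough for now
        apply_fixes(blockers)
    return history


# Simulated backlog standing in for real agent output.
backlog = [
    Finding("critical", "error message leaks resolved filesystem path"),
    Finding("major", "no test for empty input"),
    Finding("minor", "duplicated helper function"),
]


def run_workflow() -> list[Finding]:
    return list(backlog)


def apply_fixes(blockers: list[Finding]) -> None:
    backlog.remove(blockers[0])  # pretend one blocker gets fixed per round


history = iterate_until_clean(run_workflow, apply_fixes)
print(history)  # [3, 2, 1]
```

The max_rounds cap is a design choice of the sketch: it keeps the loop from running indefinitely when each fix introduces new findings.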

Phase 3: Wrap-up and Ship – Finalizing for Release

Once I'm confident in the project's practical functionality and quality, I move to the final "Ship" workflow. This is the ultimate convergence of iterated doubt, ensuring release readiness. It includes many of the post-implementation agents, plus specialized checkers:

Code Auditor: Performs a deeper code quality inspection.
Anxiety Reader: Identifies potential failure points or resource exhaustion risks, e.g., "Promise.allSettled fires all agents simultaneously with no concurrency limit, risking resource exhaustion and API rate limits."
API Contract Validator: (If applicable) Ensures API consistency.
Release Readiness Validator: Provides a final check on the system's overall release posture.
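The Anxiety Reader example above flags an unbounded fan-out. One common remedy is to cap how many agents run at once. The finding refers to JavaScript's Promise.allSettled, so the sketch below is only an analogous illustration in Python, using asyncio.Semaphore and asyncio.gather. The run_agent coroutine is a placeholder for a real agent call, and the limit of 2 is arbitrary.

```python
import asyncio


async def run_agent(name: str) -> str:
    # Placeholder for a real agent/LLM call.
    await asyncio.sleep(0.01)
    return f"{name}: done"


async def run_with_limit(names: list[str], limit: int) -> tuple[list, int]:
    """Run all agents, but never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest number of agents observed running concurrently

    async def guarded(name: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_agent(name)
            finally:
                in_flight -= 1

    # return_exceptions=True collects failures as results instead of raising
    # on the first one, loosely comparable to Promise.allSettled.
    results = await asyncio.gather(
        *(guarded(n) for n in names), return_exceptions=True
    )
    return results, peak


agents = [
    "code-auditor",
    "anxiety-reader",
    "api-contract-validator",
    "release-readiness-validator",
    "security-analyst",
]
results, peak = asyncio.run(run_with_limit(agents, limit=2))
print(f"{len(results)} agents finished, at most {peak} running at once")
```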

This phase often requires two or more iterations, as the aim is to ensure nothing slips through before deployment. The goal is to reach a state where the question "Is this ready for release?" can be answered with a resounding yes.

Practicalities and Perspective

Ultimately, this process is a negotiation between the artifacts, the agents, and the human operator, converging on a shared understanding of quality. While quality can be subjective, it's driven by objective goals like consistency, usability, readability, and maintainability. The decision of when to stop iterating, that is, when the next fix is no longer worth its cost, comes down to intuition, patience, practice, judgment, and expertise. The engineer, much like the artist, can always find more to refine, but versioning allows for continuous improvement.

One significant consideration is the token cost. This process, by its nature, is not cheap. For smaller projects, it might be overkill, while for some complex systems, it may still require additional custom agents. However, my personal inclination is to apply this rigorous process repeatedly to ensure that AI-developed code adheres to a higher standard of verification and validation.
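A back-of-the-envelope estimate shows how the cost scales: agents times rounds times tokens per run. Every number in the sketch below is a placeholder rather than a measurement. The agent count of 6 matches the Post-Implementation workflow above, while the rounds, tokens per run, and price are invented purely to demonstrate the arithmetic.

```python
def estimate_cost(
    agents: int,
    rounds: int,
    tokens_per_run: int,
    usd_per_million_tokens: float,
) -> tuple[int, float]:
    """Return (total tokens, estimated cost in USD) for a review workflow."""
    total_tokens = agents * rounds * tokens_per_run
    cost = total_tokens / 1_000_000 * usd_per_million_tokens
    return total_tokens, cost


# Placeholder inputs: substitute your own measurements and provider pricing.
tokens, usd = estimate_cost(
    agents=6, rounds=3, tokens_per_run=40_000, usd_per_million_tokens=5.0
)
print(f"{tokens:,} tokens, roughly ${usd:.2f}")
```

Because the cost is a product of three factors, trimming either the agent list or the number of rounds for small scopes reduces it proportionally.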

This journey, born from a lack of trust, has evolved into a robust trust signal, allowing me to leverage AI's power with confidence.

FAQ

Q: Why not allow subagents to write code? What's the main concern?

A: My primary concern stems from early experiences where subagents with autonomous write access often did more harm than good, creating complex issues that were harder to debug. This led to a temporary policy of centralizing code generation through a single terminal agent to maintain control and oversight, even while acknowledging that advanced swarm orchestration techniques exist.

Q: How do you decide which agents to use for a given project scope?

A: The process scales with complexity. For small scopes, the Pre-Implementation agents alone may be sufficient. Medium scopes benefit from adding Gap Analyzer, Implied Completeness Detector, and Ambiguity Mapper. Large, critical projects warrant a full sweep, potentially with multiple runs of each agent, and sometimes other highly specialized agents as needed. The universal recommendation is the Assumption Excavator, regardless of scope.
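As a rough sketch, that scaling rule for the design phase can be written as a lookup. The agent names come from this article. The scope labels and the number of passes for large projects are assumptions, since the right values depend on the project.

```python
PRE_IMPLEMENTATION = [
    "Pre-Implementation Architect",
    "Documentation Validator",
    "Assumption Excavator",
]
DEEPER_SPEC_REVIEW = [
    "Gap Analyzer",
    "Implied Completeness Detector",
    "Ambiguity Mapper",
]


def select_design_agents(scope: str) -> list[str]:
    """Pick design-phase agents for a scope of 'small', 'medium', or 'large'."""
    if scope == "small":
        return list(PRE_IMPLEMENTATION)
    if scope in ("medium", "large"):
        return PRE_IMPLEMENTATION + DEEPER_SPEC_REVIEW
    raise ValueError(f"unknown scope: {scope!r}")


def passes_per_agent(scope: str) -> int:
    # Assumption: large projects get each agent twice; tune to taste.
    return 2 if scope == "large" else 1


for scope in ("small", "medium", "large"):
    agents = select_design_agents(scope)
    print(f"{scope}: {len(agents)} agents x {passes_per_agent(scope)} pass(es)")
```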

Q: What's the philosophical underpinning of terminating the iteration loop, given that quality is subjective?

A: Terminating the loop is a blend of intuition, practice, judgment, and expertise. While quality has subjective elements, the process aims for objective goals like consistency, usability, and maintainability. The decision to stop iterating occurs when the engineer determines that the project has met a personal threshold for readiness, balancing the value of further refinement against practical release considerations. The understanding is that versioning allows for ongoing improvements, so perfection is not the immediate goal, but rather a robust, verifiable state for the current release.