
The Skills That Teach Themselves

How engineering teams evolved from “write this for me” to self-improving systems that get better with every PR.

When we started our Q2 AI check-ins, most conversations sounded like this: “How do I get the AI to stop hallucinating in my integration tests?” Four weeks later, the questions had changed: “How do I make the AI learn from what happened last time?”

We stopped treating AI as a tool and started building systems that learn. That’s what craftsmanship looks like in 2026.

 

The Evolution We Observed

Across multiple teams and dozens of check-ins, the same pattern emerged:

Week 1: “The AI wrote the code, but I spent an hour fixing what it missed.”

Week 4: “The AI wrote the code, reviewed it against our standards, addressed the automated feedback, and iterated until the build went green. I submitted it for review.”

Better systems made the difference.

 

Three Principles of AI Craftsmanship

 

1. Build Lego Blocks

Early workflows tried to do everything at once: read the ticket, understand the codebase, write the code, create the PR. They were brittle. When one step failed, everything collapsed. 

The teams that succeeded built small, composable pieces:

  • Context gathering – understands the task and pulls relevant information
  • Planning – creates an implementation approach
  • Multi-agent review – different perspectives check different quality dimensions
  • Green loop – iterates on failures until the build passes

When chained together, these form a single command that handles the full lifecycle. Each piece is individually testable, debuggable, and reusable.
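The chaining described above can be sketched in a few lines. This is a minimal illustration, not the teams' actual implementation: the block bodies, the `chain` and `make_green_loop` helpers, the `fake_build` and `fake_fix` stand-ins, and the `TICKET-1` identifier are all hypothetical.

```python
from typing import Callable, Dict

# A "block" is any function that takes the shared context and returns an updated copy.
Block = Callable[[Dict], Dict]


def gather_context(ctx: Dict) -> Dict:
    # Hypothetical stand-in: a real block would read the ticket and the codebase.
    return {**ctx, "context": f"notes for {ctx['ticket']}"}


def plan(ctx: Dict) -> Dict:
    # Hypothetical stand-in for the planning block.
    return {**ctx, "plan": ["edit handler", "add test"]}


def review(ctx: Dict) -> Dict:
    # Hypothetical stand-in for multi-agent review; no findings in this demo.
    return {**ctx, "review_findings": []}


def make_green_loop(run_build: Callable[[Dict], bool],
                    fix: Callable[[Dict], Dict],
                    max_attempts: int = 3) -> Block:
    """Build a block that retries fix-then-build until green or out of attempts."""
    def green_loop(ctx: Dict) -> Dict:
        for attempt in range(1, max_attempts + 1):
            if run_build(ctx):
                return {**ctx, "build": "green", "attempts": attempt}
            ctx = fix(ctx)
        return {**ctx, "build": "red", "attempts": max_attempts}
    return green_loop


def chain(*blocks: Block) -> Block:
    """Compose blocks left to right into a single command."""
    def run(ctx: Dict) -> Dict:
        for block in blocks:
            ctx = block(ctx)
        return ctx
    return run


# Demo stand-ins: the fake build only passes after two fixes have been applied.
def fake_build(ctx: Dict) -> bool:
    return ctx.get("fixes", 0) >= 2


def fake_fix(ctx: Dict) -> Dict:
    return {**ctx, "fixes": ctx.get("fixes", 0) + 1}


ship = chain(gather_context, plan, review, make_green_loop(fake_build, fake_fix))
result = ship({"ticket": "TICKET-1"})
print(result["build"], result["attempts"])
```

Because every block shares one signature, each can be unit-tested on its own with a hand-built context, and swapping one block does not touch the others.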

Example

We developed an on-call workflow composed of six distinct Lego blocks: triage, investigate, report, postmortem, noise-audit, and ops-team. Each component is designed to do one job with precision. Triage scans 60 monitors across 5 services, classifies them into signal and noise, and returns a prioritized list in under a minute.

When the investigation phase required an upgrade, we simply refined a single file without disrupting the broader system. These pieces can be run independently for specific tasks or chained together to automate a comprehensive shift handoff.
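To make the "run independently or chained" idea concrete, here is a small sketch of a triage block and a report block. The article does not say how triage separates signal from noise, so the `Monitor` fields, the flap-count threshold, and the severity ordering below are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Monitor:
    name: str
    service: str
    alerting: bool
    flaps_last_24h: int  # assumed noise heuristic: how often the monitor flipped state
    severity: int        # assumed scale: 1 is most urgent


def triage(monitors: List[Monitor], flap_threshold: int = 5) -> Dict[str, List[Monitor]]:
    """Split alerting monitors into signal and noise, with signal sorted by urgency."""
    signal: List[Monitor] = []
    noise: List[Monitor] = []
    for m in monitors:
        if not m.alerting:
            continue  # quiet monitors need no attention
        if m.flaps_last_24h >= flap_threshold:
            noise.append(m)
        else:
            signal.append(m)
    signal.sort(key=lambda m: (m.severity, m.name))
    return {"signal": signal, "noise": noise}


def report(triaged: Dict[str, List[Monitor]]) -> str:
    """Render the prioritized signal list as a plain-text handoff note."""
    lines = [f"{i}. {m.name} ({m.service})" for i, m in enumerate(triaged["signal"], 1)]
    lines.append(f"Suppressed as noise: {len(triaged['noise'])}")
    return "\n".join(lines)


monitors = [
    Monitor("api-5xx-rate", "api", True, 0, 1),
    Monitor("queue-depth", "worker", True, 12, 2),
    Monitor("disk-usage", "db", False, 0, 3),
    Monitor("login-latency", "auth", True, 1, 2),
]

triaged = triage(monitors)   # run one block on its own
handoff = report(triaged)    # or feed its output into the next block
print(handoff)
```

Upgrading triage here means editing one function; `report` only depends on the shape of the dictionary it receives.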

               

 

2. Review Yourself Before Humans Review You

One team noticed something: AI makes predictable mistakes. The same missing null check. The same forgotten test. The same accidental logging of sensitive data.

So they added a self-review step. Before any PR reaches a human, specialized checks run automatically:

  • Does this follow our coding standards?
  • Did we add tests for new code paths?
  • Are we handling edge cases?

The result: human reviewers stopped catching the same issues repeatedly. Their feedback shifted from “you forgot the test” to “have you considered this edge case?”

That’s a higher-quality conversation and a better use of everyone’s time.

Example: 

Before any PR reaches a human, we run a review tribunal. Each reviewer is a markdown file, so adding a new review perspective takes 5 minutes and one file. The plugin ships with 20 reviewers, teams add their own, and individual engineers can contribute personal ones. Our performance reviewer started as one engineer’s checklist of slow query patterns and missing indexes. Now it catches unbounded loops and N+1 queries on every PR automatically.

This shift lets reviewers focus on high-level architectural design rather than pointing out forgotten null checks, which is a better use of engineering expertise.
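One way to picture "each reviewer is a markdown file" is a loader that turns a directory of files into one review prompt per reviewer. This is a sketch under assumptions: the directory layout, the file names, the checklist text, and the `load_reviewers` and `build_prompts` helpers are hypothetical, and the call that would send each prompt to a model is left out.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_reviewers(directory: Path) -> Dict[str, str]:
    """Map each reviewer's name (the file stem) to its markdown instructions."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(directory.glob("*.md"))
    }


def build_prompts(reviewers: Dict[str, str], diff: str) -> Dict[str, str]:
    """Pair every reviewer's instructions with the diff under review."""
    return {
        name: f"{instructions.strip()}\n\nReview this diff:\n{diff}"
        for name, instructions in reviewers.items()
    }


# Demo: a temporary directory stands in for the plugin's reviewer folder.
with tempfile.TemporaryDirectory() as tmp:
    reviewer_dir = Path(tmp)
    (reviewer_dir / "performance.md").write_text(
        "# Performance reviewer\n- Flag N+1 queries\n- Flag unbounded loops\n",
        encoding="utf-8",
    )
    (reviewer_dir / "security.md").write_text(
        "# Security reviewer\n- Flag logging of sensitive data\n",
        encoding="utf-8",
    )
    reviewers = load_reviewers(reviewer_dir)
    prompts = build_prompts(reviewers, "+ for user in users: db.query(user.id)")

print(sorted(prompts))
```

With this shape, adding a review perspective is a matter of dropping another `.md` file into the directory; no code changes.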

             

 

3. Turn Failures Into Rules

The most interesting pattern: workflows that maintain a living document of their own mistakes.

Every time an automated review catches something preventable, every time CI fails for a known reason, every time a reviewer leaves feedback the AI should have anticipated, it gets recorded. The next time the workflow runs, it reads its own history first. The same mistake never happens twice. The workflow gets smarter with every PR, and so does the engineer.

Example:

Our authorization migration workflow performed well initially, but reliability faltered as edge cases accumulated. To fix this, we added a learning.md file: a living memory of tricky OAuth patterns and unexpected callers. The workflow now reads these lessons before every run, so a recorded oversight is not repeated. Within two weeks, the recurring errors vanished and accuracy stabilized.
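The read-before-run, record-after-failure cycle might look roughly like the following. Only the learning.md file name comes from the example above; the bullet-per-lesson format, the helper names, and the sample lesson text are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def read_lessons(path: Path) -> List[str]:
    """Return recorded lessons; assumes one markdown bullet ("- ...") per lesson."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[2:].strip() for line in lines if line.startswith("- ")]


def record_lesson(path: Path, lesson: str) -> bool:
    """Append a lesson unless it is already recorded. Returns True if it was added."""
    lesson = lesson.strip()
    if lesson in read_lessons(path):
        return False
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {lesson}\n")
    return True


def build_prompt(task: str, path: Path) -> str:
    """Put the workflow's own history in front of the task before every run."""
    lessons = read_lessons(path)
    if lessons:
        header = "Lessons from earlier runs:\n" + "\n".join(f"- {l}" for l in lessons)
    else:
        header = "No recorded lessons yet."
    return f"{header}\n\nTask: {task}"


with tempfile.TemporaryDirectory() as tmp:
    learning = Path(tmp) / "learning.md"
    # Hypothetical lessons, invented for the demo.
    added_first = record_lesson(learning, "Check for callers outside the main service")
    added_again = record_lesson(learning, "Check for callers outside the main service")
    record_lesson(learning, "Re-run auth tests after changing token scopes")
    lessons = read_lessons(learning)
    prompt = build_prompt("Migrate the next endpoint", learning)

print(prompt)
```

Keeping the memory in a plain markdown file means engineers can read, edit, and review the lessons like any other change.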

 

What We’re Still Figuring Out

We’re not done. Some honest challenges:

  • Context limits are real. Large tasks hit token limits. We’re learning where to split work and what information to preserve across sessions.
  • Some tasks still need human judgment. Complex debugging, architectural decisions, and ambiguous requirements benefit from human intuition. The goal: eliminate the repetitive parts.
  • Discovery is hard. As the library of reusable workflows grows, finding the right one becomes its own challenge. Good documentation and clear naming conventions matter more than ever.
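On the context-limit point, one possible starting heuristic is to pack files into batches that each fit a token budget. This is only a sketch of one option, not a description of what the teams settled on: the four-characters-per-token estimate is a rough assumption (real tokenizers differ), and the file names and budget are made up.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def split_into_batches(files: Dict[str, str], budget: int) -> List[List[str]]:
    """Greedily group files, in order, so each batch stays within the budget.

    A single file larger than the budget still gets a batch of its own;
    splitting inside a file is out of scope for this sketch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, text in files.items():
        cost = estimate_tokens(text)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches


files = {
    "a.py": "x" * 400,  # about 100 estimated tokens
    "b.py": "x" * 400,  # about 100 estimated tokens
    "c.py": "x" * 800,  # about 200 estimated tokens
}
batches = split_into_batches(files, budget=250)
print(batches)
```

What to carry between batches (a summary, the plan, the recorded lessons) is the harder question and is not addressed here.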

What Changed

The teams succeeding with AI ask a different question. They used to ask: “How do I make AI do my job?” Now they ask: “How do I build a system where the AI and I both get better over time?”

That’s craftsmanship. 

AI workflows are no different. The best ones learn from every run. And they make the engineers who build them better, too.

Ultimately, this craftsmanship allows us to build higher-quality, more reliable software, advancing Zocdoc’s mission to improve the healthcare experience for both patients and providers.