Coaching the Agent: A New Discipline for Senior Engineers
Precision Over Novelty
I’ve spent the last two decades building technical solutions inside constrained environments, largely within financial institutions. The work was less about novelty and more about precision: identifying deep systemic problems and resolving them under conditions where mistakes weren’t merely bugs, but institutional risks.
Early in my career, I realized my impact was capped by how much code I personally wrote. Coaching, hiring, and mentoring other engineers had a compounding effect that individual output never could. That shift pulled me toward building teams, establishing standards, and shaping Software Development Life Cycles (SDLCs) optimized not just for delivery, but for correctness under pressure.
Why Production Is Sacred
Trading Floor Discipline
One long-standing organization represented the extreme end of that spectrum. On the trading floor, there was effectively zero distance between a developer and a catastrophic outcome. Every line of code was tested, reviewed, re-reviewed, and tested again. No change reached production without explicit managerial review; peer review alone was insufficient. When something failed, the response was immediate: a live trade, prices moving in milliseconds, and no tolerance for uncertainty about what shipped or how to reverse it. There was no time to hunt down the person who made the change. Familiarity with all changes wasn’t a requirement – it was the job.
The Cost of Being Wrong
Other environments operated differently, but with similar rigor. The risk was less about milliseconds and more about reputation. Velocity wasn’t immediate, but deliberation was relentless. Releases were debated exhaustively because the blast radius of a mistake extended beyond systems into trust, credibility, and institutional standing. The constraint wasn’t technical complexity – it was the cost of being wrong.
AI Breaks the Feedback Loop
Treating production as sacred shaped how I reacted when modern AI coding tools first appeared. Tools like Copilot could refactor large sections of code or suggest libraries instantly, but the outputs were opaque. The speed was undeniable; so was my distrust. What once required reading manuals and understanding internals was reduced to syntactically correct suggestions that sometimes worked. Code generation accelerated to near light speed, with little to no comprehension behind it.
The Paradigm Shift
AI Inflection Point
When I joined Zocdoc, the constraints changed again. The organization was willing to adopt a broad set of AI tools. (For clarity, the tools mentioned – Copilot, Cursor, and Claude Code – are approved and sanctioned at Zocdoc, reinforcing our privacy posture.) At the same time, I was learning a new stack: C#, AWS, infrastructure as code. Velocity mattered, but so did correctness. Cursor and Claude Code became part of that environment, and my early, naive usage produced predictable results: confident, end-to-end solutions that didn’t compile, violated conventions, or failed in subtle ways.
From Autocomplete to Engineer
The adjustment was a paradigm shift. I stopped treating the tool as autocomplete and started treating it like an engineer – specifically, an engineer operating on unfamiliar ground. I coached it the way I would coach a developer, then constrained it and broke the work into stages.
My prompt changed from:
- Upgrade from v1 to v2 following the standards located here.
To something closer to:
- We need to upgrade from v1 to v2.
- First, analyze the code example below.
- Then, explain the upgrade steps and why they’re correct.
- Produce a short step document.
- Then execute the plan one section at a time (section X, Y, Z).
- After each section, stop and present results for review before continuing.
Forcing Constraints: Plans Before Execution
I asked for plans instead of implementations. I forced stepwise progress, verification at each stage, and explicit reasoning about tradeoffs. Sometimes I told it to review itself before moving to the next step. I applied the same review discipline I’d developed over years of leading teams.
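If it helps to see the shape of that loop, here is a minimal sketch in Python. It is not a Cursor or Claude Code interface; `run_agent_step` is a placeholder for however you actually invoke the agent, and the step names are illustrative. The point is the gate between sections, not the agent call.

```python
# Minimal sketch of the staged, human-gated workflow described above.
# Assumptions: `run_agent_step` stands in for however the agent is invoked
# (chat session, CLI, or API); step names and instructions are illustrative.

from dataclasses import dataclass


@dataclass
class Step:
    name: str
    instruction: str


PLAN = [
    Step("analyze", "Analyze the v1 code and list everything the v2 upgrade touches."),
    Step("plan", "Write a short step document explaining each upgrade step and why it is correct."),
    Step("section_x", "Apply the upgrade to section X only; do not touch other sections."),
    Step("section_y", "Apply the upgrade to section Y only; do not touch other sections."),
]


def run_agent_step(step: Step) -> str:
    """Placeholder: send the instruction to the agent and return its output or diff."""
    return f"[agent output for '{step.name}' would appear here]"


def approved(output: str) -> bool:
    """Human review gate: nothing proceeds until someone signs off on this step."""
    print(output)
    return input("Approve and continue? [y/N] ").strip().lower() == "y"


for step in PLAN:
    result = run_agent_step(step)
    if not approved(result):
        print(f"Stopped at '{step.name}' for rework; later steps never execute.")
        break
```

Nothing about the loop is clever; its only job is to make “stop and present results for review before continuing” the default rather than something I have to remember to ask for.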
The New Engineering Discipline
From Coding to Orchestration
The outcome wasn’t that AI wrote better code than I could. The outcome was that my role shifted entirely toward planning and judgment. I spent less time typing and more time reviewing, testing, and detecting inconsistencies. The work moved upstream: defining the right abstraction, enforcing standards, and validating behavior – not producing syntax.
What Engineers Must Learn Now
This is the shift engineers now face. Writing code is no longer the primary bottleneck – we must now direct agents to follow repeatable patterns and review the results. Knowing how to test AI-generated changes, reason about correctness, and catch subtle inconsistencies is becoming a core engineering skill. You have to notice the loose threads in generated code – the unexplained assumptions, the “looks right” glue code, the quiet convention breaks – and know which ones to pull until the whole picture is trustworthy.
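The same instinct shows up in how changes get verified. One hedged illustration (the function and numbers here are hypothetical, not Zocdoc code): pin the current behavior with a characterization test before the agent refactors anything, so “looks right” glue code has to prove itself.

```python
# Minimal sketch: pin existing behavior before an agent refactors it.
# Assumptions: `calculate_fee` is a hypothetical stand-in for whatever code
# the agent is about to change; the expected values are captured from the
# current implementation, not from a spec.

import pytest


def calculate_fee(amount_cents: int, tier: str) -> int:
    """Pretend legacy implementation that the agent will be asked to refactor."""
    rate = {"standard": 0.029, "premium": 0.021}[tier]
    return round(amount_cents * rate) + 30


@pytest.mark.parametrize(
    "amount_cents, tier, expected",
    [
        (10_000, "standard", 320),  # captured from today's behavior
        (10_000, "premium", 240),
        (0, "standard", 30),        # edge case: the flat fee still applies
    ],
)
def test_fee_behavior_is_unchanged(amount_cents, tier, expected):
    # If the generated refactor shifts any of these, the quiet break surfaces
    # in review instead of in production.
    assert calculate_fee(amount_cents, tier) == expected
```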
None of this is learned from prompt templates alone. It’s learned by running the process, watching it fail, and building intuition for the failure modes.
Judgment Is Still the Control Plane
We can build systems faster than ever before. Often, we can build them better. But the old disciplines didn’t become obsolete – they became more important. Speed without rigor is still a risk. AI amplifies output, not judgment. Judgment remains the control plane. Protecting production still matters, perhaps even more.
Eftal Sogukoglu is a Staff Engineer on the Product Engineering team at Zocdoc. He focuses on building scalable systems and improving engineering workflows.