Something interesting is happening in software teams right now. AI is writing code faster than we ever could. User stories become functions. Functions become features. Features get pushed to Git and then deployed to a server. And somewhere in that chain, a very important question stops getting asked: does this actually do what we thought it would do?
Not “does it compile?” Not “did the tests pass?” But does it work: correctly, reliably, in production, under real conditions, with real data, for real users? This is where observability becomes not just an engineering concern, but a product accountability tool. Teams are already starting to use it that way, though it is still early days.
When a developer writes code, there is implicit accountability. They understand what they wrote. They can explain it. They can debug it. They know the assumptions behind it. When AI generates code, that accountability gets blurry. The code looks right. It follows patterns. It passes review. It deploys cleanly. But the engineer who prompted for it and the engineer who approved it may not fully understand every decision the AI made, especially in complex business logic, edge case handling, or data transformations.
This is not a criticism of AI. It is a reality of how AI-assisted development works. By the way, we call it Vibe Coding. And it creates a gap — between what we think the code does and what it actually does.
AI is confident when it’s right. It is equally confident when it’s wrong.
Observability is how you tell the difference and close the gap — in staging, in performance testing, in regression, before it ever reaches your users. And in production when something goes wrong. Trust me, things always go wrong.
“Acceleration without accountability is just faster risk.”
Holding AI output accountable — in real time
To close that gap, we need to start treating observability as an integral part of the full stack, like linting, formatting, or unit tests. Not an afterthought. Not a library you configure as a final checklist item after the feature code is done.
What does that mean practically? It means asking product questions through observability signals:
- What is the SLA of each transaction? What is the SLA of a cron job? What do the traces say?
- A Service Bus message needs to go through three separate systems and takes 10 seconds to be fully processed and complete its life cycle. In an ocean of thousands of messages, how do you track it? Can traces and spans help?
- This workflow was supposed to handle high-volume message bursts without dropping events. What do the metrics show at peak load?
- When something goes wrong at 2am, can you tell within 5 minutes exactly which message, which system, and which step failed, without reading thousands of log lines manually? A Trace Id, Span Id, and Correlation Id will help you more than you think.
These are not infrastructure questions. These are product questions. And the only way to answer them with confidence is through real-time observability data — logs that capture context, traces that show the full execution path, metrics that track outcomes over time.
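To make the “which message, which system, which step” question concrete, here is a minimal sketch using only the Python standard library. It is not the OpenTelemetry SDK; the in-memory SPANS list stands in for a real trace backend, and the system names (intake, rostering, notification), the step names, and the payload fields are all hypothetical. The idea is simply that every step of one message carries the same correlation ID, so a failure can be found with one lookup instead of a log search.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    correlation_id: str  # shared by every step of one message
    span_id: str         # unique per step
    system: str          # hypothetical system name
    step: str
    status: str = "ok"
    duration_ms: float = 0.0
    error: Optional[str] = None


SPANS = []  # in-memory stand-in for a trace backend (assumption)


@contextmanager
def traced_step(correlation_id, system, step):
    """Record one span per step, including failures and timing."""
    span = Span(correlation_id, uuid.uuid4().hex[:16], system, step)
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        span.status = "error"
        span.error = str(exc)
        raise
    finally:
        span.duration_ms = (time.perf_counter() - start) * 1000
        SPANS.append(span)


def process_message(correlation_id, payload):
    """Hypothetical three-system pipeline for a single message."""
    with traced_step(correlation_id, "intake", "validate"):
        if "crew_id" not in payload:
            raise ValueError("missing crew_id")
    with traced_step(correlation_id, "rostering", "assign_seat"):
        if payload.get("seat") is None:
            raise ValueError("no seat available")
    with traced_step(correlation_id, "notification", "publish"):
        pass


def first_failure(correlation_id):
    """Return the first failed span for this message, or None."""
    for span in SPANS:
        if span.correlation_id == correlation_id and span.status == "error":
            return span
    return None


# Usage: one healthy message, one that fails in the second system.
good_id, bad_id = uuid.uuid4().hex, uuid.uuid4().hex
process_message(good_id, {"crew_id": "C1", "seat": "12A"})
try:
    process_message(bad_id, {"crew_id": "C2", "seat": None})
except ValueError:
    pass
failed = first_failure(bad_id)
print(failed.system, failed.step, failed.error)
```

In a real setup the tracing library would generate and propagate these IDs for you; the point of the sketch is only the lookup at the end.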
The silent drift problem
The code runs, the tests pass, and you don’t see errors. Everything looks good. Until the business comes back and tells you that a crew member was not assigned a seat. And that is only if they notice, and only if they come back to you. These things drift silently.
Observability metrics can help you catch, and often prevent, these kinds of scenarios. Just use a metric that always keeps track of crew seats: when a crew member doesn’t get a seat, raise an alert along with a Trace Id and Correlation Id. If you use logs effectively, not just the “function started, function ended” kind, it will take minutes to identify the problem rather than days.
OpenTelemetry’s signals (logs, metrics, and traces) make your life easier. The problem doesn’t always have to be a database configuration issue, a pod memory issue, or a Service Bus dead-letter issue. It could be a simple business rule in a complex system.
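The crew-seat check can be sketched in a few lines. This is an illustration under assumptions, not a production setup: the Counter stands in for a real metrics backend, the metric names and the record_seat_assignment function are hypothetical, and the “alert” is just a structured error log that carries the trace and correlation IDs.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger("crew_seating")
METRICS = Counter()  # in-memory stand-in for a metrics backend (assumption)


def record_seat_assignment(crew_id, seat, trace_id, correlation_id):
    """Count the business outcome and alert when a crew member gets no seat."""
    METRICS["crew_seat_assignments_total"] += 1
    if seat is not None:
        return None

    METRICS["crew_seat_unassigned_total"] += 1
    alert = {
        "event": "crew_seat_unassigned",
        "crew_id": crew_id,
        "trace_id": trace_id,
        "correlation_id": correlation_id,
    }
    # A structured log line, so the alert can be joined back to the trace.
    logger.error(json.dumps(alert))
    return alert
```

The code that ran without errors still produced a wrong business outcome; the counter and the alert are what make that visible.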
AI can also help you observe
Don’t blame yourself if you paste a bunch of logs into Copilot chat and get a different analysis each time. That’s the wrong input. Instead, start feeding it your observability data. Metric patterns, trace spans. Use it to gauge the overall health of your application. Not to replace your domain knowledge — AI will never know the overall context of your business. It can never guess or anticipate the edge cases you accumulate over a career.
They say “You can’t buy experience in the market.” Give AI structured observability data — metric patterns, correlated traces, span timings — and it becomes a powerful assistant. Not a replacement. An accelerator.
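What “structured observability data” might look like before it goes into a prompt: a rough sketch, assuming spans are available as plain dictionaries with step, status, and duration_ms keys (that shape is my assumption, not a standard export format). Instead of raw log lines, the assistant gets per-step counts, error rates, and timings.

```python
import json
import math
from collections import defaultdict


def summarize_spans(spans):
    """Collapse raw span records into per-step counts, error rates, and timings."""
    by_step = defaultdict(list)
    for span in spans:
        by_step[span["step"]].append(span)

    summary = {}
    for step, items in by_step.items():
        durations = sorted(s["duration_ms"] for s in items)
        errors = sum(1 for s in items if s["status"] == "error")
        # Nearest-rank 95th percentile of the step durations.
        p95_index = math.ceil(0.95 * len(durations)) - 1
        summary[step] = {
            "count": len(items),
            "error_rate": round(errors / len(items), 3),
            "p95_ms": durations[p95_index],
            "max_ms": durations[-1],
        }
    return summary


def as_prompt_context(spans):
    """Render the summary as compact JSON to paste or send alongside a question."""
    return json.dumps(summarize_spans(spans), indent=2, sort_keys=True)
```

The same question asked against this summary tends to be easier to reason about than one asked against a pile of log lines, for you as much as for the assistant.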
What this means for how you build
If you are using AI to accelerate your development — and most teams are now, whether formally or informally — a few things need to change:
- Instrument before you ship. Every AI-generated feature should have observability built in before it goes to production. Not as an audit — as a product requirement.
- Define what “working” looks like in data terms. Not just “no errors.” What business outcome should this code produce? What metrics would confirm it? What would a silent failure look like in your logs?
- Review AI output through traces, not just code review. Code review tells you what the code says. Traces tell you what it does. Both matter.
- Build accountability dashboards, not just health dashboards. A health dashboard shows you if your system is up. An accountability dashboard shows you if your system is doing what the product promised.
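One way to picture the difference between a health check and an accountability check is to write the product promises down as data and evaluate them against a metrics snapshot. The sketch below assumes a snapshot is a simple dictionary of metric name to value; the Expectation class, the metric names, and the thresholds are all made up for illustration.

```python
import operator
from dataclasses import dataclass

OPS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Expectation:
    promise: str      # what the product promised, in plain words
    metric: str       # hypothetical metric name
    compare: str      # "<=" or ">="
    threshold: float


def evaluate(expectations, snapshot):
    """Check each promise against the latest metric values."""
    results = []
    for exp in expectations:
        value = snapshot.get(exp.metric)
        # A missing metric counts as not met: no data is not proof of success.
        met = value is not None and OPS[exp.compare](value, exp.threshold)
        results.append(
            {"promise": exp.promise, "metric": exp.metric, "value": value, "met": met}
        )
    return results


# Usage with made-up numbers.
expectations = [
    Expectation("Every crew member gets a seat", "crew_seat_unassigned_total", "<=", 0),
    Expectation("No events dropped at peak load", "events_dropped_total", "<=", 0),
]
snapshot = {"crew_seat_unassigned_total": 3}
for row in evaluate(expectations, snapshot):
    print(row)
```

Treating a missing metric as a failed promise is a deliberate choice here: if the feature shipped without the signal, you cannot claim it works.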
The bottom line
AI is a powerful development accelerator. But acceleration without accountability is just faster risk. Observability is here to mitigate that risk. Use it as a business requirement.
Has Vibe Coding caught you out in production yet? Don’t tell me ‘No’. I already know the answer 😄