Skip to content
HoursBack
← All posts
Strategy17 June 20266 min readBy David Bevan

The afternoon my own AI agents were confidently wrong - and why that is the point

Every HoursBack assessment ends with a short, practical list: here are the tools that would give you hours back, and here is whether a ready-made connection already exists so your team is not stuck wiring it up by hand. Keeping that list accurate means keeping up with a market that changes weekly. So I did what I tell clients to do - I put AI to work on the boring, repetitive part. This is what happened.

What I actually did

I needed to map every official AI connector relevant to a UK small business - the integrations that let an AI assistant safely read your accounting, your CRM, your help desk, your store - and check each one against what I already track. Rather than trawling vendor sites for a week, I ran it as a multi-agent investigation:

  • One group of agents fanned out across the sources: the official Anthropic and OpenAI directories, the published list of available integrations, and the common small-business software stack.
  • A second group took each candidate and verified it against the live vendor page: is it genuinely official, how hard is it to set up, what does it actually do.
  • A third, deliberately sceptical group tried to knock each finding down before it was allowed through.
  • A final "what did I miss" pass kept looping until it came up dry.

Nothing was published automatically. Everything landed in a review queue for a human to sign off - which matters, and I will come back to why.

The numbers (the interesting bit)

Across two runs on 9 June 2026:

  • 413 AI agents did the work.
  • 11,236 model calls between them.
  • 255.6 million tokens processed.
  • About 90 minutes of wall-clock time.
  • I paid nothing extra for it. It ran inside the flat-fee AI subscription I already hold - no separate bill, no meter ticking. On standard metered pricing, the same compute would cost roughly £230 in model tokens alone - but that figure is the list-price equivalent, not what I spent.

Two things make those numbers possible. First, it was covered by the flat-fee plan I already pay for - the same kind of subscribed AI most businesses already have sitting unused. Second, and this is the engineering bit: 99.2% of everything the agents read was served from cache. Each agent re-uses the same shared context instead of paying for it fresh every time. Without that, the cost would have been substantially higher - the 99.2% cache-hit rate is most of the saving. Good engineering, not a clever discount.

The result: the connector list my reports draw on more than doubled, from 33 to 78, now covering the tools UK small businesses actually run - Xero, QuickBooks, Shopify, HubSpot, Salesforce, Pipedrive, Mailchimp, Calendly, Dropbox and dozens more.

The part I am proudest of: the agents were wrong, and I caught it

Here is the bit most "AI did everything" stories leave out. On the first pass, my sceptical agents confidently rejected two connectors - Monday.com and Shopify - as "no official version exists." Both were wrong. Both are, in fact, listed in the official directory. I spotted it, pushed back, and a second targeted check corrected the record.

This is exactly why every HoursBack recommendation passes a human sign-off before it reaches you. AI is brilliant at covering ground fast - 413 agents in 90 minutes is ground no person could cover. But "fast and mostly right" is not the standard for advice you are going to act on and pay for. The judgement, the override, and the final tick stay with a person. That combination - machine breadth, human judgement - is the whole HoursBack method, applied here to my own systems.

Why this matters for you

When your assessment says "this would save your team six hours a week, and there is a ready connection for it," that line has been checked against a live, current map of what is genuinely available - not a stale list, and not a guess. I keep that map fresh the same way I would help you automate anything else: let the machines do the repetitive sweep, keep a human on the decisions.

If you want to know which of your own tools could be giving you hours back, that is what the assessment is for.

Book your assessment

Ready to reclaim 5-10 hours a week? Book your AI workflow assessment. 60-minute diagnostic, custom report in 48 hours, agent blueprints and automation recipes built around your business.

Know someone who could use this? Get a referral link and earn £50 for every friend who books an assessment.

2-minute check

Not sure where AI fits in your business?

Answer nine quick questions and get your AI readiness score plus three personalised quick wins. Free, no jargon, takes about two minutes.

Take the free quiz →

Get AI tips for your business

Practical advice on saving time with AI. One email per week, no spam.