My health scoring system flagged an account as churn risk on a Tuesday morning. Low login frequency, reduced feature adoption, no Slack activity in two weeks.
Every signal pointed the same direction.
I almost scheduled a retention call before I pulled the transcript.
What the Score Saw
The health score is real. I built it, I use it every morning, and I trust it.
What it tracks: login frequency, feature adoption breadth, Slack engagement, support ticket volume, time since last meaningful interaction with the team. It weights each signal, runs a composite, and spits out a number. Red, yellow, green.
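The post does not publish the actual weights or thresholds, but the composite it describes can be sketched roughly like this. Signal names, weights, and band cutoffs here are illustrative assumptions, not the real system:

```python
# Hypothetical signal weights for illustration; the real system's
# weighting is not specified in the post.
WEIGHTS = {
    "login_frequency": 0.30,
    "feature_adoption": 0.25,
    "slack_engagement": 0.20,
    "ticket_volume": 0.10,
    "days_since_contact": 0.15,
}

def health_score(signals: dict[str, float]) -> tuple[float, str]:
    """Weighted composite of normalized signals (each scaled 0.0-10.0),
    mapped to a red/yellow/green band. Cutoffs are assumptions."""
    score = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    if score < 4.0:
        band = "red"
    elif score < 7.0:
        band = "yellow"
    else:
        band = "green"
    return round(score, 1), band

# Example: low logins, narrow adoption, quiet Slack.
print(health_score({
    "login_frequency": 1.5,
    "feature_adoption": 3.0,
    "slack_engagement": 0.5,
    "ticket_volume": 6.0,
    "days_since_contact": 2.0,
}))  # → (2.2, 'red')
```

The point of the weighted sum is speed and consistency across a whole portfolio, which is exactly what makes it good at triage and bad at diagnosis.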
This account was red.
The score was not wrong about what it measured. Login frequency was down. Feature adoption had contracted to two or three core workflows. Slack had gone quiet.
The score was doing exactly what it was built to do.
What the Transcript Said
I had a call with this account about three weeks prior. Standard check-in, nothing flagged. I pulled the transcript before booking the retention call, mostly out of habit.
The account had just finished rolling out an integration.
Before the integration: their team logged in daily to do things manually. After the integration: those same things happened automatically. Nobody needed to open the product to kick off the workflows anymore. They were using it more than ever. It was just invisible.
The reduced logins were not disengagement. They were efficiency. The integration was working exactly as designed, which meant the system no longer needed human hands on it.
A score that measures logins flagged a successful automation rollout as a churn risk. That is not a flaw in the scoring. That is just what a surface metric does when behavior changes shape.
The Difference Between Triage and Diagnosis
I have a rule now: the score tells me where to look. The transcript tells me what I am looking at.
Triage is: which accounts need attention this week? The health score answers that. It is fast, it is consistent, it covers everyone at once. I cannot do that manually across a portfolio this size, so the score does it.
Diagnosis is: what is actually happening with this account? The transcript answers that. It has nuance, context, history. It knows that the reduced logins are post-integration, not pre-churn.
The problem is when you skip from triage to action without the diagnosis step. That is how you send a retention email to an account that is quietly thriving.
What False Positives Actually Tell You
This is the part I did not expect.
The false positive was useful.
Not because it revealed a flaw in the health score, but because it revealed the shape of something the score cannot see: what it looks like when a customer has internalized the product so deeply that the surface metrics go quiet.
Healthy disengagement. A counterintuitive concept, but a real one. Some accounts stop being visible in the data not because they are pulling away, but because the product has become infrastructure. You do not notice infrastructure when it is working.
Now I have a third category in my monitoring. Not just healthy and at-risk. Also: quiet but potentially thriving. When an account goes quiet but recent calls were strong and they just finished a major rollout, I flag it differently.
The health score did not teach me that. The false positive did.
How the System Changed
I did not rebuild the health scoring. The signals are still the same.
I added context.
Before generating a health brief for any at-risk account, the system now pulls any call or meeting activity from the past 60 days and includes a summary alongside the score. The brief says: "Red (health score 2.1). Calls in the last 60 days: one check-in, three weeks ago. Notes: integration rollout completed."
That one sentence changes everything about how I read the red flag.
The score is still doing triage. I just stopped letting it do diagnosis too.
Blake Bailey runs Bailey Business Ventures, an AI transformation consulting practice. He has never met an at-risk account that did not deserve a second look before the retention pitch.