Attribution & ROI
AEO Recovery Score: a quantified framework for action-to-result tracking
AEO Recovery Score is the percentage of an identified answer gap that has been closed within a defined measurement window after a fix action ships. The formula is simple - post-fix performance minus pre-fix performance, divided by target performance minus pre-fix performance, expressed as a percentage - but the discipline around it is not. Every term needs precise definition: the right metric for each gap type, a baseline measured over a fixed pre-fix window, post-fix measurement at a fixed window, and a realistic target derived from category-leader benchmarks rather than 100%. Without a per-fix recovery metric, AEO programs accumulate fixes without learning which ones worked. With one, teams can replicate winning patterns, retire failing ones, and produce finance-reviewable inputs to the AEO ROI model. This guide defines the formula, gives illustrative worked examples for absence, citation, and accuracy gaps, and connects Recovery Score to revenue using the bridge formula.
Updated 2026-05-06
Questions this guide answers
- What is AEO Recovery Score?
- How do I measure AEO content impact?
- How do I track if my AEO fixes are working?
- What is action-to-result tracking in AI search?
Direct answer
AEO Recovery Score is the percentage of an identified answer gap that has been closed within a defined measurement window after a fix action was shipped. The formula: Recovery Score = (post-fix performance - pre-fix performance) / (target performance - pre-fix performance) × 100%. A 0% score means the fix produced no measurable change. A 100% score means the gap is fully closed.
Recovery timing and magnitude vary by gap type, engine, category, and competitive movement. Recovery Score replaces the loose 'did our AEO program work?' question with a per-fix, per-prompt, per-engine quantified answer. It is the metric that closes the loop between identifying gaps and feeding a finance-reviewable ROI model.
Why a recovery metric is necessary
AI visibility platforms typically report two kinds of metrics: baseline (where you are now) and trend (how the baseline changed). Neither answers the operationally critical question - 'for the gap I tried to fix in March, did the fix work?' That question is hard for three reasons.
- Gaps are heterogeneous. A citation gap in ChatGPT requires different signals than an absence gap in Google AI Overviews.
- Engines respond on different timelines. Owned-content fixes can show up sooner than third-party citation work, but the exact window should be validated by repeated measurements.
- External factors confound trends. Competitor moves, AI engine updates, and category-wide shifts make raw trend data noisy.
The Recovery Score formula
Each term in the formula has a precise definition: pre-fix performance, post-fix performance, and target performance.
Pre-fix performance (baseline)
The metric value before the fix shipped, measured against a fixed prompt set. The metric depends on the gap type. Measure over at least a 7-day window before the fix, not a single point measurement.
- Absence Gap: inclusion rate (% of prompts where SKU/brand appears).
- Citation Gap: citation share (% of prompts where your source is cited vs competitor).
- Accuracy Gap: accuracy rate (% of mentions with correct attributes).
- Comparison Gap: recommendation rank (average position when included).
- Action Gap: operational close rate (% of identified gaps converted to shipped fixes).
Post-fix performance
The same metric measured at a defined window after the fix ships. Use a fixed window (14, 30, 60, or 90 days), not 'until it looks good.'
Target performance
The performance level you would have if the gap were fully closed. Targets should be realistic: the strongest competitor's inclusion rate, parity with the closest one or two competitors, an accuracy threshold agreed with product or compliance, or the category leader's recommendation rank. Setting unrealistic targets makes every fix look like a failure.
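The formula is small enough to pin down in code. A minimal sketch in Python, assuming all metrics are expressed on the same 0-100 scale; the gap-type-to-metric mapping transcribes the list above, and the function applies the definition directly:

```python
# Gap-type-to-metric mapping, transcribed from the definitions above.
GAP_METRICS = {
    "absence": "inclusion_rate",          # % of prompts where SKU/brand appears
    "citation": "citation_share",         # % of prompts where your source is cited
    "accuracy": "accuracy_rate",          # % of mentions with correct attributes
    "comparison": "recommendation_rank",  # average position when included (lower is better)
    "action": "operational_close_rate",   # % of identified gaps converted to shipped fixes
}

def recovery_score(pre_fix: float, post_fix: float, target: float) -> float:
    """(post-fix - pre-fix) / (target - pre-fix), expressed as a percentage.

    Assumes higher is better. For rank metrics, where lower is better,
    transform the metric (e.g., negate it) before applying the formula.
    """
    gap = target - pre_fix
    if gap <= 0:
        raise ValueError("target must exceed the pre-fix baseline to define a gap")
    return (post_fix - pre_fix) / gap * 100.0
```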
Engine-specific measurement windows
Recovery timing varies by engine. Use the table below as planning windows, not guarantees.
| Engine | First measurement | Stable measurement |
|---|---|---|
| Google AI Overviews | 14 days | 60 days |
| ChatGPT | 21 days | 60-90 days |
| Perplexity | 14 days | 30 days (often more responsive to recency) |
| Claude | 21 days | 60 days |
| Gemini | 14 days | 45 days |
| Amazon Rufus | 14 days | 30-45 days |
| Walmart Sparky | 14 days | 30 days |
| ChatGPT Shopping | 21 days | 45 days |
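Since measurement windows should be defined up front, the planning windows above can live in a small config. A sketch transcribing the table, with ranges collapsed to their upper bound; these are scheduling defaults to validate against your own repeated measurements, not guarantees:

```python
# Planning windows in days, transcribed from the table above.
# Treat as scheduling defaults, not guarantees.
MEASUREMENT_WINDOWS = {
    "google_ai_overviews": {"first": 14, "stable": 60},
    "chatgpt":             {"first": 21, "stable": 90},  # stable 60-90 days
    "perplexity":          {"first": 14, "stable": 30},
    "claude":              {"first": 21, "stable": 60},
    "gemini":              {"first": 14, "stable": 45},
    "amazon_rufus":        {"first": 14, "stable": 45},  # stable 30-45 days
    "walmart_sparky":      {"first": 14, "stable": 30},
    "chatgpt_shopping":    {"first": 21, "stable": 45},
}
```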
Illustrative worked examples
Illustrative scenarios only. The numbers are pedagogical inputs, not benchmarks or customer results.
Example 1: Absence Gap on Walmart Sparky
Assume a DTC supplements brand identifies that 9 of 12 SKUs do not appear in Sparky for the prompt 'best multivitamin for [persona].' Pre-fix inclusion rate: 25% (3 of 12 SKUs). Sample target: 75% based on the category leader's observed rate in this hypothetical prompt set. Fix action: filled structured attributes, rewrote titles, and built compliant Q&A coverage across all 12 SKUs.
In the sample calculation, inclusion rate rises to 58% (7 of 12 SKUs). Recovery Score: (58 - 25) / (75 - 25) × 100% = 66%. This shows partial gap closure. Investigate why 5 SKUs still do not appear before repeating the fix.
Example 2: Citation Gap on ChatGPT
Assume a B2B SaaS brand identifies that ChatGPT cites 4 competitors but never the brand for 'best [category] for mid-market' prompts. Pre-fix citation share: 0%. Sample target: 25% based on the strongest non-leader competitor's share in this hypothetical prompt set. Fix action: published a category overview pillar, secured two vertical newsletter mentions, and began compliant community engagement.
In the sample calculation, citation share rises to 8%. Recovery Score: 32%. This shows lift but not parity. Continue investment only if the underlying sources are improving and the prompt set remains commercially relevant.
Example 3: Accuracy Gap on Google AI Overviews
Assume a health brand identifies AI Overviews mentions the brand but cites an outdated dosage guideline. Pre-fix accuracy: 30%. Sample target: 95%. Fix: updated dosage page with current data, refreshed schema, requested re-crawl.
In the sample calculation, accuracy rises to 92%. Recovery Score: 95%. The example shows how owned-source fixes can be measured when the source of the error is known and controllable.
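Running the three scenarios through the recovery_score sketch above reproduces the quoted figures (the inputs are the pedagogical values from the examples, not benchmarks):

```python
recovery_score(pre_fix=25, post_fix=58, target=75)  # 66.0  (Example 1)
recovery_score(pre_fix=0,  post_fix=8,  target=25)  # 32.0  (Example 2)
recovery_score(pre_fix=30, post_fix=92, target=95)  # ~95.4, reported as 95% (Example 3)
```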
Engine-specific recovery patterns
Recovery rates vary substantially by engine and gap type. Use the patterns below as measurement expectations, not benchmarks.
- Owned-content gap fixes can move faster than third-party citation fixes because the brand controls the source.
- Schema and structural fixes should be measured after a consistent waiting window across the same prompt set.
- Third-party citation fixes (Reddit, G2, editorial) compound slowly and should be evaluated over longer windows.
- Recency-only fixes (refreshing date without content change) should not be counted as real fixes; engines can discount cosmetic updates.
- Retail engine fixes should be re-tested after marketplace catalog, listing, review, and Q&A data has had time to refresh.
What a healthy AEO program's Recovery Scores look like
Healthy distributions should include a mix of clear wins, partial wins, weak fixes, and occasional regressions. Distributions that are sharply one-sided - everything looks like a win, or nothing moves at all - usually mean targets are miscalibrated, measurement windows are wrong, or the prompt set is too small.
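One way to sanity-check the shape of a quarter's scores is to bucket them. A sketch, with hypothetical cut-offs chosen for illustration only; calibrate the thresholds to your own program:

```python
from collections import Counter

def score_distribution(scores: list[float]) -> Counter:
    """Bucket Recovery Scores into the mix a healthy program should show."""
    def bucket(score: float) -> str:
        if score < 0:
            return "regression"
        if score < 25:
            return "weak fix"
        if score < 75:
            return "partial win"
        return "clear win"
    return Counter(bucket(s) for s in scores)

# A sharply one-sided Counter (all "clear win", or all "weak fix") is itself
# a signal: recheck targets, measurement windows, and prompt-set size.
```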
How Recovery Score connects to AEO ROI
Recovery Score is the operational metric. AEO ROI is the financial metric. The bridge: estimated revenue lift from a fix equals prompt revenue value multiplied by Recovery Score (expressed as a decimal, e.g., 0.66 for a 66% score) multiplied by the probability that recovered visibility converts to revenue. Stacked across a quarter's worth of fixes, this produces a finance-reviewable estimate when assumptions and confidence ranges are documented.
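The bridge is a straight multiplication. A sketch, assuming the prompt revenue value and conversion probability are estimated elsewhere and documented with confidence ranges:

```python
def estimated_revenue_lift(prompt_revenue_value: float,
                           recovery_score_pct: float,
                           p_visibility_converts: float) -> float:
    """Bridge from the operational metric to a finance-reviewable estimate.

    prompt_revenue_value:  revenue attributed to the prompt (currency units)
    recovery_score_pct:    Recovery Score as a percentage (0-100)
    p_visibility_converts: probability that recovered visibility converts
    """
    return prompt_revenue_value * (recovery_score_pct / 100.0) * p_visibility_converts

# Stacked across a quarter's fixes, with assumptions documented per input:
# quarter_lift = sum(estimated_revenue_lift(v, s, p) for v, s, p in fixes)
```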
Common mistakes
Five recurring mistakes degrade Recovery Score reliability.
- Measuring too soon. Reporting Recovery Score immediately after a fix produces noise; use a consistent window matched to the engine, gap type, and source-refresh cycle.
- Confusing absolute lift with Recovery Score. A 5% citation share lift can be either great or weak - it depends on the target.
- Cherry-picking which fixes to score. Score every shipped fix, including failures.
- Using overlapping measurement windows. Stagger fixes by category or prompt cluster.
- No prompt-set discipline. If the prompt set drifts between pre-fix and post-fix measurement, the comparison is invalid.
How to use this guide
To put the framework into practice:
- Define your prompt set.
- Measure pre-fix performance for every gap before shipping.
- Set realistic targets (not 100%).
- Define measurement windows up front.
- Calculate Recovery Score for every shipped fix.
- Roll up monthly and triage low-scoring fixes.
- Connect Recovery Scores to revenue using the bridge formula.
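A minimal per-fix record that supports this workflow, as a sketch; field names are illustrative, score() reuses the recovery_score function sketched earlier, and the monthly roll-up is a grouped mean:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class FixRecord:
    gap_type: str        # absence, citation, accuracy, comparison, action
    engine: str          # used to look up the measurement window
    prompt_set_id: str   # frozen prompt set; must not drift pre- to post-fix
    pre_fix: float
    target: float
    post_fix: float | None = None  # filled at the engine's measurement window

    def score(self) -> float | None:
        if self.post_fix is None:
            return None  # not yet measured; never cherry-pick by skipping it
        return recovery_score(self.pre_fix, self.post_fix, self.target)

def monthly_rollup(fixes: list[FixRecord]) -> dict[str, float]:
    """Average Recovery Score per gap type across all scored fixes."""
    by_type: dict[str, list[float]] = {}
    for fix in fixes:
        s = fix.score()
        if s is not None:
            by_type.setdefault(fix.gap_type, []).append(s)
    return {gap_type: mean(scores) for gap_type, scores in by_type.items()}
```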
If you want a working tracker that automates pre-fix baseline, post-fix measurement, and Recovery Score calculation, request early access from the SolCrys team.
FAQ
Should I report Recovery Scores to my CFO?
Probably not directly. CFOs care about revenue. Use Recovery Score internally to track program health and feed your AEO ROI model. Report ROI to the CFO; report Recovery Score distribution to the CMO or VP Marketing.
What if a fix has a Recovery Score above 100%?
It happens when post-fix performance exceeds the target you set. It usually means the target was too conservative. Note the over-recovery, raise the target for similar gaps in the future, and treat it as a signal that this fix pattern is exceptional.
Can I use Recovery Score for SEO investments too?
Conceptually yes - the framework adapts. Practically, traditional SEO has more mature attribution models. Recovery Score adds the most value for AEO and AI search where attribution is fuzzier.
How does Recovery Score relate to share of voice?
Share of voice is a baseline metric (where you stand). Recovery Score is a delta metric (how much you closed a specific gap). Both are useful; they answer different questions.
What if my fix appears to work but the engine then updates and the gain disappears?
That is a regression, common after Google updates or major model releases. Track regression rate as a separate metric. Some regression is normal and should not invalidate the original Recovery Score - note that recovery was achieved and the gap subsequently re-emerged due to platform change.
Do I need a platform to measure this, or can I do it manually?
For 5 SKUs and 25 prompts, manual measurement in a spreadsheet works. For 100+ SKUs or 50+ prompts, the manual cost exceeds the value. SolCrys and similar platforms automate the measurement loop.
Related guides
AEO Fundamentals
The Answer Gap Is the New Content Brief
Learn what an AI answer gap is, why it matters for AEO, and how marketing teams can turn weak AI answers into practical content briefs.
Measurement
AI Share of Recommendation
AI Share of Recommendation measures how often answer engines recommend a brand, not just whether they mention it. Learn how to track and improve it.
Attribution & ROI
AEO ROI Business Case
A practitioner framework for estimating and reviewing AEO ROI with finance. Includes the AEO Revenue Model formula, three attribution methods, a 5-slide deck structure, and a 12-month measurement template.
Free AI visibility audit
Find out where your brand is missing, miscited, or misrepresented.
SolCrys maps high-intent prompts to mentions, citations, answer accuracy, and content gaps so your team can prioritize the next pages to ship.