Why two AEO platforms can disagree on the same brand's citation share — and what to do about it

If you ask two AEO (answer engine optimization) platforms about the same brand's citation share, you can get materially different numbers. Sometimes they disagree by a few percentage points; sometimes by enough to change a CMO's quarterly read. The numbers themselves aren't necessarily wrong: they're the output of measurement choices that the platforms typically don't disclose. This essay walks through the four engineering decisions that drive most of the disagreement (prompt-set composition, sampling cadence, engine non-determinism, citation event definition), gives you five questions to surface those choices in any AEO vendor conversation, and applies the same five questions to SolCrys with explicit uncertainty disclosures. The goal isn't to convince you which platform's number is right. It's to give you the engineering vocabulary to interrogate any number you're handed, including ours.

Updated 2026-05-18

Questions this guide answers

  • Why do AEO platforms report different citation shares?
  • How is AEO citation share measured?
  • What questions should you ask an AEO platform about measurement?
  • Is AEO measurement reliable?

Direct answer

AEO citation share is measured by running a set of prompts against AI answer engines on some cadence, parsing the engines' outputs for citations, and aggregating the results. Each of those four steps — prompt-set design, cadence, parsing, aggregation — involves choices that materially change the final number. Two AEO platforms making different choices will produce different numbers for the same brand, and neither has to be wrong.

What this essay is not: a claim that any specific platform's number is incorrect. What it is: an engineering note on the four sources of disagreement, and the five questions that surface those choices in a vendor conversation. SolCrys answers all five at the end, with explicit uncertainty disclosures.

AEO measurement is survey design, not crawl analytics

The category's marketing language often borrows from search analytics — "share of voice," "visibility," "impressions." Those words come from a world where measurement is deterministic: a crawler visits a page, a server log records the request, the numbers are facts about events that happened.

AEO measurement is not that world. There is no comprehensive log of when an engine cited a brand; the engine doesn't expose one. What an AEO platform does is sample: it queries a chosen set of prompts at a chosen cadence and counts what comes back. The output is a statistical estimate, not a comprehensive count.

The right analogy is Nielsen TV ratings or a political poll, not Google Analytics. The survey design — which prompts, how often, parsed how — determines the output. Different surveys, different outputs. Disagreement is structural, not a bug.
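
To make the poll analogy concrete, here is a minimal sketch of the uncertainty attached to a sampled citation share. It uses the standard Wilson score interval for a proportion; the function name and the numbers are invented for illustration, and it assumes the sampled responses are independent, which repeated runs of related prompts only approximate.

```python
import math

def wilson_interval(cited: int, sampled: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a sampled proportion (z=1.96 is roughly 95%)."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = cited / sampled
    denom = 1 + z**2 / sampled
    centre = (p + z**2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z**2 / (4 * sampled**2)) / denom
    return centre - half, centre + half

# Hypothetical sample: the brand was cited in 36 of 200 responses (18%).
low, high = wilson_interval(36, 200)
print(f"point estimate 18.0%, interval {low:.1%} to {high:.1%}")
```

With these made-up inputs the interval spans roughly 13% to 24%, so two platforms reporting 15% and 21% from samples of this size would not clearly be contradicting each other.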

Source of disagreement 1: prompt-set composition

The single biggest driver of disagreement is which prompts the two platforms track. If platform A is tracking 200 generic category prompts and platform B is tracking 800 prompts including long-tail intent, their measured citation shares for the same brand will differ — sometimes substantially. Neither is wrong; they're measuring different populations.
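
A small sketch of that effect, using invented data: the same brand, the same counting rule, and two different prompt populations. The class, the segment labels, and the hit counts are assumptions for illustration only, not any platform's actual data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptResult:
    prompt: str
    segment: str       # "generic" or "long_tail" (hypothetical labels)
    brand_cited: bool

def citation_share(results: list[PromptResult]) -> float:
    """Fraction of tracked prompts whose response cited the brand."""
    if not results:
        raise ValueError("no results to aggregate")
    return sum(r.brand_cited for r in results) / len(results)

# Invented outcome: cited on 60 of 200 generic prompts and 60 of 600 long-tail prompts.
generic = [PromptResult(f"generic-{i}", "generic", i < 60) for i in range(200)]
long_tail = [PromptResult(f"tail-{i}", "long_tail", i < 60) for i in range(600)]

platform_a = generic               # tracks the 200 generic prompts
platform_b = generic + long_tail   # tracks 800 prompts including long tail

print(f"platform A: {citation_share(platform_a):.1%}")  # 60 / 200
print(f"platform B: {citation_share(platform_b):.1%}")  # 120 / 800
```

Here platform A reports 30% and platform B reports 15% for the same brand on the same day, and both computed their number correctly.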

Within the prompt set, three sub-choices matter: where the prompts come from (search-volume data, community questions, engine follow-ups, brand-supplied, synthetic), how often the prompt set is refreshed, and whether the prompts target the brand's actual buyers or generic category language. SolCrys's Golden Prompt Set methodology is our specific design choice on this; other platforms make other choices.

The question that surfaces it: how many prompts does the platform track for my category, where did they come from, and how often is the set refreshed?

Source of disagreement 2: sampling cadence

Two platforms running the same prompt set at different cadences will see different citation patterns. If platform A samples each prompt once a week and platform B samples once a day, B captures more of the within-week variance — including engine update effects, time-of-day differences, and short-lived citations that A misses entirely.

Cadence also affects how the platform handles freshness. A daily-sample platform can show a citation that appeared on Monday and disappeared on Wednesday; a weekly-sample platform may never see it. Neither is more accurate; they're answering different questions.
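
The Monday-to-Wednesday case can be sketched with a toy timeline. Everything here is invented for illustration: one prompt, fourteen days, and a citation that is present on Monday and Tuesday of the first week only.

```python
def observed_rate(timeline: list[bool], sample_days: list[int]) -> float:
    """Share of sampled days on which the brand was cited."""
    if not sample_days:
        raise ValueError("need at least one sample day")
    return sum(timeline[d] for d in sample_days) / len(sample_days)

# Invented 14-day timeline for one prompt; day 0 is a Monday.
# The citation shows up Monday and Tuesday of week one, then disappears.
timeline = [True, True] + [False] * 12

daily = list(range(14))    # sample every day
weekly_friday = [4, 11]    # sample once a week, on Fridays
weekly_monday = [0, 7]     # sample once a week, on Mondays

rates = {
    "daily": observed_rate(timeline, daily),
    "weekly (Fri)": observed_rate(timeline, weekly_friday),
    "weekly (Mon)": observed_rate(timeline, weekly_monday),
}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

The daily sampler sees the citation on 2 of 14 days, the Friday sampler never sees it, and the Monday sampler sees it on half of its samples. The weekly figure depends on which weekday the platform happens to sample.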

The question that surfaces it: how often is each prompt re-queried, and how is engine-side intra-day variance handled in the reported number?

Source of disagreement 3: engine non-determinism

AI engines are non-deterministic in a way classic search engines aren't. The same prompt, asked of the same engine, at the same minute, twice in a row, can produce different citation sets. Engines vary their outputs through temperature settings, retrieval re-ranking, and other internal choices that AEO platforms can't directly control or fully observe.

How a platform handles this matters. Some platforms run multiple samples per prompt and average; some run a single sample and accept the variance; some run a single sample but flag low-confidence outputs. Each choice changes the reported number.
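
As a rough sketch of those three handling strategies, assume five re-runs of one prompt on one engine, each reduced to the set of domains it cited. The domains, the 0.2 margin, and the function names are illustrative assumptions rather than any platform's real thresholds.

```python
from statistics import mean

# Hypothetical re-runs of a single prompt: each set holds the domains cited in one response.
reruns = [
    {"brand.example", "reviewsite.example"},
    {"reviewsite.example", "competitor.example"},
    {"brand.example", "competitor.example"},
    {"reviewsite.example"},
    {"brand.example"},
]

def cited_rate(runs: list[set[str]], domain: str) -> float:
    """Fraction of re-runs in which the domain was cited."""
    return mean(1.0 if domain in run else 0.0 for run in runs)

def is_unstable(rate: float, margin: float = 0.2) -> bool:
    """Flag prompts whose re-runs disagree often (rate far from both 0 and 1)."""
    return margin < rate < 1 - margin

single_sample = 1.0 if "brand.example" in reruns[0] else 0.0   # strategy 1: one run
averaged = cited_rate(reruns, "brand.example")                  # strategy 2: average of runs
flagged = is_unstable(averaged)                                 # strategy 3: flag the variance

print(f"single sample: {single_sample:.0%}, averaged: {averaged:.0%}, flagged: {flagged}")
```

On this toy data a single-sample platform reports the brand as cited, an averaging platform reports 60%, and a flagging platform marks the prompt as high-variance. Had the single sample landed on the second run, it would have reported no citation at all.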

Worth being explicit: this isn't a complaint about engines. Non-determinism is a feature of how generative AI works, and it's not going away. The question for an AEO platform is what it does with the non-determinism, not whether it can eliminate it.

The question that surfaces it: how does the platform handle the case where the same prompt produces different outputs across re-runs?

Source of disagreement 4: what counts as a citation

Even when two platforms see the same engine output, they can disagree about whether the brand was cited. Edge cases that matter:

  • A URL is cited but the brand's name doesn't appear in the answer text. Does that count as a citation?
  • A competitor's site mentions the brand by name and gets cited. Does that count as a brand citation?
  • A redirect chain ends at the brand's domain. Does that count?
  • The brand's social profile (LinkedIn, Twitter) is cited. Is that a brand citation or a third-party one?

Platforms make different choices, often without documenting them. Two reasonable engineers can land on different policies, and the resulting numbers diverge.
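
To show how far two defensible policies can diverge, here is a sketch that scores the same five parsed responses under a strict and a broad definition. The `Evidence` fields and both policies are hypothetical; redirect handling is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Evidence:
    owned_url_cited: bool = False     # a URL on the brand's own domain is cited
    brand_named: bool = False         # the brand name appears in the answer text
    third_party_cited: bool = False   # a cited third-party page mentions the brand
    social_cited: bool = False        # the brand's social profile is cited

def strict_policy(e: Evidence) -> bool:
    """Count only owned-domain citations where the brand is also named."""
    return e.owned_url_cited and e.brand_named

def broad_policy(e: Evidence) -> bool:
    """Count any of the four signals as a brand citation."""
    return e.owned_url_cited or e.brand_named or e.third_party_cited or e.social_cited

def share(responses: list[Evidence], policy: Callable[[Evidence], bool]) -> float:
    return sum(policy(e) for e in responses) / len(responses)

# Invented parsed outputs, identical for both platforms.
responses = [
    Evidence(owned_url_cited=True, brand_named=True),
    Evidence(owned_url_cited=True),       # URL-only cite
    Evidence(third_party_cited=True),     # competitor or review page mentions the brand
    Evidence(social_cited=True),          # social profile cited
    Evidence(),                           # brand absent
]

print(f"strict: {share(responses, strict_policy):.0%}")
print(f"broad:  {share(responses, broad_policy):.0%}")
```

The same five responses score 20% under the strict policy and 80% under the broad one, which is why the definition has to be part of the vendor conversation.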

The question that surfaces it: what specifically counts as a citation event in the platform's definition?

The five questions to ask any AEO platform's measurement layer

Put the four sources of disagreement together with a fifth question about reproducibility, and you get the five questions worth asking any vendor:

  • Prompt set: how many prompts, where did they come from, refresh cadence, and were they tuned for my category?
  • Cadence: how often is each prompt re-queried, and how is intra-day engine variance handled in the reported number?
  • Non-determinism: how does the platform handle same-prompt re-run variance — single sample, multiple samples averaged, or something else?
  • Citation definition: what specifically counts as a citation event, including the edge cases (URL-only cites, third-party mentions, redirects, social profiles)?
  • Reproducibility: can the platform reproduce a single reported number — same prompt, same time window, same engine, same parser — and show me the underlying data?

How SolCrys answers each — with uncertainty disclosed

Here are our honest answers.

Prompt set

Our Golden Prompt Set methodology grounds every customer's prompt set on four real-world signals: intent volume across major search and marketplace surfaces, public community questions, AI query volume signals where reliable, and live engine follow-ups. Typical workspace size: tens of prompts (around 20 on Starter and Growth plans, 60 on Pro, 30 per client organization on Agency), expandable when a category needs deeper coverage. A focused set measured well beats a sprawling set checked once. Refresh: priority prompts daily, full set quarterly, plus event-triggered review when major engine or category news lands.

Cadence

Priority prompts run daily; the full set runs on a rolling 7- and 30-day window. Engine variance within a day is captured in the daily sample but reported on the 7/30-day window — so a single anomalous response on Tuesday doesn't move the weekly number. This is a deliberate trade-off: we accept a 1–2 day lag on emerging movement in exchange for stable trend reporting.
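
The smoothing effect can be sketched with a trailing 7-day mean over invented daily values. This is an illustration of the general technique under assumed numbers, not a description of the production aggregation.

```python
def rolling_mean(values: list[float], window: int) -> list[float]:
    """Trailing mean over the last `window` values, starting once a full window exists."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Invented daily citation share for 14 days (day 0 is a Monday).
# Day 8, the second Tuesday, carries one anomalous spike.
daily_share = [0.20] * 14
daily_share[8] = 0.55

weekly_view = rolling_mean(daily_share, window=7)
print([round(v, 3) for v in weekly_view])
```

The daily series jumps 35 points on the anomalous Tuesday, while every 7-day window that includes that day moves by only 5 points (from 0.20 to 0.25). The price of that stability is that a real shift takes a few days to show up fully in the windowed number.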

Non-determinism

We run multiple samples per prompt where the engine API permits, and report the median citation set with a note when the variance across re-runs is high. Where the engine surface is consumer-only (not API-accessible), we accept single-sample variance and flag the data point. This is one of the residual uncertainties in our system — high-variance prompts will read as noisy in dashboards, and we don't paper over it.

Citation definition

A citation event is recorded when the engine output references the brand's owned domain (after URL canonicalization, including redirect resolution) OR names the brand entity in the answer text with an unambiguous match. Third-party mentions of the brand are not counted as brand citations but are reported separately in the source-influence views. Social profiles are tracked but reported in their own bucket, not aggregated into the owned-domain count.

Reproducibility

Yes. Every reported number connects back to the underlying prompt-response records via prompt ID, sample timestamp, engine identifier, and parser version. Customer success can walk through any single number on request. We don't aggregate so deeply that the source data is lost.
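
One way to picture that traceability is a record keyed by the four identifiers above, plus a function that recomputes a reported share from the raw records. The schema, field names, and sample values are assumptions made for this sketch; they are not the actual SolCrys data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SampleRecord:
    prompt_id: str
    sampled_at: datetime
    engine: str
    parser_version: str
    brand_cited: bool

def recompute_share(records, *, engine, parser_version, start, end):
    """Rebuild a reported share and return it with the records behind it."""
    selected = [
        r for r in records
        if r.engine == engine
        and r.parser_version == parser_version
        and start <= r.sampled_at < end
    ]
    if not selected:
        raise ValueError("no records match the requested slice")
    return sum(r.brand_cited for r in selected) / len(selected), selected

def utc(day: int, hour: int = 9) -> datetime:
    return datetime(2026, 5, day, hour, tzinfo=timezone.utc)

# Invented records for illustration.
records = [
    SampleRecord("p-001", utc(11), "engine-a", "parser-1.2", True),
    SampleRecord("p-001", utc(12), "engine-a", "parser-1.2", False),
    SampleRecord("p-002", utc(12), "engine-a", "parser-1.2", True),
    SampleRecord("p-002", utc(12), "engine-b", "parser-1.2", True),   # other engine
    SampleRecord("p-001", utc(20), "engine-a", "parser-1.2", True),   # outside window
]

share, evidence = recompute_share(
    records, engine="engine-a", parser_version="parser-1.2",
    start=utc(11, 0), end=utc(18, 0),
)
print(f"{share:.1%} from {len(evidence)} records")
```

Pinning the parser version matters here: if the citation definition changes between parser releases, the same raw responses can yield a different share, and the version field is what lets an old number be rebuilt exactly.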

What this doesn't fix

Five answered questions don't resolve every measurement disagreement. Different platforms making different reasonable choices on the four dimensions will still produce different numbers, and neither has to be wrong. What changes after this conversation is that you know which choices each platform made and can decide which set of choices fits your team's question.

Concretely: if your CMO's quarterly question is "are we trending up across the engines that matter for us," a 7/30-day rolling sample on a stable curated prompt set (our choice) is probably the right design. If your question is "did we move on this specific prompt today," a high-cadence per-prompt sample with multiple re-runs (a different design) might fit better. The platforms are answering different questions. That's a feature.

FAQ

Doesn't this just mean AEO measurement is unreliable?

It means AEO measurement is statistical and survey-shaped, not deterministic. That's not the same as unreliable — Nielsen ratings are statistical and they decide billions in ad spend. What matters is that the platform's design choices are disclosed and consistent, so the trends are interpretable. A reliable AEO platform is one whose measurement is reproducible and whose methodology is publishable.

Why don't more AEO platforms publish their methodology?

Some treat methodology as proprietary; some haven't formalized it yet; some haven't been asked. The category is young enough that the publication norm hasn't settled. Our view is that publishing methodology is the right default — buyers can audit it, and the platform can't quietly change the rules. Our editorial standards and visibility measurement methodology pages put our cards on the table.

Should I worry that SolCrys's numbers are wrong?

You should treat any single number from any AEO platform — including ours — as a survey estimate with uncertainty. The trend across multiple measurements is more informative than any single point. We try to surface the uncertainty in our dashboards rather than hide it; if you spot a number that looks anomalous, ask us about the underlying sample.

What if two platforms agree on a brand's citation share?

Agreement between platforms is informative — it suggests both are within the same statistical neighborhood. Disagreement is also informative — it usually surfaces a methodological choice worth understanding. Neither is automatically a quality signal; both are inputs to a richer conversation about what the platforms are actually measuring.

Does engine non-determinism mean SEO is a more reliable channel?

Different reliability profile, not strictly better or worse. SEO has its own noise sources — algorithm updates, SERP feature changes, attribution fuzz — but the underlying measurement (URL impressions, clicks) is more deterministic. AEO has more statistical noise but captures a meaningful share of buyer decision-making that SEO doesn't see. Most serious marketing programs in 2026 measure both.
