
Measurement

How SolCrys measures AI visibility - real-user fidelity, not API approximations

SolCrys measures AI visibility across two complementary channels because real users interact with AI both ways. For ChatGPT, Google AI Overviews, and Amazon Rufus, we capture from the rendered consumer surface - the same view a buyer sees when they actually use the product. For agents and deep research tools that query AI through programmatic interfaces, we query each engine's current default consumer-grade model with live grounding enabled, matching how agents call the API in production. Every data point is traceable to a specific prompt, engine, region, and timestamp. Snapshot cadence ranges from one-time audits to daily refresh on higher-tier plans, building the sample sizes that turn noisy single-snapshot reads into high-confidence visibility scores and trend lines. This page is the methodology document the most rigorous evaluators ask for - we publish it before sales calls so buyers can evaluate measurement quality on their own terms.

Updated 2026-05-08

Questions this guide answers

  • How do AI visibility platforms capture data?
  • Why should I trust AI visibility data?
  • How does SolCrys measure brand mentions in ChatGPT?
  • Browser channel vs API channel for AEO measurement

Direct answer

SolCrys measures AI visibility across two complementary channels because real users interact with AI both ways. For ChatGPT, Google AI Overviews, and Amazon Rufus, we capture data from the rendered consumer surface - the same view a buyer sees when they use the product. The responses we record reflect what your customers see, including product cards, retailer pricing, and follow-up Q&A. For agents and deep research tools that query AI through programmatic interfaces, we query each engine's current default consumer-grade model with live grounding enabled, matching how agents call the API in production.

Every data point is traceable to a specific prompt, engine, region, and timestamp. Snapshot cadence runs from one-time audits up to daily refresh on higher-tier plans, building the sample sizes that turn noisy single-snapshot reads into high-confidence visibility scores and trend lines.

Why measurement methodology is the trust question buyers should ask first

AI engines are noisy. Ask the same question twice in two minutes and you can get materially different answers - this is a documented property of how the underlying language models sample tokens, not a bug. Different users on the same engine often get different responses depending on regional model routing, account history, and live-web variability. An AEO platform that hides how it deals with this is asking you to take its charts on faith.

The harder question to answer well is: does the platform measure what your buyers actually see? Not what some API endpoint returns under default settings - the actual experience of a real person typing your category question into a consumer chat surface or asking Rufus on a product page. Public engine documentation and third-party research consistently find that consumer chat UIs use different system prompts, different default tools, and sometimes different model routing than the same provider's public API. The same prompt can return a different answer through the same brand's two doors.

Channel 1: Consumer-surface capture

For ChatGPT, Google AI Overviews and AI Mode, Amazon Rufus, and similar consumer-facing AI assistants, we capture from the rendered consumer surface - the same view a buyer sees when they open the product or ask a shopping assistant. This is harder and more expensive to operate than pure API calls, but it is the only way to capture what your buyers actually experience.

For each tracked prompt the capture pipeline records the engine's consumer-surface state: the model the engine routes to by default, whether web search is enabled by default for that surface, regional settings, and any UI elements the engine renders alongside the textual response. The technical details vary per engine and evolve over time. We comply with each provider's terms of service and use the access methods each provider supports, including third-party SERP capture infrastructure for surfaces with no official API. For enterprise customers with strict compliance requirements, we provide written documentation of the access methods used per engine. Each capture records:

  • The full rendered response text - the answer your buyer reads.
  • Source citations and their URLs - what the engine pointed buyers to.
  • Product modules and pricing where shown - product cards, shopping carousels, retailer placements.
  • Suggested follow-up questions - the engine's own model of 'what users ask next.'
  • Engine-disclosed model signals when available.
  • A timestamped audit artifact for every data point.

Why the consumer surface matters more than the API

The consumer products are not thin wrappers over the public API. Consumer ChatGPT routes to a specific default model, with web search enabled by default in many regions, and applies post-processing that the public API does not. Google AI Overviews and AI Mode have no public API at all - every Overviews tracker on the market today is doing some form of SERP capture under the hood; we just do it openly and document it. Amazon Rufus is embedded inside the Amazon shopping UX and surfaces product cards, prices, and Q&A that reflect live retailer data, none of which is exposed via any public Amazon API.

A platform that 'tracks ChatGPT' by hitting a generic API endpoint is measuring something - but it is not measuring what your buyers see. The two outputs can diverge enough that fixing one does not move the other.

What we capture per data point

Every captured response produces an audit trail with the fields below. If you ever question a specific data point, our team can reproduce it down to the exact prompt, engine, region, and time, and show you what the rendered response looked like.

Field | What it captures | Why it matters
prompt_text | Exact prompt submitted, character-for-character. | So you can replay it manually and verify.
engine | Which consumer surface or API endpoint. | So data is not blended across surfaces.
region / locale | Geographic and language settings. | Engines route differently by region.
timestamp_utc | When the capture ran. | Engines change behavior over time.
response_text | Full rendered response. | The thing your buyer reads.
citations[] | Each cited URL with title and snippet. | What the engine pointed buyers to.
product_cards[] | Any product modules shown. | What buyers can click to buy.
follow_up_questions[] | Engine-suggested next questions. | The buyer journey continuation.
model_signal | What the engine disclosed about the responding model, where available. | Engines update defaults; we track when.
artifact_hash | Cryptographic hash of the captured artifact. | Tamper-evidence for the audit trail.
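As a concrete illustration of how these fields fit together, here is a minimal sketch (not SolCrys's production code) of a captured response serialized into an audit record, with the artifact hash computed over the canonical serialization so later tampering is detectable. Field names follow the table above; everything else is an assumption.

```python
# Hypothetical sketch of an audit record matching the fields above.
# Field names follow the table; all other details are assumptions.
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    prompt_text: str
    engine: str                      # e.g. a consumer surface or an API endpoint
    region: str                      # geographic + language setting, e.g. "US/en"
    timestamp_utc: str
    response_text: str
    citations: list = field(default_factory=list)
    product_cards: list = field(default_factory=list)
    follow_up_questions: list = field(default_factory=list)
    model_signal: str = ""           # whatever the engine disclosed, if anything
    artifact_hash: str = ""          # filled in by seal()

    def seal(self) -> "AuditRecord":
        """Hash the canonical JSON of every field except the hash itself."""
        payload = asdict(self)
        payload.pop("artifact_hash")
        canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
        self.artifact_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return self

# Example: sealing one captured response (values are placeholders).
record = AuditRecord(
    prompt_text="best running shoes for flat feet",
    engine="chatgpt-consumer",
    region="US/en",
    timestamp_utc=datetime.now(timezone.utc).isoformat(),
    response_text="...full rendered answer text...",
    citations=[{"url": "https://example.com/review", "title": "Review", "snippet": "..."}],
).seal()
```

Verification is the same computation run again: recompute the hash over the stored fields and compare it to artifact_hash.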

Channel 2: API capture for agents and deep research tools

A growing share of brand mentions and product recommendations no longer happen on consumer chat surfaces at all. They happen inside agents, deep research tools, and enterprise AI assistants that query the engines through their public APIs. For this channel, we query each engine's current default consumer-grade model with live grounding enabled, matching how agents call the API in production. We run sessions with no memory or personalization, in standardized regions and languages, so data is comparable across snapshots and across customers.
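To make the API channel concrete, here is a minimal sketch of a standardized, stateless snapshot. The endpoint, header, and parameter names are illustrative assumptions rather than any specific provider's real API; the point is the shape of the call: single turn, no memory, live grounding on, fixed region and language.

```python
# Illustrative sketch of a standardized API-channel snapshot.
# Endpoint, headers, and parameter names are assumptions, not a real provider API.
import requests
from datetime import datetime, timezone

def api_channel_snapshot(prompt: str, engine_endpoint: str, api_key: str,
                         region: str = "US", language: str = "en") -> dict:
    """Run one stateless, grounded query: no memory, no personalization,
    fixed region and language, so snapshots stay comparable."""
    response = requests.post(
        engine_endpoint,
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "default-consumer-grade",  # whatever the provider's current default is
            "messages": [{"role": "user", "content": prompt}],  # single turn, no history
            "web_search": True,                 # live grounding enabled (parameter name assumed)
            "region": region,
            "language": language,
        },
        timeout=60,
    )
    response.raise_for_status()
    return {
        "prompt_text": prompt,
        "engine": engine_endpoint,
        "region": f"{region}/{language}",
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "response_text": response.json().get("text", ""),
    }
```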

An agent built on top of an engine's API will typically return different answers than the same query in the same brand's consumer chat product. Both surfaces matter. By running both channels, SolCrys customers see the full picture: what individual buyers experience on consumer chat surfaces, and what agents and deep research tools surface when they programmatically query the engines.

Traceability: every data point should be falsifiable

A claim like 'your brand was mentioned in 23% of category prompts last week' is meaningless unless you can answer the obvious follow-up: show me which prompts, on which engines, at which times, with what exact response. Customers can drill from any chart or trend line back to the underlying responses. If you suspect a result is wrong, request the source artifact and reproduce the prompt yourself in the same engine - the response should match in substance.
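In practice, "falsifiable" means a headline percentage can be recomputed directly from the underlying audit records. A minimal sketch of that drill-down, assuming records shaped like the audit table above and a deliberately simplified mention check:

```python
# Sketch: recompute a headline mention rate from the underlying audit records.
# The brand-mention check is deliberately simplified for illustration.
def mention_rate(records: list[dict], brand: str, engine: str,
                 start: str, end: str) -> float:
    """Share of prompts on one engine, between start and end (ISO dates),
    whose rendered response mentions the brand."""
    in_scope = [
        r for r in records
        if r["engine"] == engine and start <= r["timestamp_utc"][:10] <= end
    ]
    if not in_scope:
        return 0.0
    mentioned = sum(brand.lower() in r["response_text"].lower() for r in in_scope)
    return mentioned / len(in_scope)
```

The same filter is what drives the drill-down view: the list of prompts, snapshots, and responses behind any number on a chart.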

This kind of traceability is the difference between a dashboard you can defend internally to your CMO or board and one you cannot.

Statistical confidence: handling AI engine variability

A single snapshot of an AI response is a snapshot of a noisy system. AI engines are non-deterministic by design - sampling introduces randomness so the conversation feels dynamic. Engines also re-route between models, update defaults, and change citation behavior day-to-day. To produce trustworthy trend lines, single snapshots are not enough; repeated capture is required.

Free Audit reports include a single snapshot for a directional read on where your brand stands today. Paid plans re-run prompts on a recurring cadence (weekly to daily depending on plan). Trend lines and recommendation share metrics are reported with rolling windows (typically 7-day, 30-day, 90-day) and confidence bands where the underlying sample is large enough to support them. When a movement is within the expected variance for a given engine, we flag it as such instead of reporting it as a real trend change.
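To make the rolling-window idea concrete, here is a sketch of a 7-day rolling mention rate with a normal-approximation confidence band. The window length matches the windows named above; the interval is a standard normal approximation, and the "real movement" test is an illustrative assumption, not SolCrys's actual per-engine variance model.

```python
# Sketch: rolling mention rate with a normal-approximation confidence band.
# Window length matches the 7-day window above; the flagging rule is illustrative.
import math

def rolling_visibility(daily_counts: list[tuple[int, int]], window: int = 7, z: float = 1.96):
    """daily_counts: per-day (mentions, total_snapshots), oldest first.
    Yields (rate, lower, upper) once a full window of days is available."""
    for i in range(window - 1, len(daily_counts)):
        mentions = sum(m for m, _ in daily_counts[i - window + 1: i + 1])
        total = sum(n for _, n in daily_counts[i - window + 1: i + 1])
        if total == 0:
            yield (None, None, None)
            continue
        p = mentions / total
        margin = z * math.sqrt(p * (1 - p) / total)
        yield (p, max(0.0, p - margin), min(1.0, p + margin))

def is_real_movement(prev: tuple, curr: tuple) -> bool:
    """Flag a change as a trend only when the new confidence band does not
    overlap the old one; overlapping bands are reported as expected variance.
    Assumes both windows contained at least one snapshot."""
    (_, prev_lo, prev_hi), (_, curr_lo, curr_hi) = prev, curr
    return curr_lo > prev_hi or curr_hi < prev_lo
```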

The goal is not to eliminate engine noise - that is impossible. The goal is to give you enough sample size that your trend lines reflect genuine movement, not yesterday's coin flip.

Model-version transparency

A common form of opacity in this category is vendors saying 'we track ChatGPT' without disclosing which model version they are querying. Our policy: for each engine we track, we maintain a published model registry in our platform changelog and customer dashboard, showing the current consumer-surface model the engine reports for that surface and the API model we query in the API channel. When a provider changes a default, we update tracking and disclose the change in the changelog so customers can interpret any discontinuities in their trend lines.
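For illustration, one registry entry could look like the sketch below; the field names and values are assumptions about shape, not the actual registry schema.

```python
# Illustrative shape of one model-registry entry; field names and values
# are assumptions, not the actual registry schema.
MODEL_REGISTRY_ENTRY = {
    "engine": "chatgpt",
    "consumer_surface_model": "the model the consumer UI currently reports",
    "api_channel_model": "the default consumer-grade API model we query",
    "grounding": "live web search enabled",
    "effective_from": "2026-05-01",
    "changelog_url": "https://example.com/changelog/entry",  # placeholder URL
    "notes": "Provider changed the consumer default; expect a trend-line discontinuity at this date.",
}
```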

We do not claim to track every variant of every engine. We track the consumer-default surface for each engine we cover, plus a parallel API track for engines where the API and consumer surface diverge meaningfully.

What we deliberately do not claim

To make the methodology credible, here is what we are explicit about not claiming.

  • We do not promise that fixing a flagged answer gap will guarantee citation lift. Engine behavior is influenced by hundreds of inputs we cannot fully see; we tell you what to fix and we re-measure after.
  • We do not promise daily refresh on every plan. Cadence is plan-dependent; the trade-off between cost and refresh frequency is honest and disclosed in pricing.
  • We do not promise complete coverage of every AI engine. New engines launch frequently; we track the engines our customers' buyers actually use.
  • We do not promise consumer-surface capture will never miss a UI element. Engines redesign their interfaces; when extraction fails for a new layout, we flag it in the data instead of silently degrading.
  • We do not personalize results based on a fake user identity. Sessions run with default state - no logged-in personalization, no memory, no account history.
  • We do not blend channels silently. Consumer-surface and API capture data are tagged separately; you can filter to either channel or compare them.

FAQ

How is consumer-surface capture different from just calling an API?

A consumer chat product is not a thin shell over the public API. It uses a specific default model, enables tools and post-processing the API does not, and renders UI elements like product cards and follow-up suggestions the API never returns. Capturing from the rendered consumer surface is the only way to measure what a buyer actually sees on these surfaces.

Why don't you just use the engine's API for everything? It is cheaper.

Cheaper to operate, but the data does not match what consumer users see. For Google AI Overviews specifically, no public API even exists. For ChatGPT and similar consumer chat surfaces, the API and consumer product can return materially different answers to the same prompt. Running both channels costs more but produces data that holds up when a customer reproduces a result manually.

How do you handle the fact that AI engines give different answers each time?

By repeated capture and statistical reporting. Daily or near-daily refreshes (depending on plan) over a rolling window let us report visibility scores and trend lines that reflect real movement rather than per-snapshot variance. We also report rolling confidence intervals where the sample size supports them, so you can see when a trend is statistically meaningful versus when it might still be noise.

What happens when an engine changes its default model?

We monitor provider announcements and model deprecations on an ongoing basis. When an engine changes its default consumer model, we update our tracking configuration to match within days and disclose the change in the platform changelog so customers can interpret any trend-line discontinuities.

Can I verify a specific result myself?

Yes. Every data point is traceable to the exact prompt, engine, region, and timestamp. Open any tracked prompt, select a snapshot, and you can see the full response we captured. To verify, copy the prompt text, set your browser to the same region, and submit it on the engine yourself within a short time window. The response should match in substance.

Is consumer-surface capture allowed under the engines' Terms of Service?

Our capture follows public-content access patterns each provider supports and does not require account login or personalization on consumer surfaces where that is the default user experience. We monitor provider terms and adjust our methods accordingly. For enterprise customers with strict compliance requirements, we can provide documentation of the access methods used per engine under NDA.

How is this methodology different from 'AI visibility' tools that just ping public APIs?

API-only trackers measure what a developer would get back from a generic API call. Consumer-surface capture measures what a buyer actually experiences. Both have value, but they answer different questions. SolCrys runs both because real-world AI visibility happens on both surfaces, and the answers diverge often enough that single-channel measurement misses things buyers actually care about.

Related guides

Measurement

AI Brand Visibility Monitoring

A practical guide to measuring brand mentions, citations, sentiment, and competitive position across AI answer engines.

Prompt Intelligence

Golden Prompt Set Methodology

How SolCrys grounds AEO tracking on real intent volume, public community questions, AI query signals, and live engine follow-ups - not synthetic keyword lists.

Buyer Guides & Platform Decisions

Evaluate an AEO Platform's Data Methodology

A buyer's checklist for evaluating AEO and AI visibility platforms on data methodology. Seven questions that distinguish vendors with auditable, fidelity-first measurement from vendors with synthetic dashboards.

Free AI visibility audit

Find out where your brand is missing, miscited, or misrepresented.

SolCrys maps high-intent prompts to mentions, citations, answer accuracy, and content gaps so your team can prioritize the next pages to ship.

Get a free audit