How SolCrys Works

Building an AEO platform — 6 architectural decisions and what they cost us

AEO (answer engine optimization) platforms get marketed at the PM and CMO layer. The engineering decisions underneath usually stay private: partly because they're competitive surface, partly because few customers ask, partly because writing about them is uncomfortable. This essay publishes six of SolCrys's architectural decisions: how we store prompt sets, how we probe engines, how we parse citations, how we canonicalize URLs, how we score recovery, and how we ground action recommendations in Corporate Context. Each one involves a tradeoff I'd defend against any competing choice, and there are at least three things I'd revisit if we were starting over. The point isn't to claim we got it right; it's to put a CTO's reasoning on the table so customers' technical buyers can audit it.

By Jia Chang, Co-Founder & CTO, SolCrys

Updated 2026-05-17

Questions this guide answers

How does an AEO platform work technically?
What is the architecture of an AEO platform?
How are AEO platforms built?
What engineering choices does an AEO platform make?

Why I'm publishing this

Most AEO platform marketing happens at the surface — dashboards, demo flows, customer logos. The architecture underneath stays private. That's normal in B2B SaaS, but it's also where customers' technical buyers — CTOs, VP engineering, MarTech-infrastructure leads — get the least information when they're trying to evaluate platform fit. They're asked to take it on faith that the engineering is sound.

We've decided to put a CTO's reasoning on the table instead. Six architectural decisions, each one with the tradeoff space, what we chose, and what I'd revisit if I were starting over. None of this exposes a moat that would be useful to a competitor; the choices are public-pattern decisions, not proprietary algorithms. What it does is let a technical buyer evaluate us the way they'd evaluate an open-source dependency.

Borrowing a phrase from a colleague who works in observability: "You can't trust what you can't audit." This essay is the audit surface.

Decision 1: Prompt-set storage — versioned tables vs. event log

Tradeoff: versioned tables are simple to query but make audit trails (who changed what when) painful. Event logs are audit-natural but harder to query for current state without a materialized view.

We chose event log with materialized current-state views. Every prompt addition, modification, retirement, and source-tag change is recorded as an event with a timestamp and an actor (which is sometimes a customer-facing role, sometimes an engine-feedback signal). The current Golden Prompt Set for any customer is a materialized view over those events at a given moment.

Why: when a customer asks "why did this prompt enter our set," we can show them the event with provenance. When a customer asks "what was our prompt set in Q1," we can replay the events up to that date and rebuild the view as it stood then. This has a real cost (we run more storage and a more complex view layer than a simpler design would), but it's the right tradeoff for a measurement platform whose value depends on reproducibility.
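To make the replay idea concrete, here is a minimal sketch of an append-only event log folded into an as-of view. It is an illustration under stated assumptions, not SolCrys's schema: the `PromptEvent` fields, the event kinds, the actor labels, and the sample prompts are all hypothetical, and source-tag changes are left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PromptEvent:
    at: datetime                 # when the change happened
    actor: str                   # who or what made it (hypothetical labels)
    kind: str                    # "added" | "modified" | "retired"
    prompt_id: str
    text: Optional[str] = None   # prompt text for added/modified events


def prompt_set_as_of(events: List[PromptEvent], as_of: datetime) -> Dict[str, str]:
    """Fold every event up to `as_of` into the prompt set that was current then."""
    current: Dict[str, str] = {}
    for ev in sorted(events, key=lambda e: e.at):
        if ev.at > as_of:
            break
        if ev.kind in ("added", "modified"):
            current[ev.prompt_id] = ev.text or ""
        elif ev.kind == "retired":
            current.pop(ev.prompt_id, None)
    return current


def provenance(events: List[PromptEvent], prompt_id: str) -> List[PromptEvent]:
    """Return the full change history of one prompt, oldest first."""
    return [e for e in sorted(events, key=lambda e: e.at) if e.prompt_id == prompt_id]


# Hypothetical sample data.
EVENTS = [
    PromptEvent(datetime(2026, 1, 10), "customer:analyst", "added", "p1", "best crm for startups"),
    PromptEvent(datetime(2026, 2, 3), "engine-feedback", "added", "p2", "crm with invoicing"),
    PromptEvent(datetime(2026, 4, 20), "customer:analyst", "retired", "p1"),
]

q1_set = prompt_set_as_of(EVENTS, datetime(2026, 3, 31))   # both prompts present
today_set = prompt_set_as_of(EVENTS, datetime(2026, 5, 17))  # p1 retired by now
```

In a production system the fold would typically be maintained incrementally as a materialized view rather than recomputed per query; the full replay is only needed for historical questions.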

Decision 2: Engine probing — synchronous vs. queue-based

Tradeoff: synchronous probes are simpler and lower-latency but cascade failures when an engine's API misbehaves. Queue-based probes isolate engine failures but add complexity around retries, dead-letter handling, and result-ordering.

We chose queue-based with per-engine queues. Each engine has its own queue with engine-specific retry policy (some engines are sensitive to retry-storms, others are not), dead-letter handling, and rate-limit-aware backoff.
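The isolation property can be sketched in a few lines. This is a simplified, single-process illustration, not the production design: `RetryPolicy`, `EngineQueue`, the engine names, and `fake_probe` are hypothetical, and the sketch records the backoff delay it would use rather than actually sleeping.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class RetryPolicy:
    max_attempts: int
    base_delay_s: float
    max_delay_s: float

    def delay(self, attempt: int) -> float:
        """Capped exponential backoff for a 1-based attempt number."""
        return min(self.max_delay_s, self.base_delay_s * 2 ** (attempt - 1))


@dataclass
class EngineQueue:
    engine: str
    policy: RetryPolicy
    pending: Deque[Tuple[str, int]] = field(default_factory=deque)  # (prompt, failed attempts)
    results: List[Tuple[str, str]] = field(default_factory=list)
    dead_letter: List[str] = field(default_factory=list)
    planned_delays: List[float] = field(default_factory=list)

    def submit(self, prompt: str) -> None:
        self.pending.append((prompt, 0))

    def drain(self, probe: Callable[[str, str], str]) -> None:
        """Process this engine's queue only; failures never touch other queues."""
        while self.pending:
            prompt, attempts = self.pending.popleft()
            try:
                self.results.append((prompt, probe(self.engine, prompt)))
            except RuntimeError:
                attempts += 1
                if attempts >= self.policy.max_attempts:
                    self.dead_letter.append(prompt)
                else:
                    # A real worker would wait this long before the retry.
                    self.planned_delays.append(self.policy.delay(attempts))
                    self.pending.append((prompt, attempts))


def fake_probe(engine: str, prompt: str) -> str:
    """Stand-in for an engine API call; 'engine-b' is simulated as down."""
    if engine == "engine-b":
        raise RuntimeError("upstream unavailable")
    return f"{engine} answer to {prompt!r}"


queues = {
    "engine-a": EngineQueue("engine-a", RetryPolicy(3, 1.0, 30.0)),
    "engine-b": EngineQueue("engine-b", RetryPolicy(2, 5.0, 120.0)),  # gentler retries
}
for q in queues.values():
    q.submit("best crm for startups")
    q.drain(fake_probe)
```

After the loop, `engine-a` has its result and `engine-b` has one dead-lettered prompt; neither outcome depended on the other queue.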

What it cost us: a non-trivial chunk of engineering time on retry policy tuning, and a class of edge cases around what to do when an engine's queue is backed up by more than a sample-window. Worth it because the alternative — synchronous probes — would let a single misbehaving engine slow down every other engine's measurement, which we found unacceptable for customers running multi-engine programs.

Hindsight: I'd revisit the per-engine isolation strategy at higher concurrency than we currently see. The current architecture works through mid-2026 customer volumes; it might not at 5x. We have a path forward, but we haven't pulled the trigger.

Decision 3: Citation parsing — regex vs. structured-output LLM

Tradeoff: regex parsing is fast, deterministic, and brittle in the face of engine output format changes. Structured-output LLM parsing is more robust to format change but slower, costlier per call, and non-deterministic in its own way.

We chose hybrid: structured-output LLM as primary, regex as a fallback reconciliation pass. The primary path extracts citations into a structured format via a small LLM call; the fallback regex pass catches any citation the primary path missed and reconciles them against the structured extraction. Disagreements between the two paths get flagged for review.
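A rough sketch of the reconciliation step, assuming citations are plain URLs in the answer text. The LLM call is stubbed out with a hypothetical `stub_llm_extract` function, and `URL_RE` is a deliberately simple pattern chosen for illustration, not the pattern SolCrys uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

# Simple illustrative URL pattern; stops at whitespace and common closing delimiters.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


@dataclass
class ParseResult:
    citations: List[str]   # union of both paths
    flagged: List[str]     # found by exactly one path, so worth a review


def regex_citations(answer: str) -> Set[str]:
    """Fallback path: pull URLs out of raw text, trimming trailing punctuation."""
    return {m.rstrip(".,;") for m in URL_RE.findall(answer)}


def parse_citations(answer: str, llm_extract: Callable[[str], List[str]]) -> ParseResult:
    primary = set(llm_extract(answer))     # structured-output LLM path
    fallback = regex_citations(answer)     # regex reconciliation pass
    return ParseResult(
        citations=sorted(primary | fallback),
        flagged=sorted(primary ^ fallback),  # symmetric difference = disagreements
    )


def stub_llm_extract(answer: str) -> List[str]:
    """Hypothetical stand-in for the LLM call; it misses the markdown-style link."""
    return ["https://example.com/pricing"]


ANSWER = (
    "Pricing is listed at https://example.com/pricing. "
    "See also [docs](https://example.org/docs)."
)
result = parse_citations(ANSWER, stub_llm_extract)
```

Here the regex pass recovers the link the stubbed LLM path missed, and that URL lands in `flagged` because only one path saw it.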

Why: engine output formats change often enough that pure regex was producing too many false negatives, and engine output volume is high enough that pure LLM parsing was getting expensive. The hybrid path is a pragmatic compromise: good enough on accuracy, acceptable on cost.

Hindsight: I'd consolidate to a single LLM-only path if model costs continue to fall the way they did from 2024 to 2026. The regex reconciliation is now a tax that catches a small residual error rate, and the engineering attention on the regex path is disproportionate to its current value.

Decision 4: URL canonicalization — at fetch vs. at query

Tradeoff: canonicalizing URLs at fetch time (when the citation is first parsed) produces clean data with smaller storage but loses information about the original raw URL the engine emitted. Canonicalizing at query time (when reporting) preserves the raw data but bloats storage and slows queries.

We chose at query time, with a redirect-resolver cache. The raw URLs as the engine emits them are stored unchanged. When the report layer runs, a redirect-resolver service maps raw URLs to their canonical forms, with results cached aggressively.

Why: AEO reporting has to handle cases where one engine cites `https://amzn.to/xxx` and another cites `https://www.amazon.com/dp/yyy`, both pointing to the same product. Canonicalizing at fetch time would discard the raw URL and lock in whichever canonical-form rules were in effect when the citation was parsed. Canonicalizing at query time lets us evolve the canonical-form rules without rebuilding history.
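As an illustration of query-time canonicalization, the sketch below stores raw URLs untouched and maps them to a canonical form only when grouping for a report. Everything specific here is assumed: the `REDIRECTS` table stands in for a real redirect-resolver service (which would follow HTTP redirects), the example domains are placeholders, and the rules (lowercase host, drop `www.`, drop `utm_*` parameters and fragments) are just one plausible rule set.

```python
from functools import lru_cache
from typing import Dict, List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical stand-in for a redirect-resolver service.
REDIRECTS = {"https://short.example/abc": "https://www.shop.example/dp/123?utm_source=ai"}

RULES_VERSION = 1  # part of the cache key, so a rule change gets fresh cache entries


@lru_cache(maxsize=100_000)
def canonicalize(raw_url: str, rules_version: int = RULES_VERSION) -> str:
    """Map a raw cited URL to its canonical form; results are cached per rules version."""
    resolved = REDIRECTS.get(raw_url, raw_url)
    parts = urlsplit(resolved)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop tracking parameters, keep everything else in order.
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")])
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))


# Raw citations are stored exactly as each engine emitted them.
RAW_CITATIONS: List[Tuple[str, str]] = [
    ("engine-a", "https://short.example/abc"),
    ("engine-b", "https://www.shop.example/dp/123/"),
]

# Canonical grouping happens at report time, not at ingest.
by_canonical: Dict[str, List[Tuple[str, str]]] = {}
for engine, raw in RAW_CITATIONS:
    by_canonical.setdefault(canonicalize(raw), []).append((engine, raw))
```

Because the raw rows never change, tightening or loosening the rules later only requires bumping the rules version and re-running the report.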

Hindsight: the cache hit rate is good enough that I wouldn't change this. The complexity tax is real but contained.

Decision 5: Recovery scoring — windowed aggregation vs. event-stream

Tradeoff: windowed aggregation (7-day, 30-day rolling) is simple, robust to short-term engine noise, and has clear meaning to a CMO. Event-stream scoring is real-time, captures fine-grained movement, and is more sensitive to noise.

We chose windowed aggregation for primary reporting, event-stream for internal diagnostics. The Recovery Score that customers see on their dashboard uses a 7-day window for short-term movement and a 30-day window for trend. The event-stream view is available internally for diagnosing whether a specific shipped action moved the needle in the hours after publication.
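The windowing itself is simple arithmetic. The sketch below computes trailing 7-day and 30-day means over a per-day value; the Recovery Score formula is not described in this essay, so the daily numbers are made-up placeholders and `trailing_mean` is a hypothetical helper showing only the aggregation step.

```python
from datetime import date, timedelta
from statistics import fmean
from typing import Dict, Optional


def trailing_mean(daily: Dict[date, float], end: date, window_days: int) -> Optional[float]:
    """Mean of the daily values in the window ending on `end` (inclusive)."""
    start = end - timedelta(days=window_days - 1)
    values = [v for d, v in daily.items() if start <= d <= end]
    return fmean(values) if values else None


END = date(2026, 5, 17)

# Placeholder series: 23 days at 0.20, then a jump to 0.40 for the latest 7 days.
daily_score = {END - timedelta(days=i): (0.40 if i < 7 else 0.20) for i in range(30)}

short_term = trailing_mean(daily_score, END, 7)    # reacts fully to the jump
trend = trailing_mean(daily_score, END, 30)        # moves only part of the way
```

With this series the 7-day figure sits at the new level while the 30-day figure is (7 × 0.40 + 23 × 0.20) / 30 ≈ 0.247, which is the smoothing behavior the dashboard relies on.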

Why: the dashboard is for CMOs who want quarterly-coherent metrics; the diagnostic is for product teams chasing root causes. Different audiences, different time-scales.

Hindsight: I'd consider exposing the event-stream diagnostic to advanced customers earlier. We held it internal partly to keep the primary dashboard story clean, but customers running mature programs ask for it and we should ship it sooner.

Decision 6: Corporate Context grounding — RAG vs. fine-tune

Tradeoff: RAG (retrieval-augmented generation) is flexible: a customer can update their brand facts and the next action draft uses the new facts immediately. Fine-tuning is faster per call but locked to a slow retraining cycle, so facts can go stale between runs. RAG carries a latency cost; fine-tuning carries a staleness cost.

We chose RAG with aggressive caching of frequently-cited brand facts. Customer Corporate Context lives in a per-customer vector store; each action draft retrieves the most relevant facts at draft time and grounds the draft against them. A cache layer fronts the most-cited facts to reduce per-draft latency.
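The property that matters here is that a fact update is visible to the very next draft. A toy sketch follows, with loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, `ContextStore` is an in-memory dict standing in for a per-customer vector store, the brand facts are invented, and the caching layer is omitted.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ContextStore:
    """In-memory stand-in for a per-customer vector store."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def upsert(self, fact_id: str, text: str) -> None:
        self._facts[fact_id] = text   # an update replaces the old fact immediately

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self._facts.values(), key=lambda t: cosine(q, embed(t)), reverse=True)
        return ranked[:k]


def grounded_prompt(store: ContextStore, task: str) -> str:
    """Build the drafting prompt from whatever facts are current at draft time."""
    facts = "\n".join(f"- {f}" for f in store.retrieve(task))
    return f"Use only these brand facts:\n{facts}\n\nTask: {task}"


store = ContextStore()
store.upsert("pricing", "The Pro plan price is 49 dollars per month")
store.upsert("founding", "The company was founded in 2019")

task = "Draft an answer about the Pro plan price"
before = grounded_prompt(store, task)

store.upsert("pricing", "The Pro plan price is 59 dollars per month")  # customer edits a fact
after = grounded_prompt(store, task)                                    # next draft sees it
```

No retraining step sits between the edit and the next draft, which is the difference from a fine-tuned model whose knowledge is frozen until the next training run.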

Why: customer-side brand-fact updates have to take effect immediately — a pricing change, a new partnership, a retracted claim — and fine-tuning can't deliver that without a multi-day retraining cycle. The latency cost of RAG is manageable; the staleness cost of fine-tune would have been a customer-trust issue.

Hindsight: this is the decision I'm most confident about of the six. The model landscape will keep moving and we'll likely revise the retrieval layer, but RAG over fine-tune is the right architecture for this problem class.

What I'd do differently

Three things I'd revisit looking at the system today, even though SolCrys is far younger than the systems where these tradeoffs usually get debated.

Earlier investment in event-stream diagnostics for customers. We prioritized the internal diagnostic version first, on the assumption that customer-facing dashboards needed clean trend data. Customers running mature programs ask for the raw event stream sooner than we expected; we should have shipped it externally from the start.
Less time on the regex fallback in citation parsing. When we built the system, LLM extraction was both expensive and inconsistent enough that regex felt like a necessary reconciliation pass. The cost and quality curves moved fast — by the time we shipped, the regex path was already on its way to becoming a maintenance tax. I'd consolidate to LLM-only sooner.
More aggressive separation of measurement from action surfaces. We built them together because they share data sources; in hindsight they have different SLOs (measurement is high-throughput, action is governance-heavy) and clearer separation in the data model would have paid back. SolCrys is young enough that this is still inexpensive to fix.

About the author

Jia Chang is Co-Founder & CTO of SolCrys, an AI architect with 15+ years building production AI systems, most recently as an engineering leader at Microsoft.

Written May 2026. This is the engineering counterpart to Eason's measurement essay and Gwen's procurement playbook — together they cover the technical, measurement, and procurement dimensions of how SolCrys works under the hood.

FAQ

Doesn't publishing this give competitors a roadmap?

The six decisions above are at the public-pattern layer — versioned tables vs. event logs is a category-wide tradeoff, not a SolCrys invention. The proprietary work is in the implementation details, the specific tunings, and the customer-facing surfaces we've built on top. Publishing the architectural reasoning costs us a small amount of competitive surface in exchange for credibility with technical buyers who can audit it. The trade is worth it.

What about scale — does this architecture hold up at 10x current customer volume?

Most of it does. The places I'd watch are decision 2 (per-engine probe queues at higher concurrency) and decision 3 (the regex reconciliation pass becomes a larger fraction of runtime). We have planned investments in both, and I'd expect to revisit them before crossing the 5x threshold.

How does this compare to how other AEO platforms are built?

Honestly, I don't know in detail. Most AEO platforms haven't published their architecture, and the public marketing surface doesn't usually disclose enough to make a comparison. The hope of publishing ours is that it raises the bar — if other platforms publish theirs, the technical-buyer evaluation gets richer for everyone, including for the platforms.

What about the parts you didn't include?

Plenty — observability, on-call structure, deployment topology, the agentic-AI workflow infrastructure that runs much of the action layer, the audit-log and compliance surfaces. The six decisions above are the ones most likely to show up in a technical-buyer evaluation. If you're a customer's technical buyer and you want a deeper conversation, we'll do it under NDA.

