Reddit, Wikipedia, and TechRadar Dominate AI Citations: A 36,268-Citation Study of the AEO Category

Published 2026-05-22 · Updated 2026-06-11

Questions this guide answers

What sources do AI search engines cite most?
Why does ChatGPT cite Wikipedia?
How important is Reddit for AI search?
What domains do AI engines pull from?
How can I get cited by AI search?

Direct answer

Across 36,268 citations drawn from 44 buyer-comparison prompts, 5 AI engines (ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude), and 30 days of continuous measurement in the Answer Engine Optimization (AEO) category, Reddit (2,882 citations), Wikipedia (1,489), and TechRadar (1,133) are the three most-cited domains — together about 15% of all citations. Vendor-owned content — the .com sites of every brand competing in this category combined — is only ~1.5% of citations (560). That includes SolCrys, the publisher of this study, which is a vendor in the AEO category, ranks 11 of 17 on category mention rate at 5.8%, and whose own domain now sits at #9 in the top-10 cited list. The most-cited vendor blog in the dataset is Semrush at 2.4% (883 citations). The pattern holds: AI engines preferentially cite third-party community, editorial, and long-tail sources before they reach for any single vendor's own website.

Methodology

This is a self-published study by SolCrys, an AEO platform vendor. We're disclosing methodology in full so any researcher — or competitor — can replicate or contest the findings.

Engine	Vendor	Surface tested
ChatGPT	OpenAI	Web-search-enabled GPT response
Perplexity	Perplexity AI	Default search response
Google AI Overviews / AI Mode	Google	AI-generated answer panel
Gemini	Google	Standalone Gemini response with web grounding
Claude	Anthropic	Claude response with web search (added mid-window)

Workspace and time window

Workspace: `solcysai-aeo` (a continuous monitoring workspace SolCrys runs on its own category)
Window: 2026-05-12 to 2026-06-11 (30 consecutive days)
Tool: SolCrys AEO platform, citation-extraction module

Prompt set

We used a fixed set of 44 buyer-comparison prompts designed to reflect the queries a B2B marketer, agency lead, or VP-Marketing would type into an AI engine while shortlisting AEO tools. The full prompt set is available on request; representative examples:

"What are the best AEO tools for B2B marketing teams in 2026?"
"Best AEO platforms for agencies managing multiple clients"
"Best tools to track brand visibility in ChatGPT"
"Top generative engine optimization platforms compared"

These are paraphrases. The actual prompts vary in punctuation and qualifier wording. This variation is deliberate: AI engines respond differently to small wording shifts, and a single-phrasing study would over-fit.

Engines

Five engines, queried daily. Claude (Anthropic) was added partway through the window, so it has fewer runs than the original four.

Running 44 prompts across 5 engines over 30 days produced roughly 3,493 responses in total (Claude joined mid-window, so its run count is lower than the other four).

Citation extraction

For every one of the ~3,493 responses, we extracted every URL the engine cited or linked as a source. Each URL was reduced to its registrable domain (e.g., `https://blog.hubspot.com/foo` → `hubspot.com`) and counted. Domains were classified into five source types:

Owned: the .com (or product subdomain) of any brand competing in the AEO category — these are vendors citing themselves.
Competitor: the .com of a vendor in an adjacent or overlapping category (e.g., HubSpot, Semrush, Ahrefs).
Editorial: trade press, news, analyst sites (TechRadar, Search Engine Land, Marketing Brew, vendor-run editorial blogs treated as competitor only when the vendor is in-category).
UGC: Reddit, YouTube, Quora, forums, community wikis.
Other: everything else — long-tail blogs, personal websites, academic pages, GitHub, niche industry sites, etc.
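As a rough illustration of the normalization and classification steps described above, the sketch below reduces URLs to domains and tallies them by source type. It is not the SolCrys extraction module: the function names, the sample URLs, and the small lookup table are hypothetical. The last-two-labels rule is also a simplification; a production pipeline would presumably consult the Public Suffix List to handle hosts such as example.co.uk.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical lookup built from domains named in this article; a real
# taxonomy would cover far more domains.
SOURCE_TYPES = {
    "solcrys.com": "Owned",
    "semrush.com": "Competitor",
    "hubspot.com": "Competitor",
    "ahrefs.com": "Competitor",
    "techradar.com": "Editorial",
    "wikipedia.org": "Editorial",
    "reddit.com": "UGC",
    "youtube.com": "UGC",
    "quora.com": "UGC",
}


def registrable_domain(url: str) -> str:
    """Naively keep the last two hostname labels (assumption: no multi-part
    public suffixes such as .co.uk appear in the input)."""
    host = (urlsplit(url).hostname or "").lower()
    labels = [label for label in host.split(".") if label]
    return ".".join(labels[-2:])


def classify(domain: str) -> str:
    """Anything not in the lookup falls into the long-tail 'Other' bucket."""
    return SOURCE_TYPES.get(domain, "Other")


# Hypothetical cited URLs, standing in for URLs extracted from responses.
cited_urls = [
    "https://blog.hubspot.com/foo",
    "https://www.reddit.com/r/SEO/comments/abc",
    "https://old.reddit.com/r/marketing/",
    "https://en.wikipedia.org/wiki/Search_engine_optimization",
    "https://example-practitioner-blog.net/aeo-notes",
]

domain_counts = Counter(registrable_domain(u) for u in cited_urls)
type_counts = Counter(classify(registrable_domain(u)) for u in cited_urls)

print(domain_counts.most_common())
print(type_counts.most_common())
```

Counting by registrable domain rather than by full URL is what lets `blog.hubspot.com` and `www.hubspot.com` land in the same row of a top-10 table.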

Totals: 36,268 citations, 7,962 unique URLs, 2,858 unique cited domains.

What this study can and cannot claim

SolCrys (publisher of this study) is a vendor in the AEO category. In this dataset, SolCrys ranks 11 of 17 on cross-engine mention rate at 5.8%, and its own domain is #9 in the top-10 cited list. The most-cited vendor blog in the dataset is Semrush at 2.4% (883 citations), with Profound a hair behind (881). We have no commercial incentive to inflate any competitor's number or deflate Reddit's; the methodology counts every cited URL regardless of source. If we wanted to bias this study to flatter SolCrys, we would not have published one where we sit in the bottom half of our own category measurement.

This study describes citation patterns for the AEO category specifically, during a 30-day window, using a 44-prompt sample across 5 engines. It is sufficient to characterize the source-layer distribution AI engines pull from when answering buyer questions about AEO platforms. It is not sufficient to claim the same distribution holds for unrelated categories (cybersecurity, legal tech, consumer electronics may look very different — we have parallel studies in progress), nor that it is stable across longer horizons. Caveats are spelled out in detail later in this report.

The top 10 cited domains

Three observations from the top-10 table below:

Community and editorial own the top three. Reddit is now the single most-cited domain in the category at 2,882 citations — nearly double #2. Wikipedia (1,489) and TechRadar (1,133) follow; the three together are about 15% of all citations. For buyer-comparison queries here, AI engines lean first on authentic community discussion (Reddit), neutral encyclopedic content (Wikipedia), and curated trade-press roundups (TechRadar) before any single vendor's marketing site.

The vendor gap has narrowed — but third-party still leads. Positions 4 through 10 mix vendor blogs (Semrush, Profound, HubSpot, Ahrefs), an academic source (arXiv), a video platform (YouTube), and — new this refresh — SolCrys's own domain at #9. Each earns ~1.5–2.4% individually. Vendor citations have grown sharply since our last measurement (Semrush 338 → 883, Profound 361 → 881, HubSpot 380 → 709), so the Wikipedia-to-top-vendor ratio fell from ~5:1 to ~1.7:1. But no single vendor dominates owned-media presence; the leader, Semrush, still captures only 2.4%.

YouTube is in the top 10 — and climbing. Otterly's 2026 YouTube Citation Study identifies YouTube as the #2 social platform for AI citations behind Reddit; Ahrefs' December 2025 75,000-brand study reports YouTube mentions show ~0.737 correlation with AI visibility across ChatGPT, AI Mode, and AI Overviews. Our #7 ranking for YouTube is consistent with both.

Rank	Domain	Citations	Type	% of total
1	reddit.com	2,882	UGC	7.9%
2	wikipedia.org	1,489	Editorial	4.1%
3	techradar.com	1,133	Editorial	3.1%
4	semrush.com	883	Competitor	2.4%
5	tryprofound.com	881	Competitor	2.4%
6	arxiv.org	809	Editorial	2.2%
7	youtube.com	743	UGC	2.0%
8	hubspot.com	709	Competitor	2.0%
9	solcrys.com	560	Owned	1.5%
10	ahrefs.com	558	Competitor	1.5%
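The percentages and ratios quoted in this section follow directly from the table. A minimal sketch of that arithmetic, using only the counts shown above (the variable and function names are illustrative):

```python
TOTAL_CITATIONS = 36_268

# Citation counts copied from the top-10 table.
top10 = {
    "reddit.com": 2882,
    "wikipedia.org": 1489,
    "techradar.com": 1133,
    "semrush.com": 883,
    "tryprofound.com": 881,
    "arxiv.org": 809,
    "youtube.com": 743,
    "hubspot.com": 709,
    "solcrys.com": 560,
    "ahrefs.com": 558,
}


def share(count: int, total: int = TOTAL_CITATIONS) -> float:
    """Percentage of all citations, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {domain: share(count) for domain, count in top10.items()}

# Combined share of the three most-cited domains.
top3_share = share(
    top10["reddit.com"] + top10["wikipedia.org"] + top10["techradar.com"]
)

# Ratios against the most-cited vendor blog (Semrush).
reddit_to_top_vendor = round(top10["reddit.com"] / top10["semrush.com"], 1)
wikipedia_to_top_vendor = round(top10["wikipedia.org"] / top10["semrush.com"], 1)

for domain, pct in shares.items():
    print(f"{domain:<18}{pct:>5}%")
print("top 3 combined:", top3_share)
print("Reddit : top vendor =", reddit_to_top_vendor)
print("Wikipedia : top vendor =", wikipedia_to_top_vendor)
```

Run as written, this reproduces the rounded shares in the table, a combined top-three share of about 15%, and the roughly 3.3:1 and 1.7:1 ratios cited elsewhere in this report.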

Distribution by source type

The source-type breakdown below should change how a B2B marketer thinks about content strategy.

The long tail is still the largest single bucket. 45.1% of all citations point to domains that did not crack the top 10, top 50, or in most cases the top 100 — blogs, personal sites, niche industry pages, academic content, GitHub repositories, agency case-study pages, and thousands of other small destinations each contributing a handful of citations. Spread across roughly 2,800 distinct domains, no single one matters; collectively, they're the plurality of AI answer source material.

Competitor vendor blogs are now a quarter of all citations. The competitor share jumped to 24.5% (from 17.6% in our prior measurement) — each individual vendor still owns only 1.5–2.4%, but added together, vendor blogs are now the second-largest source type and gaining. AI engines do read marketing content; they just don't weight any single piece of it heavily.

Owned media is small but no longer negligible. Across every brand in the category, including their own websites, ~1.5% of citations are owned — up from under 1% in our prior measurement, as vendors (SolCrys included) published more and earned their way into the top 10. It is still small, and it still contradicts the assumption that publishing more to your own blog is how you "win AI search." Owned content has a job (be retrievable and correct), but it is not where the citations are.

This finding aligns directionally with two independent studies. OtterlyAI's "AI Citation Economy" report (1M+ data points across ChatGPT, Perplexity, and Google AI Overviews) finds AI engines depend ~95% on third-party sources, with brands receiving frequent mentions but weak link citations. Separate PR-trade reporting cites 94% of AI citations coming from earned media — both converge on the same structural pattern as our 98.5% non-owned share.

Source type	Citations	% of total
Other (long-tail)	16,368	45.1%
Competitor (vendor blogs)	8,890	24.5%
UGC (Reddit, YouTube, forums)	4,044	11.2%
Editorial (trade press, news)	3,744	10.3%
Owned (the brand's own .com)	560	1.5%
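The bucket shares can be recomputed from the counts in the table. The sketch below does that and also checks how much of the 36,268 total the listed rows cover; the names are illustrative, and the counts are copied from the table as published.

```python
TOTAL_CITATIONS = 36_268

# Counts copied from the source-type table.
buckets = {
    "Other (long-tail)": 16368,
    "Competitor (vendor blogs)": 8890,
    "UGC (Reddit, YouTube, forums)": 4044,
    "Editorial (trade press, news)": 3744,
    "Owned (the brand's own .com)": 560,
}

# Share of all citations per bucket, rounded to one decimal place.
shares = {name: round(100 * n / TOTAL_CITATIONS, 1) for name, n in buckets.items()}

# Everything that is not owned media.
owned = buckets["Owned (the brand's own .com)"]
non_owned_share = round(100 * (TOTAL_CITATIONS - owned) / TOTAL_CITATIONS, 1)

# Coverage check: do the listed rows account for every citation?
listed_total = sum(buckets.values())
unlisted = TOTAL_CITATIONS - listed_total
unlisted_share = round(100 * unlisted / TOTAL_CITATIONS, 1)

print(shares)
print("non-owned share:", non_owned_share)
print("listed:", listed_total, "unlisted:", unlisted, f"({unlisted_share}%)")
```

The per-bucket shares and the 98.5% non-owned figure match the text. The five rows as listed sum to 33,606, so about 7.3% of citations are not covered by the rows shown; the table does not say how those are classified.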

Three findings that matter for B2B marketers

Reddit (2,882 citations) and Wikipedia (1,489) each individually pull more than Semrush (883), the most-cited vendor blog in the category. The Reddit-to-top-vendor ratio is roughly 3.3:1; Wikipedia-to-top-vendor is roughly 1.7:1. That gap has narrowed since our last measurement as vendors' AI-cited content grew, but community and editorial sources still lead the table.

Implication: owned media is necessary but structurally insufficient. The highest-leverage incremental dollar in an AEO content budget is earned media: getting your brand named, quoted, or linked in editorial coverage and community discussion that already wins AI mindshare. That's a PR, community, and analyst-relations problem, not just a content-marketing problem. Most B2B marketing teams are organized to ship blog posts; few are organized to land a TechRadar inclusion or earn unprompted Reddit mentions. The org chart will need to bend before the citation share does.

A caveat on Wikipedia: this is not an instruction to create a Wikipedia page. Wikipedia's notability guidelines for organizations require multiple independent, non-routine secondary sources by unaffiliated parties; self-promotion and paid placements explicitly do not count. Conflict-of-interest rules discourage editing pages where you have a financial relationship, and recent reporting (a January 2026 Bureau of Investigative Journalism story on consultancies using subcontractors to edit client pages) has tightened scrutiny further. The implication isn't "make a Wikipedia page"; it's "earn enough independent press that a Wikipedia entry becomes defensible on its own merit." Most vendors in this dataset are not at that bar.

Reddit is the headline of this refresh. It alone accounts for 7.9% of all citations: more than any vendor, more than any single editorial source, and now the #1 cited domain in the category (up from #3 last time). When a B2B buyer types "best AEO tool for an in-house team of 5" into ChatGPT, Perplexity, or Claude, the engine looks for lived buyer experience to anchor its answer. Reddit threads in r/SEO, r/marketing, r/SaaS, r/MarTech, and r/bigseo are routinely the most concentrated source of that experience on the open web.

Implication: Reddit visibility is now strategic, not optional, for B2B vendors whose buyers compare options before purchase. But authentic Reddit engagement is structurally different from content marketing. The subreddits AI engines preferentially cite have strong anti-self-promotion norms, mod-enforced rules, and community memory that punishes thin participation. A brand that drops product links will get burned, deleted, or banned. A brand that contributes substantively over months and earns unprompted mentions accumulates citation share. That takes 6–12 months to show measurable lift and cannot be templated, and it is the most defensible source of earned media in current AI search.

We unpack the tactical playbook for community-layer source work in Reddit and G2 community sources for AEO.

Finding 3: Even the most-cited AEO vendor only captures 2.4% of category citations

Profound, a top-ranked vendor on category mention rate (~59%) in this dataset, accounts for just 2.4% of all citations (881). The most-cited owned-media domain among vendors, Semrush, captures 2.4% (883). No vendor in this category, including very well-resourced ones, has built dominant AI mindshare on the back of their own .com.

Implication: nobody owns AI mindshare for their own category yet. The category-leader position in AI citation share is, today, empty. For a B2B founder or CMO allocating 2026 budget, this is a strategic opening — and a narrow one. The vendor that combines (1) deep owned-media coverage of long-tail buyer queries, (2) consistent trade-publication earned media, and (3) authentic community-layer participation in the subreddits and YouTube channels their buyers consume will earn outsize share. None of the 10 vendors in our adjacent buyer-guide comparison are doing all three in 2026. Source-layer matching framework: How to earn LLM citations.

Why the long tail matters

The 45.1% "Other" bucket — 16,368 citations spread across roughly 2,800 unique domains — is the single largest source category in this study. It deserves more attention than most AEO conversations give it.

A few characteristics:

Per-domain volume is low. The average domain in this bucket appears only a handful of times across ~3,493 responses; many appear once.
The composition is broad. Niche industry blogs, personal practitioner sites, agency case studies, podcast show-notes, Substack posts, GitHub READMEs, academic papers, archived articles, regional trade publications, free-tier review platforms, and many one-off pages.
Each individual source has near-zero leverage in isolation. A brand cannot "win" by getting cited on any single long-tail domain.
In aggregate, the long tail dominates. A brand cited on 50–100 long-tail domains will outperform one concentrated on its own .com plus three vendor-blog placements.

This shape is consistent with published academic research on Retrieval-Augmented Generation. Recent RAG architectures (Self-RAG, Corrective RAG / CRAG, and selective-retrieval frameworks documented in current arXiv preprints) use hybrid retrieval and self-critique loops to pull from a diverse set of sources rather than fixate on a small number of high-authority domains. Long-tail dominance is a plausible operational output of that design choice.

The strategic implication is uncomfortable for vendors thinking in terms of a few "pillar pieces" on their own site. The closer your source-layer presence resembles the underlying citation distribution (long, broad, distributed), the higher your structural fit with how AI engines retrieve.
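The "per-domain volume is low" characteristic can be sanity-checked with quick arithmetic on the figures reported in this study. This is only a back-of-the-envelope sketch: the domain and response counts are the approximate values quoted in the text, and the variable names are illustrative.

```python
# Figures quoted in this report; the last two are approximate.
long_tail_citations = 16_368   # the "Other" bucket
long_tail_domains = 2_800      # "roughly 2,800" distinct domains
responses = 3_493              # approximate number of responses analyzed

# Average number of citations each long-tail domain received over the window.
avg_citations_per_domain = long_tail_citations / long_tail_domains

# Average number of long-tail citations appearing in a single response.
avg_long_tail_per_response = long_tail_citations / responses

print(f"citations per long-tail domain: {avg_citations_per_domain:.1f}")
print(f"long-tail citations per response: {avg_long_tail_per_response:.1f}")
```

An average in the mid single digits per domain over 30 days is what "a handful" means here, and a mean like this usually hides a skew in which many domains appear only once.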

What this means for source-layer strategy

A practical three-layer model emerges directly from the data. We've discussed the conceptual version in owned, earned, community sources for AI; here is the version anchored to the citation distribution above.

Owned is ~1.5% of category citations: not zero, and up from under 1% as vendors published more. For deep-funnel comparison queries where the engine looks for first-party product information, owned content is the only correct source. Brands should keep investing, but with a clear-eyed view that the upper bound on owned share is roughly 1–3% even for category leaders. The job of the owned layer is to be retrievable and correct, not to dominate. Quality, not volume: a 50-page documentation site that matches the buyer query taxonomy will outperform a 500-post blog of generic marketing content.

Earned layer: the highest-leverage single action

If a B2B vendor has one marginal marketing dollar, the highest-leverage spend is editorial PR aimed at the domains that already win AI mindshare:

TechRadar (1,133 citations) — by a wide margin, the most-cited trade publication in our top 10.
Tier 2 trade press — Search Engine Land, Search Engine Journal, Marketing Brew, MarTech.org.
Analyst coverage — Gartner, Forrester, or G2-published analyst notes (when indexable).
Existing inclusions — guest essays or interviews on adjacent industry sites already in the long-tail bucket.

"Editorial PR" in 2026 includes contributed essays and interviews, not just press releases. Press releases largely go uncited; substantive, named-author trade-publication essays do get cited.

Community layer: Reddit alone is 7.9% of citations

The community-layer playbook is harder to staff and harder to fake than owned or earned. It requires (1) people who can credibly participate in technical subreddits over months without sounding like marketing, (2) willingness to be helpful in threads that don't mention your product, and (3) patience for a 6–12 month horizon to measurable citation lift. Most B2B marketing orgs are not set up for this; the rare ones that are accumulate a citation moat competitors cannot match through paid spend.

Extend community presence to G2 / Capterra / Software Advice and to YouTube (the climbing #7 in our top 10, consistent with Ahrefs' December 2025 study). Operational playbook: Reddit and G2 community sources for AEO.

Caveats and what we didn't measure

We owe the reader honesty about the limits of this study.

The dataset is for the AEO category specifically. Other B2B categories — cybersecurity, legal tech, healthcare IT, fintech infrastructure — will have different distributions. We have parallel measurements in progress for five vertical categories; the structural shape (long-tail dominance, low owned share, editorial + UGC leading) is consistent, but specific top-10 domains vary materially by category. Do not generalize the top-10 table to a category we did not measure.

Thirty days is short for stability claims. Citation distributions shift week-to-week as engines update retrieval, content gets indexed, and community discussions trend. A 30-day window characterizes a current distribution; it does not claim multi-quarter stability. We re-run monthly.

The "Other" bucket needs categorization. 45.1% in a single "long tail" label is the largest analytical loose end here. The next iteration will subdivide it (independent blogs vs. agency case studies vs. academic vs. GitHub vs. archives).

We did not measure answer-shape, only source. This study counts cited URLs. It does not analyze how the engine uses sources in the answer — whether for or against the brand, with positive or negative sentiment, or whether the engine's claim is accurate to the source. Citation share is necessary but not sufficient. Sentiment and answer-shape work is on the roadmap.

Single-vendor publication. Independent replication by a non-vendor party would meaningfully strengthen the findings. We will share methodology and aggregated raw data with any academic, journalist, or analyst who wants to attempt replication.

Get the full dataset

We are making the underlying aggregated dataset available to researchers, journalists, and senior practitioners. What you can request:

CSV of all 2,858 cited domains with citation counts and source-type classifications.
The full 44-prompt set with exact wording.
Methodology notes: normalization rules, edge-case handling, source-type taxonomy.

Access is email-gated so we can notify recipients when refreshed versions publish. We do not sell or share the list. Press, analysts, and researchers get hand-delivery and follow-up support.

Request the full dataset or book a SolCrys audit to see the same data shape for your own brand.

About SolCrys + how to cite this study

SolCrys is an Answer Engine Optimization platform. Our product measures how AI search engines (ChatGPT, Perplexity, Google AI Overviews, Gemini, and others) cite and describe brands, diagnoses why specific queries produce specific answers, recommends source-layer fixes, and tracks the recovery once fixes are shipped. We sell to B2B mid-market and enterprise marketing teams and to agencies operating multi-client AEO portfolios. Founded in 2024, based in California.

To cite this study: SolCrys (2026). *Reddit, Wikipedia, and TechRadar Dominate AI Citations: A 36,268-Citation Study of the AEO Category.* 30-day measurement window 2026-05-12 to 2026-06-11. Available at https://solcrys.ai/resources/wikipedia-techradar-reddit-dominate-ai-citations/.

For press inquiries, dataset access, or methodology questions, contact press@solcrys.ai.

*Published 2026-05-22, refreshed 2026-06-11. Data drawn from a continuous 30-day cross-engine measurement of the AEO category prompt set (44 prompts across 5 engines over 30 days, roughly 3,493 responses, 36,268 citations across 2,858 unique domains, workspace solcysai-aeo). Next refresh: July 2026.*

FAQ

Can I replicate this study?

Yes. The methodology section above is intentionally specific enough to allow replication. You will need access to (1) the five AI engines (most are accessible through paid API tiers or web interfaces), (2) a way to programmatically capture full responses including cited URLs, (3) a tagging taxonomy for source types, and (4) the patience to run 44 prompts across 5 engines daily for 30 days. The all-in compute cost is modest; the engineering work is non-trivial. SolCrys's platform automates the data capture; we wrote it precisely because doing this manually doesn't scale. We will share our prompt set, taxonomy, and aggregated raw data on request.

What was your full prompt set?

44 buyer-comparison prompts focused on the AEO category — variations on "best AEO platform for X" with X being team type, industry, use case, or budget. We've deliberately paraphrased four examples in this article rather than publishing the literal list, because publishing the exact prompts would alter the natural distribution once AI engines begin indexing this article itself. The full list is available with the dataset request.

What about other categories?

The distribution shape (long-tail dominance, low owned share, editorial + UGC leading the top 10) holds in early parallel studies we've run for adjacent B2B categories. The specific top-10 domains vary considerably — TechRadar is dominant in tech-adjacent categories but not in legal tech; trade publications shift by industry; Reddit's importance varies by buyer demographic. We will publish category-by-category breakdowns in subsequent reports. Until then, do not assume the exact rankings here transfer to a category we have not measured.

How often will you refresh this study?

Monthly internally; quarterly published refreshes. We annotate changes so the reader can see whether the top 10 is stable, whether owned share is shifting, and whether new domains are entering the long tail. This report was refreshed on 2026-06-11 (the prior version measured 17,551 citations to 2026-05-22); the next refresh will appear in July 2026.

What tools did you use for citation extraction?

The SolCrys AEO platform's citation-extraction module. It instruments the five engines, captures full responses, extracts cited URLs (handling Perplexity's structured citations, ChatGPT's link annotations, Google AI Overviews' source carousel, Gemini's source attribution, and Claude's cited links), normalizes URLs to domains, and applies the source-type taxonomy. The same module is available to platform customers and is the data source for SolCrys's customer-facing dashboards. We did not use a separate research tool to generate this study; the production platform produced the data, which we then aggregated.

Why do Reddit and Wikipedia rank so high even though most companies cannot get a Wikipedia entry?

Reddit ranks #1 because AI engines lean on lived, first-hand buyer experience for comparison queries, and Wikipedia ranks #2 because engines lean on encyclopedic neutrality for definitions and category context. Reddit visibility is earnable (through genuine long-term community participation); a Wikipedia entry generally cannot be obtained on demand. Wikipedia's notability and conflict-of-interest rules make most vendor-driven entries inappropriate or unsustainable. The practical takeaway for a B2B vendor is not "create a Wikipedia page" but "earn enough independent third-party press and authentic community presence that citations follow."

Are you publishing this because you're a vendor and want to look smart?

Partially yes; that disclosure is in the methodology. We're also publishing it because the AEO category needs more publicly disclosed citation data, and because a study that ranks SolCrys 11 of 17 on category mention rate is harder to dismiss as marketing puffery than a study where the publisher comes out on top. Use it accordingly.

Related guides

AI Cites Consensus, Not Authority: Why Domain Authority Is the Wrong Target

AI engines don't cite your most authoritative page. They repeat the claim corroborated across the most independent sources. The evidence, our own category data, and what to do instead.

How AI Answer Engines Choose Sources: The 7 Signals We've Mapped

AI engines like ChatGPT, Perplexity, Google AI Overviews, and Claude choose sources using overlapping but distinct signals. This guide maps the 7 signals that drive citation eligibility and the engine-specific weighting differences.

Owned, Earned, and Community Sources in AI Answers: A 3-Layer Strategy

AI engines cite three source layers — owned (your site), earned (PR/editorial), and community (Reddit/G2/forums). In our own data, owned is only ~1.6% of citations yet still a top-10 source. Why third parties get you into the answer, why your own site still matters, and how to balance the three.
