Wikipedia, TechRadar, and Reddit Dominate AI Citations: A 17,551-Citation Study of the AEO Category
Updated 2026-05-22
Questions this guide answers
- What sources do AI search engines cite most?
- Why does ChatGPT cite Wikipedia?
- How important is Reddit for AI search?
- What domains do AI engines pull from?
- How can I get cited by AI search?
Direct answer
Across 17,551 citations drawn from 22 buyer-comparison prompts, 4 AI engines (ChatGPT, Perplexity, Google AI Overviews, Gemini), and 30 days of continuous measurement in the Answer Engine Optimization (AEO) category, Wikipedia (978 citations), TechRadar (908), and Reddit (785) are the three most-cited domains, together accounting for 15.2% of all citations (2,671 of 17,551). Vendor-owned content, meaning the .com sites of every brand competing in this category combined, accounts for only 0.85% of citations. This includes SolCrys, the publisher of this study, which is a vendor in the AEO category and ranks 7 of 7 on category mention rate at 4.82%. The top-cited AEO vendor in the dataset is Profound at 2.1% of category citations. The pattern is clear: in this category, AI engines preferentially cite third-party editorial, community, and long-tail sources, not the vendors' own websites.
Methodology
This is a self-published study by SolCrys, an AEO platform vendor. We're disclosing methodology in full so any researcher — or competitor — can replicate or contest the findings.
Workspace and time window
- Workspace: `solcysai-aeo` (a continuous monitoring workspace SolCrys runs on its own category)
- Window: 2026-04-22 to 2026-05-22 (30 consecutive days)
- Tool: SolCrys AEO platform, citation-extraction module
Prompt set
We used a fixed set of 22 buyer-comparison prompts designed to reflect the queries a B2B marketer, agency lead, or VP-Marketing would type into an AI engine while shortlisting AEO tools. The full prompt set is available on request; representative examples:
- "What are the best AEO tools for B2B marketing teams in 2026?"
- "Best AEO platforms for agencies managing multiple clients"
- "Best tools to track brand visibility in ChatGPT"
- "Top generative engine optimization platforms compared"
These are paraphrases. The actual prompts vary in punctuation and qualifier wording. That is a deliberate choice, since AI engines respond differently to small wording shifts and a single-phrasing study would over-fit.
Engines
Four engines, queried daily:
| Engine | Vendor | Surface tested |
|---|---|---|
| ChatGPT | OpenAI | Web-search-enabled GPT response |
| Perplexity | Perplexity AI | Default search response |
| Google AI Overviews / AI Mode | Google | AI-generated answer panel |
| Gemini | Google | Standalone Gemini response with web grounding |
22 prompts × 4 engines × 22 daily runs over 30 days = 1,936 total responses analyzed.
Citation extraction
For every one of the 1,936 responses, we extracted every URL the engine cited or linked as a source. Each URL was reduced to its registrable domain (e.g., `https://blog.hubspot.com/foo` → `hubspot.com`) and counted. Domains were classified into five source types:
- Owned: the .com (or product subdomain) of any brand competing in the AEO category — these are vendors citing themselves.
- Competitor: the .com of a vendor in an adjacent or overlapping category (e.g., HubSpot, Semrush, Ahrefs).
- Editorial: trade press, news, analyst sites (TechRadar, Search Engine Land, Marketing Brew, vendor-run editorial blogs treated as competitor only when the vendor is in-category).
- UGC: Reddit, YouTube, Quora, forums, community wikis.
- Other: everything else — long-tail blogs, personal websites, academic pages, GitHub, niche industry sites, etc.
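The normalization, counting, and tagging steps described above can be sketched roughly as follows. This is a minimal illustration and not the SolCrys extraction module: the function names, the `MULTI_PART_SUFFIXES` set, and the `SOURCE_TYPES` lookup are all hypothetical, and the suffix handling is a deliberate simplification (a production pipeline would more likely consult the full Public Suffix List).

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumption: a tiny stand-in for the Public Suffix List, enough for this demo.
MULTI_PART_SUFFIXES = {"co.uk", "com.au", "co.jp"}

# Hypothetical lookup table; the labels follow the study's top-10 table.
SOURCE_TYPES = {
    "wikipedia.org": "Editorial",
    "techradar.com": "Editorial",
    "reddit.com": "UGC",
    "youtube.com": "UGC",
    "hubspot.com": "Competitor",
}


def registrable_domain(url: str) -> str:
    """Reduce a URL to its registrable domain (blog.hubspot.com -> hubspot.com)."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) <= 2:
        return host
    # Keep three labels when the last two form a known multi-part suffix.
    keep = 3 if ".".join(labels[-2:]) in MULTI_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])


def count_citations(urls):
    """Count citations per registrable domain."""
    return Counter(registrable_domain(u) for u in urls)


def classify(domain: str) -> str:
    """Map a domain to a source type; unlisted domains fall into 'Other'."""
    return SOURCE_TYPES.get(domain, "Other")


# Example input: URLs as they might appear in one engine response.
cited = [
    "https://blog.hubspot.com/foo",
    "https://en.wikipedia.org/wiki/Search_engine_optimization",
    "https://www.reddit.com/r/SEO/",
    "https://old.reddit.com/r/marketing/",
]
counts = count_citations(cited)
for domain, n in counts.most_common():
    print(domain, n, classify(domain))
```

Counting at the registrable-domain level is what lets `www.reddit.com` and `old.reddit.com` roll up into a single `reddit.com` row; counting by full hostname would split them.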
Totals: 17,551 citations, 5,498 unique URLs, 2,219 unique cited domains.
What this study can and cannot claim
SolCrys (publisher of this study) is a vendor in the AEO category. In this dataset, SolCrys ranks 7 of 7 on cross-engine mention rate at 4.82%. The top-cited vendor in the dataset is Profound at 2.1% of category citations (361 citations). We have no commercial incentive to inflate Profound's number or deflate Wikipedia's; the methodology counts every cited URL regardless of source. If we wanted to bias this study to flatter SolCrys, we would not have published it — we are tied for last in our own category measurement.
This study describes citation patterns for the AEO category specifically, during a 30-day window, using a 22-prompt sample. It is sufficient to characterize the source-layer distribution AI engines pull from when answering buyer questions about AEO platforms. It is not sufficient to claim that the same distribution holds for unrelated categories (cybersecurity, legal tech, consumer electronics may look very different — we have parallel studies in progress), nor that it is stable across longer horizons. Caveats are spelled out in detail later in this report.
The top 10 cited domains
| Rank | Domain | Citations | Type | % of total |
|---|---|---|---|---|
| 1 | wikipedia.org | 978 | Editorial | 5.6% |
| 2 | techradar.com | 908 | Editorial | 5.2% |
| 3 | reddit.com | 785 | UGC | 4.5% |
| 4 | hubspot.com | 380 | Competitor | 2.2% |
| 5 | tryprofound.com | 361 | Competitor | 2.1% |
| 6 | semrush.com | 338 | Competitor | 1.9% |
| 7 | conductor.com | 277 | Competitor | 1.6% |
| 8 | seranking.com | 256 | Competitor | 1.5% |
| 9 | youtube.com | 246 | UGC | 1.4% |
| 10 | airops.com | 187 | Competitor | 1.1% |
Three observations from this table:
Editorial and UGC dominate the top three. Wikipedia and TechRadar combined account for 1,886 citations, more than the six vendor sites in the top 10 combined (1,799). Reddit adds another 785. The implication is that for buyer-comparison queries in this category, AI engines lean on neutral encyclopedic content (Wikipedia), curated trade-press roundups (TechRadar), and authentic community discussion (Reddit) before they reach for any single vendor's marketing site.
Competitor-vendor content is the long tail of the top 10. Positions 4 through 10 are mostly vendor blogs: HubSpot, Profound, Semrush, Conductor, SE Ranking, AirOps. Each one earns between 1% and 2.2% of citations individually. No single vendor has built a dominant owned-media presence in AI answers; the leader, HubSpot, captures 2.2%, and HubSpot isn't even a pure-play AEO vendor (its citations come largely from adjacent marketing and SEO content).
YouTube is in the top 10, and it is climbing. Otterly's 2026 YouTube Citation Study identifies YouTube as the #2 social platform for AI citations behind Reddit; Ahrefs' December 2025 75,000-brand study reports YouTube mentions show ~0.737 correlation with AI visibility across ChatGPT, AI Mode, and AI Overviews. Our #9 ranking for YouTube is consistent with both.
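The "% of total" column in the top-10 table is each domain's citation count divided by the 17,551-citation total. A minimal sketch that recomputes it from the published counts is shown here; the names `TOP_10` and `share` are illustrative only and are not taken from the SolCrys platform.

```python
TOTAL_CITATIONS = 17_551  # study-wide total reported in the methodology

# Citation counts copied from the top-10 table.
TOP_10 = {
    "wikipedia.org": 978,
    "techradar.com": 908,
    "reddit.com": 785,
    "hubspot.com": 380,
    "tryprofound.com": 361,
    "semrush.com": 338,
    "conductor.com": 277,
    "seranking.com": 256,
    "youtube.com": 246,
    "airops.com": 187,
}


def share(count: int, total: int = TOTAL_CITATIONS) -> float:
    """Percentage of all citations, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {domain: share(count) for domain, count in TOP_10.items()}
for domain, pct in shares.items():
    print(f"{domain:<16} {TOP_10[domain]:>4} {pct:>4}%")

# Combined weight of the ten listed domains.
top_10_total = sum(TOP_10.values())
print("top 10 combined:", top_10_total, f"{share(top_10_total)}%")
```

Shares are rounded only at the end of each calculation; summing already-rounded percentages can drift by a tenth of a point.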
Distribution by source type
This is the chart that should change how a B2B marketer thinks about content strategy.
| Source type | Citations | % of total |
|---|---|---|
| Other (long-tail) | 9,560 | 54.5% |
| Competitor (vendor blogs) | 3,085 | 17.6% |
| Editorial (trade press, news) | 2,052 | 11.7% |
| UGC (Reddit, YouTube, forums) | 1,301 | 7.4% |
| Owned (the brand's own .com) | 150 | 0.85% |
The long tail is the largest single bucket. More than half of all citations (54.5%) point to domains that did not crack the top 10, top 50, or in most cases the top 100. These are blogs, personal sites, niche industry pages, academic content, GitHub repositories, agency case-study pages, and thousands of other small destinations each contributing a handful of citations. Spread across roughly 2,000 distinct domains, this bucket averages about 4–5 citations per domain. No single one of them matters; collectively, they're the majority of AI answer source material.
Competitor vendor blogs collectively beat editorial. The 17.6% competitor share is notable because each individual vendor in this bucket only owns 1–2%, but added together, vendor blogs are the second-largest source type. AI engines do read marketing content; they just don't weight any single piece of it heavily.
Owned media is functionally negligible. Across every brand in the category, including their own websites, just 0.85% of citations are owned. This is the most important data point in the study for senior marketing leaders, because it directly contradicts the still-dominant assumption that publishing more to your own blog is how you "win AI search." It isn't. Not in this category. Not at this volume.
This finding aligns directionally with two independent studies. OtterlyAI's "AI Citation Economy" report (1M+ data points across ChatGPT, Perplexity, and Google AI Overviews) finds AI engines depend ~95% on third-party sources, with brands receiving frequent mentions but weak link citations. Separate PR-trade reporting cites 94% of AI citations coming from earned media. Both converge on the same structural pattern as our 99.15% non-owned share.
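The owned share, the 99.15% non-owned share, and the long-tail average of about 4–5 citations per domain all follow from the counts in the source-type table. The sketch below reproduces that arithmetic; the variable names are illustrative, and `LONG_TAIL_DOMAINS` uses the text's "roughly 2,000" figure, so the resulting average is an approximation.

```python
TOTAL_CITATIONS = 17_551  # study-wide total reported in the methodology

# Counts copied from the source-type table.
BY_TYPE = {
    "Other": 9_560,
    "Competitor": 3_085,
    "Editorial": 2_052,
    "UGC": 1_301,
    "Owned": 150,
}

# Assumption: "roughly 2,000" long-tail domains, as stated in the text.
LONG_TAIL_DOMAINS = 2_000

for source_type, count in BY_TYPE.items():
    print(f"{source_type:<10} {count:>5} {100 * count / TOTAL_CITATIONS:.2f}%")

owned_share = 100 * BY_TYPE["Owned"] / TOTAL_CITATIONS
non_owned_share = 100 - owned_share
avg_long_tail = BY_TYPE["Other"] / LONG_TAIL_DOMAINS

print(f"owned: {owned_share:.2f}%  non-owned: {non_owned_share:.2f}%")
print(f"average citations per long-tail domain: {avg_long_tail:.1f}")
```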
Three findings that matter for B2B marketers
Finding 1: Wikipedia and TechRadar each out-cite the most-cited vendor blog
Wikipedia (978 citations) and TechRadar (908) each individually pull more citations than HubSpot (380), and HubSpot is the most-cited vendor blog in the category. The Wikipedia-to-top-vendor ratio is roughly 2.6:1; the TechRadar-to-top-vendor ratio is roughly 2.4:1. Against the vendors with dedicated AEO products (Profound at 361, Semrush at 338, Conductor at 277), the Wikipedia ratio widens to roughly 2.7:1 to 3.5:1 on the domain-level counts in the top-10 table.
Implication: owned media is necessary but structurally insufficient. The highest-leverage incremental dollar in an AEO content budget is earned media: getting your brand named, quoted, or linked in editorial coverage that already wins AI mindshare. That's a PR and analyst-relations problem, not a content-marketing problem. Most B2B marketing teams are organized to ship blog posts; few are organized to land a TechRadar inclusion. The org chart will need to bend before the citation share does.
A caveat on Wikipedia: this is not an instruction to create a Wikipedia page. Wikipedia's notability guidelines for organizations require multiple independent, non-routine secondary sources by unaffiliated parties; self-promotion and paid placements explicitly do not count. Conflict-of-interest rules discourage editing pages where you have a financial relationship, and recent reporting (a January 2026 Bureau of Investigative Journalism story on consultancies using subcontractors to edit client pages) has tightened scrutiny further. The implication isn't "make a Wikipedia page"; it's "earn enough independent press that a Wikipedia entry becomes defensible on its own merit." Most vendors in this dataset are not at that bar.
Finding 2: Reddit alone accounts for 4.5% of all citations
This is the finding that will be uncomfortable for B2B brands accustomed to thinking of Reddit as a community for consumer products. In this dataset, Reddit alone accounts for 4.5% of all citations: more than HubSpot, more than every single AEO vendor's owned content, and more than the #4 and #5 ranked domains combined (785 versus 741).
The reason is structural. When a B2B buyer types "best AEO tool for an in-house team of 5" into ChatGPT or Perplexity, the engine looks for lived buyer experience to anchor its answer. Reddit threads in r/SEO, r/marketing, r/SaaS, r/MarTech, and r/bigseo are routinely the most concentrated source of that experience on the open web. The Otterly study identifies the same pattern: ChatGPT in particular favors Reddit, Wikipedia, and news sites, and community sources outweigh brand sources at the citation layer even when brands have higher mention rates in answer text.
Implication: Reddit visibility is now strategic, not optional, for B2B vendors whose buyers compare options before purchase. But authentic Reddit engagement is structurally different from content marketing. The subreddits AI engines preferentially cite have strong anti-self-promotion norms, mod-enforced rules, and community memory that punishes thin participation. A brand that drops product links will get burned, deleted, or banned. A brand that contributes substantively over months and earns unprompted mentions from other community members will accumulate citation share. The latter takes 6–12 months to show measurable lift and cannot be templated, and it is the most defensible source of earned media in current AI search.
We unpack the tactical playbook for community-layer source work in Reddit and G2 community sources for AEO.
Finding 3: Even the most-cited AEO vendor only captures 2.1% of category citations
Profound, the top-ranked vendor on category mention rate (48.85%) in this dataset, accounts for just 2.1% of all citations in the category. The top owned-media domain across the entire study, HubSpot (an adjacent marketing-tech major), captures 2.2%. No vendor in this category, including very well-resourced ones, has built dominant AI mindshare on the back of their own .com.
Implication: nobody owns AI mindshare for their own category yet. The category-leader position in AI citation share is, today, empty. For a B2B founder or CMO allocating 2026 budget, this is a strategic opening — and a narrow one. The vendor that combines (1) deep owned-media coverage of long-tail buyer queries, (2) consistent trade-publication earned media, and (3) authentic community-layer participation in the subreddits and YouTube channels their buyers consume will earn outsize share. None of the 10 vendors in our adjacent buyer-guide comparison are doing all three in 2026. Source-layer matching framework: How to earn LLM citations.
Why the long tail matters
The 54.5% "Other" bucket — 9,560 citations spread across roughly 2,000 unique domains — is the single largest source category in this study. It deserves more attention than most AEO conversations give it.
A few characteristics:
- Per-domain volume is low. The average domain in this bucket appears 4–5 times across 1,936 responses; many appear once.
- The composition is broad. Niche industry blogs, personal practitioner sites, agency case studies, podcast show-notes, Substack posts, GitHub READMEs, academic papers, archived articles, regional trade publications, free-tier review platforms, and many one-off pages.
- Each individual source has near-zero leverage in isolation. A brand cannot "win" by getting cited on any single long-tail domain.
- In aggregate, the long tail dominates. A brand cited on 50–100 long-tail domains will outperform one concentrated on its own .com plus three vendor-blog placements.
This shape is consistent with published academic research on Retrieval-Augmented Generation. The 2025–2026 generation of RAG architectures (Self-RAG, Corrective RAG / CRAG, and selective-retrieval frameworks documented in current arXiv preprints) use hybrid retrieval and self-critique loops to pull from a diverse set of sources rather than fixate on a small number of high-authority domains. Long-tail dominance is the operational output of that design choice.
The strategic implication is uncomfortable for vendors thinking in terms of a few "pillar pieces" on their own site. The closer your source-layer presence resembles the underlying citation distribution (long, broad, distributed), the higher your structural fit with how AI engines retrieve.
What this means for source-layer strategy
A practical three-layer model emerges directly from the data. We've discussed the conceptual version in owned, earned, community sources for AI; here is the version anchored to the citation distribution above.
Owned layer: retrievable and correct, not dominant
Owned is 0.85% of category citations, which is not zero. For deep-funnel comparison queries where the engine looks for first-party product information, owned content is the only correct source. Brands should keep investing, but with a clear-eyed view that the upper bound on owned share is roughly 1–3% even for category leaders. The job of the owned layer is to be retrievable and correct, not to dominate. Quality, not volume: a 50-page documentation site that matches the buyer query taxonomy will outperform a 500-post blog of generic marketing content.
Earned layer: the highest-leverage single action
If a B2B vendor has one marginal marketing dollar, the highest-leverage spend is editorial PR aimed at the domains that already win AI mindshare:
- TechRadar (908 citations): by a wide margin, the most-cited trade publication in our top 10.
- Tier 2 trade press: Search Engine Land, Search Engine Journal, Marketing Brew, MarTech.org.
- Analyst coverage: Gartner, Forrester, or G2-published analyst notes (when indexable).
- Existing inclusions: guest essays or interviews on adjacent industry sites already in the long-tail bucket.
"Editorial PR" in 2026 includes contributed essays and interviews, not just press releases. Press releases largely go uncited; substantive, named-author trade-publication essays do get cited.
Community layer: Reddit alone is 4.5% of citations
The community-layer playbook is harder to staff and harder to fake than owned or earned. It requires (1) people who can credibly participate in technical subreddits over months without sounding like marketing, (2) willingness to be helpful in threads that don't mention your product, and (3) patience for a 6–12 month horizon to measurable citation lift. Most B2B marketing orgs are not set up for this; the rare ones that are accumulate a citation moat competitors cannot match through paid spend.
Extend community presence to G2 / Capterra / Software Advice and to YouTube (the climbing #9 in our top 10, validated by Ahrefs's December 2025 study). Operational playbook: Reddit and G2 community sources for AEO.
Caveats and what we didn't measure
We owe the reader honesty about the limits of this study.
The dataset is for the AEO category specifically. Other B2B categories — cybersecurity, legal tech, healthcare IT, fintech infrastructure — will have different distributions. We have parallel measurements in progress for five vertical categories; the structural shape (long-tail dominance, low owned share, editorial + UGC leading) is consistent, but specific top-10 domains vary materially by category. Do not generalize the top-10 table to a category we did not measure.
Thirty days is short for stability claims. Citation distributions shift week-to-week as engines update retrieval, content gets indexed, and community discussions trend. A 30-day window characterizes a current distribution; it does not claim multi-quarter stability. We re-run monthly.
We did not measure whether citations drive traffic. A citation is not a click. Some are impression-only; others drive meaningful referral. The conversion from citation to click is a separate research question we have not quantified, and we caution against any vendor (including SolCrys) claiming a fixed citation-to-traffic ratio in 2026.
The "Other" bucket needs categorization. 54.5% in a single "long tail" label is the largest analytical loose end here. The next iteration will subdivide it (independent blogs vs. agency case studies vs. academic vs. GitHub vs. archives).
We did not measure answer-shape, only source. This study counts cited URLs. It does not analyze how the engine uses sources in the answer — whether for or against the brand, with positive or negative sentiment, or whether the engine's claim is accurate to the source. Citation share is necessary but not sufficient. Sentiment and answer-shape work is on the roadmap.
Single-vendor publication. Independent replication by a non-vendor party would meaningfully strengthen the findings. We will share methodology and aggregated raw data with any academic, journalist, or analyst who wants to attempt replication.
Get the full dataset
We are making the underlying aggregated dataset available to researchers, journalists, and senior practitioners. What you can request:
- CSV of all 2,219 cited domains with citation counts and source-type classifications.
- The full 22-prompt set with exact wording.
- Methodology notes: normalization rules, edge-case handling, source-type taxonomy.
Email-gated so we can notify recipients when refreshed versions publish. We do not sell or share the list. Press, analysts, and researchers get hand-delivery and follow-up support.
Request the full dataset or book a SolCrys audit to see the same data shape for your own brand.
About SolCrys + how to cite this study
SolCrys is an Answer Engine Optimization platform. Our product measures how AI search engines (ChatGPT, Perplexity, Google AI Overviews, Gemini, and others) cite and describe brands, diagnoses why specific queries produce specific answers, recommends source-layer fixes, and tracks the recovery once fixes are shipped. We sell to B2B mid-market and enterprise marketing teams and to agencies operating multi-client AEO portfolios. Founded in 2024, based in California.
To cite this study: SolCrys (2026). *Wikipedia, TechRadar, and Reddit Dominate AI Citations: A 17,551-Citation Study of the AEO Category.* 30-day measurement window 2026-04-22 to 2026-05-22. Available at https://solcrys.ai/resources/wikipedia-techradar-reddit-dominate-ai-citations/.
For press inquiries, dataset access, or methodology questions, contact press@solcrys.ai.
*Published 2026-05-22. Data drawn from a continuous 30-day cross-engine measurement of the AEO category prompt set (22 prompts × 4 engines × 22 daily runs = 1,936 responses, 17,551 citations across 2,219 unique domains, workspace solcysai-aeo). Next refresh: August 2026.*
FAQ
Can I replicate this study?
Yes. The methodology section above is intentionally specific enough to allow replication. You will need access to (1) the four AI engines (most are accessible through paid API tiers or web interfaces), (2) a way to programmatically capture full responses including cited URLs, (3) a tagging taxonomy for source types, and (4) the patience to run 22 prompts × 4 engines daily for 30 days. The all-in compute cost is modest; the engineering work is non-trivial. SolCrys's platform automates the data capture; we wrote it precisely because doing this manually doesn't scale. We will share our prompt set, taxonomy, and aggregated raw data on request.
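For anyone planning a replication, the run matrix implied by the methodology (22 prompts × 4 engines × 22 daily runs) can be enumerated up front, as in this minimal sketch. The prompt identifiers are placeholders because the exact wording is shared only with the dataset request, and the job dictionary layout is an assumption for illustration, not the SolCrys scheduler.

```python
from itertools import product

ENGINES = ["ChatGPT", "Perplexity", "Google AI Overviews", "Gemini"]

# Placeholders: the study's exact 22 prompts are not published in the article.
PROMPTS = [f"prompt-{i:02d}" for i in range(1, 23)]

# 22 daily runs inside the 30-day window, per the methodology.
RUNS = range(1, 23)

# One job per (run, engine, prompt) combination.
jobs = [
    {"run": run, "engine": engine, "prompt": prompt}
    for run, engine, prompt in product(RUNS, ENGINES, PROMPTS)
]

print(len(jobs))
```

Enumerating the jobs before querying makes it easy to confirm afterwards that every planned response was actually captured.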
What was your full prompt set?
22 buyer-comparison prompts focused on the AEO category — variations on "best AEO platform for X" with X being team type, industry, use case, or budget. We've deliberately paraphrased four examples in this article rather than publishing the literal list, because publishing the exact prompts would alter the natural distribution once AI engines begin indexing this article itself. The full list is available with the dataset request.
What about other categories?
The distribution shape (long-tail dominance, low owned share, editorial + UGC leading the top 10) holds in early parallel studies we've run for adjacent B2B categories. The specific top-10 domains vary considerably — TechRadar is dominant in tech-adjacent categories but not in legal tech; trade publications shift by industry; Reddit's importance varies by buyer demographic. We will publish category-by-category breakdowns in subsequent reports. Until then, do not assume the exact rankings here transfer to a category we have not measured.
How often will you refresh this study?
Monthly internally; quarterly published refreshes. We'll annotate quarter-over-quarter changes so the reader can see whether the top 10 is stable, whether owned share is shifting, and whether new domains are entering the long tail. The first refresh will appear in August 2026.
What tools did you use for citation extraction?
The SolCrys AEO platform's citation-extraction module. It instruments the four engines, captures full responses, extracts cited URLs (handling Perplexity's structured citations, ChatGPT's link annotations, Google AI Overviews' source carousel, and Gemini's source attribution), normalizes URLs to domains, and applies the source-type taxonomy. The same module is available to platform customers and is the data source for SolCrys's customer-facing dashboards. We did not use a separate research tool to generate this study; the production platform produced the data, which we then aggregated.
Why does Wikipedia rank #1 even though most companies cannot get a Wikipedia entry?
This is the right question, and it's why we wrote a separate caveat above. Wikipedia's #1 ranking reflects how AI engines retrieve — they lean on encyclopedic neutrality for definitions and category-context queries. It does not imply that every vendor should attempt a Wikipedia page; Wikipedia's notability and conflict-of-interest rules make most vendor-driven entries inappropriate or unsustainable. The practical takeaway for a B2B vendor is not "create a Wikipedia page" but "earn enough independent third-party press that the underlying notability case is defensible — and let a Wikipedia entry follow if and when it earns itself."
Are you publishing this because you're a vendor and want to look smart?
Partially yes; that disclosure is in the methodology. We're also publishing it because the AEO category needs more publicly disclosed citation data, and because a study that ranks SolCrys 7 of 7 on category mention rate is harder to dismiss as marketing puffery than a study where the publisher comes out on top. Use it accordingly.
Related guides
Citation & Source Influence
How AI Answer Engines Choose Sources: The 7 Signals We've Mapped
AI engines like ChatGPT, Perplexity, Google AI Overviews, and Claude choose sources using overlapping but distinct signals. This guide maps the 7 signals that drive citation eligibility and the engine-specific weighting differences.
Citation & Source Influence
Owned, Earned, and Community Sources in AI Answers: A 3-Layer Strategy
AI engines cite three distinct source layers — owned (your site), earned (PR/editorial), and community (Reddit/G2/forums). This guide explains how to balance investment by category and life stage.
Citation & Source Influence
Reddit, G2, and Forums: How to Win the Community Source Layer for AI Citations
AI engines cite Reddit, G2, and niche forums disproportionately when answering buyer prompts. This guide is the practitioner playbook for earning community citations without becoming spam — with the 7 rules of native engagement.