GEO Research · Deep-dive

AirOps AI Search Playbook — what it says and what it misses

Published 8 May 2026 · Russ Read-Barrow

AirOps published a 17-page AI search playbook in March 2026. The data behind it is real and the structural findings are the most rigorous bits in the field so far. Here’s what it confirms, what we question, and what a UK business should actually do off the back of it.

Featured study · Added May 2026

What AirOps did

Source: AirOps · Author: Josh Spilker · Published: 2 March 2026 · Method: ~15M data points across AI answers, queries, citations and brand mentions; 12,000+ pages analysed for structural elements; 21,000+ brands for third-party citation behaviour; 5.5M answers for community citation patterns

AirOps is a US content-operations platform. They’ve spent the last twelve months running citation analysis at scale and turned a portion of it into a 17-page marketing report — The Complete AI Search Playbook for Marketers. It’s the practitioner-facing companion to The 2026 State of AI Search, a larger dataset they developed with Kevin Indig.

Headline findings

One thing to note: every featured case study in the playbook — Carta, Webflow, Chime, Docebo, Klaviyo, LegalZoom — is an AirOps customer. The 7× citation lifts and 6× conversion uplifts are real, but they’re testimonials, not independent benchmarks.

What it builds on

Most of what’s in the AirOps Playbook is consistent with what we and other practitioners have been seeing for the last twelve months. Visibility is a citation problem, not a ranking problem. Freshness matters. Owned content alone won’t get you cited — you need third-party signal. Cross-model variation is real and stable.

The strongest contributions on top of that consensus are the granular structural numbers. The 12,000-page analysis gives the field its cleanest published figures yet on FAQ presence, schema-type counts, heading hierarchy and lists/tables. We were already recommending most of these as part of AVS Plan-stage work; the AirOps numbers turn “do this” into “do this — here’s the citation lift behind it”.

The 68% single-model finding is also a meaningful contribution. It’s the strongest external evidence yet for measuring across all the major LLMs rather than ChatGPT-only — which is what we already do, and what most “GEO dashboards” don’t.

This sits alongside Seer’s GEO Olympics Study (231,347 LLM responses, six models, every Olympic category), which we wrote up earlier this year. Together they’re the two most rigorous published GEO studies the field has so far. Both are worth real time.

What we question

Three things, in priority order.

1. The per-brand maths.

AirOps’s headline frame is ~15 million data points across 21,000 brands. Spread evenly, that’s roughly 700 prompts per brand. As industry-wide field reading, that’s plenty. As a measurement of any specific brand, it’s thin — and crucially, it’s generic. Every brand in the dataset gets asked the same kind of questions in the same way. Whether any particular question matches what a buyer in your category would actually type is luck of the draw.

By contrast, AVS Annual measures 6,000+ prompts per brand, with every prompt designed for that brand’s sector, buyers, and the specific decisions those buyers make. That’s roughly 8× the per-brand depth of the AirOps average — and bespoke rather than generic. We’re not picking a fight with AirOps’s overall sample size. We’re saying: a 700-prompt slice of a 21,000-brand average is not the same thing as a 6,000-prompt picture of your business.

And the prompts are only half of it. The product K&C sells is the judgement on top of the data — what the AI conversation about your brand actually means, which findings matter for your buyers, which battles to pick first. Generic data without that layer is just numbers. AirOps’s 21,000-brand average is good field reading; it isn’t a recommendation for your business.

“Whilst key generic learnings are great, nothing will help you more than something that’s tailored to fit your exact business, your exact audience, and your exact goals.”

2. Sample bias on the case studies.

Every featured case study in the playbook is an AirOps customer. Carta, Webflow, Chime, Docebo, Klaviyo, LegalZoom. The 7× citation lift, 6× organic conversion, 5× refresh velocity numbers are real — but they’re customer testimonials, not independent benchmarks. When citing them, attribute clearly: “AirOps reports their customer Webflow saw...”, not “Webflow saw...”. The numbers are still indicative; the framing matters.

3. Zero methodology disclosure.

Across 12,000 pages, 5.5M answers, 21,000 brands and 15M data points, the report says nothing about which queries were used, how they were phrased, what voice the prompts had, what models or versions were tested, or which countries and languages were covered. For research-quality citation that would fail peer review. For marketing-grade citation it’s standard — but the gap is the gap. K&C publishes its prompt design, model coverage and country/language scope on every measurement. It costs us a small competitive moat. It buys us the right to call this out as an industry-wide problem worth fixing.

What this means for you

Four practical things a K&C client should take from this study.

a. Audit your schema markup.

Pages with three or more schema types are 13% more likely to be cited. Most B2B websites we audit run one type — usually Organization — and stop. Adding Article on every blog post, FAQPage on every FAQ block, Service on each service page, BreadcrumbList on every page in the hierarchy, and Person on the about page is a half-day of work and probably the cheapest GEO win available right now.

We’ve audited our own site against this finding — see the K&C schema audit alongside this page.
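If you want to run the same check yourself, here’s a minimal sketch of the audit logic: pull every JSON-LD block out of a page and count the distinct schema types it declares. This is our illustration, not an AirOps tool — the function name is ours, and the regex scan is deliberately naive (it won’t handle `@graph` nesting or microdata).

```python
import json
import re

def count_schema_types(html: str) -> set[str]:
    """Extract distinct schema.org @type values from JSON-LD blocks in a page."""
    types = set()
    # Naive scan: find every <script type="application/ld+json"> block.
    for block in re.findall(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        html, re.DOTALL | re.IGNORECASE,
    ):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the audit
        items = data if isinstance(data, list) else [data]
        for item in items:
            t = item.get("@type")
            if isinstance(t, list):
                types.update(t)
            elif t:
                types.add(t)
    return types

# Illustrative page with three schema types — the threshold the AirOps finding points at.
page = """
<script type="application/ld+json">{"@context":"https://schema.org","@type":"Organization","name":"Example Ltd"}</script>
<script type="application/ld+json">{"@context":"https://schema.org","@type":"Article","headline":"Post"}</script>
<script type="application/ld+json">{"@context":"https://schema.org","@type":"FAQPage"}</script>
"""
print(sorted(count_schema_types(page)))  # → ['Article', 'FAQPage', 'Organization']
```

Run that across your sitemap and any page returning fewer than three types is a candidate for the half-day fix above.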

b. Measure across all the major models, not one.

68% of brand mentions in AI search appear in only one model. If you’re tracking ChatGPT only — which most cheap GEO dashboards do — you’re seeing about a third of your visibility picture. Measure across ChatGPT, Claude, Gemini and Perplexity at minimum. AVS does. Almost nothing else does at this depth.
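The 68% figure is easy to reproduce as a metric on your own tracking data: given the set of brands each model mentions, compute the share of brands that appear in exactly one model. A minimal sketch, with purely illustrative brand data (not AirOps’s dataset):

```python
from collections import Counter

def single_model_share(mentions: dict[str, set[str]]) -> float:
    """Share of all mentioned brands that appear in exactly one model's answers."""
    counts = Counter(brand for brands in mentions.values() for brand in brands)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Illustrative data only — swap in your own per-model mention sets.
mentions = {
    "chatgpt":    {"acme", "globex", "initech"},
    "claude":     {"acme", "umbrella"},
    "gemini":     {"acme", "hooli"},
    "perplexity": {"globex", "stark"},
}
print(round(single_model_share(mentions), 2))  # → 0.67
```

If your own number is anywhere near the AirOps 68%, a single-model dashboard is hiding two-thirds of your footprint.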

c. Find your category’s top-three comparison pages — and check whether you’re on them.

90% of third-party AI citations come from listicles, comparison pages and review sites; 80% of cited brands sit in the top three of those formats. If you’re a UK B2B brand and you’re not in the top three on the key comparison page in your category, you’re effectively invisible in AI for that query.

d. Treat freshness as quarterly, not annual.

70% of cited pages were updated in the past year, and for ChatGPT commercial queries 60%+ were updated in the past six months. For SaaS, finance and news, the AirOps team flag the window as tighter still. If your content calendar assumes an annual refresh cycle, you’re a year too late.

Where K&C goes deeper

K&C and AirOps occupy the same broad space — AI visibility — but with different products and different buyers. AirOps is a content-operations platform: produce content faster, refresh it more often, and use their software to monitor where you stand. The data behind their report is real and the platform is well-built.

Where K&C goes deeper is two things. First, the depth: AVS isn’t a dashboard. It’s a structured, sector-tailored measurement programme, built on 6,000+ prompts per brand at full Annual depth, run across the major LLMs, with every prompt, model and geography disclosed. Roughly 8× the per-brand depth of the AirOps average — and tailored, not generic.

Second, and more important: the product is the judgement, not the data. Anyone with a budget can buy AI search data now. The bit you can’t buy off a shelf is somebody who knows what to do with it — what the AI conversation about your brand actually means, which findings matter for your buyers, which battles to pick this quarter and which to leave. That layer is what every AVS engagement is built around. The numbers are the means. The judgement is the deliverable.

Generic GEO benchmarks tell you which way the field is moving. They don’t tell you whether the buyers in your sector look anything like the brands the report measured, and they certainly don’t tell you what to do next. That’s the bit AVS is built to do.

If you want to know how your business actually surfaces in AI search across multiple models and a tailored prompt set built for your buyers — with someone who can tell you what it means and what to do about it — book an AVS Exec Brief. Single engagement, free, no commitment.

Want to know what AI says about your business?

Book an AVS Exec Brief — a tailored snapshot of your AI visibility across the major models.

Get in touch

More GEO research

See the Seer Olympics study, the K&C manifesto, and our running stat-bank of GEO findings.

Browse GEO Research 2026 →