Don't use embeddings for audience research; use reasoning models instead, especially at scale

If you do audience research seriously, you already know that one query is not the same as another, even when they share every single keyword.

The hard part isn't knowing this (I assume you already do) but doing it at scale. Which is what I did. And I learned something I want to share.

This article is about the technical choice underneath that scaling problem, and why the obvious answer is the wrong one. Because embeddings might not always be all you need ;)

What embeddings do, and why we use them #

Embeddings might just be the most powerful tools in modern NLP. They translate words, sentences, or whole texts into numerical vectors that capture whether two pieces of text are about the same subject. For topic clustering, taxonomy matching, semantic search: often exactly the right call. It's awesome, and useful.

Their power lies in something specific: they're trained to treat paraphrases as equivalent. "How do I make pasta?" and "Spaghetti preparation instructions" sit close to each other in vector space, even though they're phrased totally differently. That's exactly what you want for search: users should find answers regardless of how they phrase their question.

But that same property makes embeddings unsuitable for audience research at scale. And that IS something you definitely should be aware of.

The difference between a topic and an emotional state #

Audience research is not topic detection. It doesn't ask "what is this question about?"; it asks "what emotional state is the person asking it in?".

That's a different signal, and it doesn't live in the content of the question, but in the form.

Look at this sequence:

  1. What is account-based marketing?
  2. How does account-based marketing work in practice?
  3. Account-based marketing vs inbound marketing
  4. What if account-based marketing doesn't work for SMBs?
  5. Compare account-based marketing software
  6. Best account-based marketing tool for 2026

To an embedding model: six questions about one topic. High mutual similarity. Same cluster.

To a human reading them one by one: six different points in a consumer journey, possibly even for a single person. They might be spread out over time, or asked just 15 minutes apart. The first is orienting. The second has the concept but wants it applied. The third is comparing strategies. The fourth feels doubt about fit. The fifth is further along, looking at concrete tooling. The sixth is close to a decision.

That's an audience research goldmine that embedding clustering flattens completely.

Google itself classifies queries this way #

Google itself doesn't treat all queries equally. In late 2024, Mark Williams-Cook and his team discovered an endpoint vulnerability that revealed Google assigns nearly every query to one of eight "Refined Query Semantic Classes" (RQ-classes): SHORT_FACT, OTHER, COMPARISON, CONSEQUENCE, REASON, DEFINITION, INSTRUCTION, BOOL [1].

This classification helps Google decide which SERP features to surface and how to weigh results. The classes are derived not from topic, but from question form: the structure of how the question is asked.

Google is doing exactly what audience research needs to do: read form, not just meaning. They have to, because their job is matching results to query type, not just to topic. A BOOL query, for example, shouldn't result in a map or a knowledge panel, but in a simple yes or no. A simple example of course, but you get my point.

The issue for us SEOs, though, is that Google's eight classes are mechanistic. They tell you what kind of question is being asked, but not what state the asker is in. A COMPARISON query can come from someone idly browsing alternatives or from someone in a panicked end-of-quarter decision. The form tells you they're comparing; it doesn't tell you what's driving them.

So for audience research, you need both layers: the mechanical (what kind of question) and the affective (what mental state). The mechanical layer is doable with relatively simple rules. The affective layer is where things get interesting.
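To make the "relatively simple rules" of the mechanical layer concrete, here is a minimal sketch in Python. The class names are the eight RQ-classes listed above; the regular expressions and their ordering are my own illustrative guesses, not Google's actual logic, and a real rule set would need far more patterns per language.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
# Class names follow the eight RQ-classes named in the article;
# the patterns themselves are illustrative assumptions only.
RULES = [
    ("COMPARISON", re.compile(r"\b(vs\.?|versus|compare|compared to|difference between)\b")),
    ("CONSEQUENCE", re.compile(r"^what (if|happens)\b")),
    ("REASON", re.compile(r"^why\b")),
    ("DEFINITION", re.compile(r"^what (is|are|does .+ mean)\b")),
    ("INSTRUCTION", re.compile(r"^how (do|does|can|to)\b")),
    ("BOOL", re.compile(r"^(is|are|can|does|do|should|will)\b")),
    ("SHORT_FACT", re.compile(r"^(who|when|where|how (much|many|long))\b")),
]


def classify_form(query: str) -> str:
    """Return a mechanical question-form class for a single query."""
    q = query.lower().strip()
    for label, pattern in RULES:
        if pattern.search(q):
            return label
    # Anything the rules do not recognise falls into the catch-all class.
    return "OTHER"


if __name__ == "__main__":
    for q in [
        "What is account-based marketing?",
        "Account-based marketing vs inbound marketing",
        "What if account-based marketing doesn't work for SMBs?",
        "Best account-based marketing tool for 2026",
    ]:
        print(f"{classify_form(q):12} {q}")
```

Note what this sketch cannot do: it labels the form, but says nothing about the state of the person asking. That is the affective layer.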

What form reveals about state #

The form of a question carries a surprising amount of signal beyond Google's eight classes. Some concrete patterns that show up in every market (please note these are just examples):

Question word as phase indicator

"What is": orientation, acquiring basic knowledge
"How does ... work": understanding, wanting to grasp mechanisms
"Why": depth, wanting cause-and-effect
"Which": choice, weighing options
"When": context, timing or suitability
"How much does ... cost": decision approaching
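As a rough illustration, the question-word mapping above can be written as a small ordered lookup. This is a sketch under the assumption that the opening words alone decide the phase, which (as noted) will not hold in every context; the function name and the English-only openers are hypothetical.

```python
# Ordered lookup: longer openers come first so that
# "how much does" is not swallowed by "how does".
PHASE_BY_OPENER = [
    ("how much does", "decision approaching"),
    ("how does", "understanding"),
    ("what is", "orientation"),
    ("why", "depth"),
    ("which", "choice"),
    ("when", "context"),
]


def phase_for(query: str):
    """Return the phase hinted at by the opening words, or None."""
    q = query.lower().strip()
    for opener, phase in PHASE_BY_OPENER:
        # Require a full-word match so "why" does not match "whyever".
        if q == opener or q.startswith(opener + " "):
            return phase
    return None


if __name__ == "__main__":
    print(phase_for("How much does a CRM cost?"))
    print(phase_for("What if it fails?"))
```

Queries that open differently, such as "What if ...", deliberately return None here; they are picked up by other markers.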

Conditional structures signal doubt or fear

What if I choose
Suppose it doesn't work
Can I switch later if
Is it possible to

These aren't just phrasings; they're linguistic signals of someone weighing a possible negative scenario. That's fear, or at least risk-aversion. A question starting with "what if" reveals something a "what is" question doesn't, regardless of whether they're about the same topic.

Modality reveals certainty and urgency

"I need to": urgency, decision under pressure
"I want to": motivation, directedness
"I'm torn between": explicit choice phase
"I'm looking for": active search state

Social references signal a need for validation

Which CRM do Fortune 500 companies use?
What is the competition doing about
Industry standard for

Someone asking this way isn't looking for a factual answer. They're looking for social validation, a reference point against which to measure their own choice. That's a different buying phase than someone typing "best CRM".

Emotional charge in word choice

"Honest review of": healthy skepticism, looking for proof
"Does X really work for": doubt, wants to be convinced
"Best solution for": optimism, focused on progress
"Avoid failure with": fear, focused on risk-avoidance

The same user, on the same day, on the same topic, can use any of these phrasings depending on their mental state at the moment. An audience research method that ignores this nuance misses exactly where strategic value lives.
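The marker families above (conditionals, modality, social references, emotional charge) can be sketched as a phrase-pattern tagger. The label names and patterns below are assumptions taken from the examples in this section, not a validated lexicon, and one query can carry several tags at once.

```python
import re

# Illustrative patterns drawn from the examples above.
# Label names are hypothetical; a real project would tune them per market.
MARKERS = {
    "risk_scenario": [r"^what if\b", r"^suppose\b", r"\bcan i switch\b", r"^is it possible to\b"],
    "urgency": [r"\bi need to\b"],
    "motivation": [r"\bi want to\b"],
    "choice_phase": [r"\bi'm torn between\b"],
    "active_search": [r"\bi'm looking for\b"],
    "social_validation": [r"\bfortune 500\b", r"\bcompetition\b", r"\bindustry standard\b"],
    "skepticism": [r"\bhonest review\b"],
    "doubt": [r"\breally work\b"],
    "optimism": [r"\bbest solution\b"],
    "risk_avoidance": [r"\bavoid failure\b"],
}


def tag_markers(query: str) -> list:
    """Return the sorted list of marker labels found in the query."""
    # Normalise case and curly apostrophes before matching.
    q = query.lower().replace("\u2019", "'")
    return sorted(
        label
        for label, patterns in MARKERS.items()
        if any(re.search(p, q) for p in patterns)
    )


if __name__ == "__main__":
    print(tag_markers("What if I choose the wrong CRM?"))
    print(tag_markers("I need to find an honest review of X"))
```

A tagger like this is brittle by design: it only sees the phrasings it was given. It is useful as a baseline and as a source of explicit markers to hand to a reasoning model.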

This is established science. And also, I am not the scientist here #

This may sound subjective, but it isn't. The link between linguistic form and mental state is one of the better-studied things in psycholinguistics.

Carol Kuhlthau's Information Search Process model maps directly onto what audience research needs to detect [2]. It identifies six stages of information seeking (initiation, selection, exploration, formulation, collection, presentation), and each stage has its own cognitive, affective, and physical signature. Early stages carry uncertainty and apprehension; later stages carry confidence and direction. The signal is in the language used at each stage. The model emerged from Kuhlthau's empirical work in the 1980s and has been validated across diverse user groups for more than three decades.

James Pennebaker's Linguistic Inquiry and Word Count (LIWC) framework has cataloged the linguistic categories that map to psychological states since the early 1990s [3]. It maps words to categories like cognitive mechanisms, certainty, tentativeness, causation, and affective tone, and has been used in hundreds of peer-reviewed studies across psychology, communication, and computational linguistics.

The principle is not contested: linguistic form encodes psychological state, and this can be measured systematically.

But I do need to add a strong caveat here: the examples I've given above are just that, examples. They are meant to tie my theoretical explanation (no doubt fully flawed) to phrasings you know. Whether they apply to a given situation really depends on the context.

That said: that's where reasoning models come in.

What's actually new is the technology, not the science #

I want to be careful, because this is where it would be easy to overstate things.

The traditional tools in this field are dictionary-based. LIWC counts words against fixed categories. Carefully validated, language-specific, decades of psychometric work behind them. These remain the standard in academic psycholinguistics, and they are excellent for what they do.
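To show what "counting words against fixed categories" means mechanically, here is a toy closed-vocabulary counter. The categories and word lists are invented for illustration; the real LIWC dictionaries are commercial, far larger, and psychometrically validated. The output is the percentage of tokens per category, the usual way closed-vocabulary scores are reported.

```python
import re
from collections import Counter

# Invented toy dictionary; NOT the LIWC word lists.
TOY_DICTIONARY = {
    "tentative": {"maybe", "perhaps", "might", "possibly", "if"},
    "certainty": {"always", "never", "definitely", "certainly"},
    "negative_emotion": {"risk", "risks", "fail", "failure", "worried", "afraid"},
}


def category_percentages(text: str) -> dict:
    """Return, per category, the percentage of tokens found in its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in TOY_DICTIONARY}
    counts = Counter()
    for token in tokens:
        for category, words in TOY_DICTIONARY.items():
            if token in words:
                counts[category] += 1
    return {
        category: 100.0 * counts[category] / len(tokens)
        for category in TOY_DICTIONARY
    }


if __name__ == "__main__":
    print(category_percentages("What if it might fail"))
```

The sketch makes the limitation visible: a word either is in the list or it is not, and word order and context play no role.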

The problem with dictionary-based tools, for the specific use case of audience research on commercial SEO data, is twofold. First, the categories are designed for general psychological assessment, not for fine-grained buying-phase analysis. Second, adapting them to new languages or domains is significant work. LIWC has been ported to Dutch, but the validation is non-trivial.

What changed in the last eighteen months: reasoning LLMs became commercially available [4]. These are not the same thing as the tools psycholinguists use. They are a different technique that happens to be useful for a similar kind of problem. And hey, I'm an opportunistic SEO-nerd, so I see an opportunity!

The advantage of reasoning LLMs for this specific use case is practical, not theoretical. You define your framework in a prompt (cognitive phase, affective markers, behavioural intent) with explicit linguistic markers per category. The model reads each query against that framework and produces a structured classification with the reasoning that led to it. No training data required. No labeling investment. Framework adjustable per project or per market. Auditable per query.
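A minimal sketch of what "defining the framework in a prompt" and getting a "structured classification" back could look like. Everything here is an assumption for illustration: the affect and intent labels, the prompt wording, and the JSON reply shape are hypothetical, and the actual model call is left out because it depends on the provider. The six cognitive phases are Kuhlthau's stages as listed earlier.

```python
import json

# Hypothetical framework. The cognitive phases are Kuhlthau's six stages;
# the affect and intent labels are invented for this example.
FRAMEWORK = {
    "cognitive_phase": ["initiation", "selection", "exploration",
                        "formulation", "collection", "presentation"],
    "affect": ["uncertainty", "doubt", "fear", "optimism", "confidence", "neutral"],
    "intent": ["learn", "compare", "validate", "buy", "other"],
}


def build_prompt(query: str) -> str:
    """Render the framework plus one query into a classification prompt."""
    lines = [
        "Classify the search query below on each dimension.",
        "Judge the form of the question, not its topic.",
    ]
    for dimension, labels in FRAMEWORK.items():
        lines.append(f"- {dimension}: one of {', '.join(labels)}")
    lines.append('Reply with JSON only, using the keys above plus "reasoning".')
    lines.append(f"Query: {query}")
    return "\n".join(lines)


def parse_classification(raw_reply: str) -> dict:
    """Validate a model reply against the framework; raise on anything off-spec."""
    data = json.loads(raw_reply)
    for dimension, labels in FRAMEWORK.items():
        if data.get(dimension) not in labels:
            raise ValueError(f"invalid or missing {dimension}: {data.get(dimension)!r}")
    reasoning = data.get("reasoning")
    if not isinstance(reasoning, str) or not reasoning.strip():
        raise ValueError("missing reasoning")
    return data


if __name__ == "__main__":
    print(build_prompt("What if account-based marketing doesn't work for SMBs?"))
```

Validating every reply against the framework is what keeps the result auditable per query: a label outside the framework, or a classification without reasoning, is rejected rather than silently stored.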

This is not what traditional psycholinguists use, and I would not want to pretend otherwise. It builds on the same principle (form encodes state) but uses a different technique that fits a different operational reality: tens of thousands of queries per market, multiple markets in multiple languages, business cases that need answers in weeks rather than research cycles in years.

Recent research has begun validating this approach though! Reasoning-enhanced LLM classification matches or outperforms traditional methods on complex linguistic phenomena involving negation, conditionals, and contextual nuance, exactly the markers that matter for buying-phase analysis [5].

The interesting position is in the middle: rigorous about the underlying principle (it's established science), modest about the technique (it's a new application, not a new theory).

How I use this in my tool Skåut #

With Skåut I analyse large volumes of People-Also-Ask queries per market that surface in search results. The earlier approach was to cluster those questions through embedding similarity and derive "audience segments" from the clusters.

That approach produced useful topic information, but no in-depth audience information. It said "your market asks a lot about X", not "your market feels Y about X".

The new approach classifies queries along several dimensions at once: where in the search process the question fits (Kuhlthau-style cognitive phase), what emotional charge it carries (affective markers), and what the asker is concretely trying to do (behavioural intent). I layer this on top of mechanical classification akin to Google's RQ-classes. The result is a much richer picture of what moves a market at a given moment.

A (hypothetical) example: in an analysis of a B2B SaaS market, more than half of the People-Also-Ask queries around a specific product category carried risk-aversion signals ("what if", "safety", "risks"), while the content of the two market leaders almost exclusively used hope-and-growth language. That's a direct strategic gap: the market is looking for reassurance; the competition is offering ambition. Whoever publishes risk-mitigation content first wins that conversation.
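The gap in that hypothetical example comes down to comparing two distributions: the share of each affect label in the market's questions versus in competitor content. A small sketch of that comparison, with invented records and an arbitrary threshold; the record shape (`{"affect": ...}`) is an assumption, not Skåut's data model.

```python
from collections import Counter


def affect_shares(records: list) -> dict:
    """Share of each affect label among classified records (0.0 to 1.0)."""
    counts = Counter(record["affect"] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def affect_gaps(market: dict, competitor: dict, threshold: float = 0.25) -> dict:
    """Labels where market and competitor shares differ by at least the threshold.

    A positive value means the market expresses the affect more often
    than competitor content addresses it.
    """
    gaps = {}
    for label in set(market) | set(competitor):
        difference = market.get(label, 0.0) - competitor.get(label, 0.0)
        if abs(difference) >= threshold:
            gaps[label] = round(difference, 3)
    return gaps


if __name__ == "__main__":
    # Invented data, purely to show the mechanics.
    market_queries = [{"affect": "risk_aversion"}] * 3 + [{"affect": "optimism"}]
    competitor_pages = [{"affect": "risk_aversion"}] + [{"affect": "optimism"}] * 3
    print(affect_gaps(affect_shares(market_queries), affect_shares(competitor_pages)))
```

The threshold is a judgment call; the point is only that the comparison runs on classified affect labels, which topic clusters do not provide.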

That's a conclusion you don't get from embedding clustering. You simply don't. A missed opportunity!

So, what to do with this in SEO? #

Three takeaways:

  1. If your tooling or platform promises "audience insights", find out what's actually under the hood. Topic clustering gives you a topic landscape. That's useful, but it isn't audience research.
  2. Understand that question form is a richer signal than question content for measuring what a market feels. The form sits in the data already; you only need to read it differently.
  3. This has become an accessible domain. The science has existed for decades; the technology to apply it at scale has only recently arrived. Anyone making the combination sees markets that remain invisible to others.

A shoutout and some mentions & links to folks who've inspired this #

This article stands on conversations and work by others.

A particular thanks to Dorron Shapow, whose writing on what he calls Search User Optimisation and the layers underneath search queries was a real source of inspiration for how I think about the affective dimension of audience research. We come at this from different angles (me looking directly towards scalability, which is something we've discussed and maybe don't even agree upon), but his insistence that there is more to a question than its answer has shaped my own framing.

Thanks also to Arnout Hellemans for sparring on these ideas over time. He will let you folks know more soon.

And credit where it is due to Mark Williams-Cook, both for the work referenced in the footnotes and for his broader focus on People-Also-Ask as a serious data source. The tool he built (AlsoAsked) is part of why this kind of question-level analysis is feasible at all in commercial SEO contexts.

Let's talk about this #

I love talking about this. Let me know if you do too.

References #

  • 1 Williams-Cook, M. (2024). Findings discussed at SearchNorwich and reported across the SEO press. Williams-Cook and his team identified an endpoint vulnerability that returned roughly 2 terabytes of data covering over 90 million queries, revealing more than 2,000 properties Google uses to classify queries and websites, including the eight "Refined Query Semantic Classes" referenced here. Google paid the team a bounty of $13,337 for responsible disclosure. The eight classes summarised in this article are paraphrased from secondary reporting; the precise internal Google labels may differ. See also: AlsoAsked (alsoasked.com), and the SearchNorwich presentation "Improving your SEO with conceptual models".
  • 2 Kuhlthau, C. C. (1991). Inside the Search Process: Information Seeking from the User's Perspective. Journal of the American Society for Information Science, 42(5), 361–371. The Information Search Process (ISP) model was consolidated in Seeking Meaning: A Process Approach to Library and Information Services (1993, second edition 2004). It explicitly incorporates the affective (feelings), cognitive (thoughts), and physical (actions) dimensions of each stage of information seeking.
  • 3 Pennebaker, J. W., Booth, R. J., Boyd, R. L., & Francis, M. E. (2015). Linguistic Inquiry and Word Count: LIWC2015 Operator's Manual. Austin, TX: Pennebaker Conglomerates. LIWC began in the early 1990s and has been updated through 2001, 2007, 2015, and 2022 versions. The framework offers a closed-vocabulary dictionary mapping words to psychological categories. As of 2020, LIWC had been used in nearly 600 peer-reviewed papers indexed in Web of Science. It has been adapted and validated across multiple languages including Dutch.
  • 4 OpenAI released its first reasoning-capable model, o1-preview, in September 2024; the full o1 followed in December 2024. Other labs followed quickly: DeepSeek R1, Anthropic's extended thinking, Google's Gemini reasoning modes, Mistral's Magistral series. Unlike embedding models, these models can be prompted to evaluate the form of a query against predefined linguistic markers and produce structured, traceable classifications. The relevant shift is not just capability but cost: classifying tens of thousands of queries per market is now feasible for tens of dollars rather than thousands.
  • 5 See for instance Sun et al. (2023), "Text Classification via Large Language Models" (CARP framework), and recent benchmarks like TextReasoningBench (2025). The general finding: reasoning-enhanced LLM classification handles complex linguistic phenomena (negation, conditionals, intensification, irony) better than zero-shot or fine-tuned baseline approaches, at higher computational cost. For audience research on commercial scale data, the cost trade-off is favourable; for academic psycholinguistics work, dictionary-based methods like LIWC remain the standard.
Published: May 13, 2026 ~ 11 min.

Eikhart - Mad Scientist