AI as System /#1 - Smarter Search with RQ-RAG

Apr 8, 2025 · Mohsen Davarynejad · 3 min read

About AI as System

AI as System is a series about understanding artificial intelligence not as isolated models, but as components in larger systems.
Each post explores how AI behaves when embedded in real-world workflows—retrieving information, making decisions, adapting to context, or supporting human users.

Whether you’re working in logistics, education, product development, or research, this series helps you think in systems.
By following real examples across domains and AI subfields, we explore how intelligence emerges through structure, interaction, and feedback—not just through parameters.


1. The Problem (Why)

Why do some LLMs give half-right or vague answers even when they have access to search tools?

Most Retrieval-Augmented Generation (RAG) systems take the user’s query and pass it as-is to the retriever.
If the query is ambiguous, multi-hop, or under-specified, the retriever brings back noisy or irrelevant documents.
This ruins the quality of the final answer—even with powerful LLMs.


2. The Core Idea (What)

Don’t just retrieve. Refine the question first.

RQ-RAG (Refined Query RAG) teaches a model to:

  • Rewrite ambiguous or complex queries
  • Decompose multi-hop questions into simpler ones
  • Disambiguate vague phrases

This improves the retriever’s performance and leads to more grounded, accurate responses.
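To make these operations concrete, here is one invented before/after pair per operation (illustrative only; apart from the decomposition example, which reuses the training sample shown later, none of these are drawn from the paper's data):

# One invented input → refined-output pair per refinement operation.
REFINEMENT_EXAMPLES = {
    # Rewrite: turn a terse or malformed query into a well-formed one.
    "rewrite": ("capital france?", ["What is the capital of France?"]),
    # Decompose: split a multi-hop question into answerable steps.
    "decompose": (
        "Where did the scientist who developed the polio vaccine study?",
        ["Who developed the polio vaccine?", "Where did Jonas Salk study?"],
    ),
    # Disambiguate: pin down a vague referent.
    "disambiguate": ("When was Apple founded?", ["When was Apple Inc. founded?"]),
}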


3. A Running Example (How)

Original Query:
Who was the US president when the Berlin Wall fell, and what was his stance on the event?

Refined Sub-Queries:

  1. When did the Berlin Wall fall?
  2. Who was the US president in November 1989?
  3. What was George H. W. Bush’s stance on the fall of the Berlin Wall?

Each sub-question is focused and retrievable.
The model retrieves documents for each sub-question, then composes the retrieved evidence into a single grounded answer.
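Here is a minimal, runnable sketch of that retrieve-then-compose loop. The three-document corpus and the word-overlap retriever are toy stand-ins; a real system would query a vector store or a search API:

import re

# Toy in-memory corpus; a real system would query a vector store or search API.
CORPUS = [
    "The Berlin Wall fell on 9 November 1989.",
    "George H. W. Bush was the US president from January 1989 to January 1993.",
    "Bush's stance was restrained: he said he would not 'dance on the wall'.",
]

def tokens(text: str) -> set[str]:
    # Lowercase word tokens, punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    # Placeholder retriever: rank documents by word overlap with the query.
    q = tokens(query)
    return sorted(CORPUS, key=lambda doc: -len(q & tokens(doc)))[:k]

sub_queries = [
    "When did the Berlin Wall fall?",
    "Who was the US president in November 1989?",
    "What was George H. W. Bush's stance on the fall of the Berlin Wall?",
]

# One retrieval pass per sub-query; the combined evidence goes to the generator.
evidence = [doc for q in sub_queries for doc in retrieve(q)]
prompt = "Answer using only this evidence:\n" + "\n".join(evidence)
print(prompt)  # send `prompt` to a generator LLM to compose the final answer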


4. Under the Hood

Training the Refiner

The authors fine-tuned a language model (like LLaMA2-7B) on QA datasets such as HotpotQA and MuSiQue.

Each training sample looks like:

{
  "input": "Where did the scientist who developed the polio vaccine study?",
  "output": [
    "Who developed the polio vaccine?",
    "Where did Jonas Salk study?"
  ]
}
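From samples like this you can build prompt/completion pairs for supervised fine-tuning. A minimal sketch follows; the plain-text template is an assumption for illustration, not the paper's exact format:

import json

# The training sample shown above, as a Python dict.
sample = {
    "input": "Where did the scientist who developed the polio vaccine study?",
    "output": [
        "Who developed the polio vaccine?",
        "Where did Jonas Salk study?",
    ],
}

def to_sft_pair(sample: dict) -> dict:
    # Hypothetical plain-text template; the paper's own format differs.
    prompt = (
        "Decompose the question into simpler sub-questions.\n"
        f"Question: {sample['input']}\n"
        "Sub-questions:"
    )
    completion = "\n".join(f"{i}. {q}" for i, q in enumerate(sample["output"], 1))
    return {"prompt": prompt, "completion": completion}

# JSONL is the usual input format for fine-tuning scripts.
with open("refiner_sft.jsonl", "w") as f:
    f.write(json.dumps(to_sft_pair(sample)) + "\n")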

RQ-RAG Pipeline

  1. User Query → passed to a Query Refiner
  2. Refined Queries → used to retrieve documents
  3. Retrieved Docs → passed to a Generator LLM
  4. Final Answer → grounded in facts, clear and complete
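As a skeleton, the four steps chain together as three function calls; each function below is a stand-in to be replaced by a real refiner, retriever, and generator:

def refine_query(query: str) -> list[str]:
    # Step 1 → 2: the fine-tuned refiner rewrites/decomposes the user query.
    raise NotImplementedError

def retrieve(query: str) -> list[str]:
    # Step 2 → 3: fetch documents for one refined query.
    raise NotImplementedError

def generate(query: str, docs: list[str]) -> str:
    # Step 3 → 4: the generator LLM composes a grounded final answer.
    raise NotImplementedError

def rq_rag(user_query: str) -> str:
    refined = refine_query(user_query)
    docs = [d for q in refined for d in retrieve(q)]
    return generate(user_query, docs)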

5. Try It Yourself

You can test this workflow with OpenAI or LLaMA models.

Prompt Template:

Decompose the following question into simpler, factual sub-questions that can each be answered directly.

Question: Who was the US president when the Berlin Wall fell, and what was his stance on the event?

Decomposed Questions:
1. When did the Berlin Wall fall?
2. Who was the US president in November 1989?
3. What was George H. W. Bush's stance on the fall of the Berlin Wall?

Use each sub-question to retrieve a more targeted set of documents,
then synthesize the final answer with a language model.
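Below is a minimal sketch with the OpenAI Python SDK; the model name is an assumption, and a local LLaMA served behind an OpenAI-compatible endpoint works the same way:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Decompose the following question into simpler, factual sub-questions "
    "that can each be answered directly.\n\n"
    "Question: {question}\n\n"
    "Decomposed Questions:"
)

question = ("Who was the US president when the Berlin Wall fell, "
            "and what was his stance on the event?")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": PROMPT.format(question=question)}],
)

# One sub-question per numbered line; feed each to your retriever next.
print(response.choices[0].message.content)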


6. Why It Matters

  • Makes your retrieval smarter, not just louder
  • Boosts accuracy in multi-hop QA
  • Works well even with smaller LLMs
  • Helps build systems that understand before they search

Next in AI as System:
How agentic systems use memory, feedback, and tools to complete multi-step tasks in uncertain domains.

Read the full paper: arXiv:2402.07233